r7 - 17 May 2007 - 09:22:31 - ArlinStoltzfus

Welcome to the ProteinExpressionProject web

Proposal draft

What is the problem, and why is it hard?

The production of a specific target protein by a 3-stage process of cloning, expression, and purification is required for a variety of uses in the biomedical and biotechnology industries, such as production of antibodies and protein-based therapeutics. Typically a cloned gene is expressed in a “heterologous” system, e.g., a human gene may be expressed in bacterial cells that can be grown easily in the lab. When the host cells have grown to a high density, the cloned gene is turned on (“induced”), resulting in high levels of expression (“over-expression”) of the protein.

Of the three stages in protein production, cloning of a protein-coding gene is the easiest because DNA molecules are relatively soluble, stable, regular and non-reactive; any gene can be manipulated and purified with standardized tools. By contrast, proteins may be insoluble or unstable, and every one of them is different. Heterologous over-expression of a protein often leads to low yield, toxic effects on the host cells, and insoluble aggregates. This is partly a problem of over-expression, since the same effects are common when a protein-coding gene is over-expressed in a homologous system.

How is it solved today?

Researchers today have a plethora of choices if a protein seems difficult to express, including the following, each of which has achieved some success as recorded in the research literature:
  • Choosing a different host for expression (E. coli, Pichia, baculovirus)
  • Altering growth conditions (temperature, growth medium)
  • Altering induction regime (inducer concentration, induction start and duration)
  • Engineering the cloned sequence (adjust codon use, add epitope tag)
  • Adding specific stabilizing or refolding agents (chaperones, natural ligands)

Typically researchers have a tight focus on a single protein of interest, and they use a trial-and-error approach to finding successful conditions for over-expression. Because of this, the time and expense involved in this search are difficult to estimate, as is the ultimate chance of success.

What is the new technical idea; why can we succeed now?

Our approach is to combine the engineering of expression systems with both i) a much needed effort in standards and evaluation; and ii) an integrative, systems-based approach to analysis.

Standards and evaluation

Cutting-edge biomedical researchers in academia and industry have developed many useful and innovative technologies, each illustrated by an anecdote. However, these anecdotes do not provide reliable estimates of how effective any given technology is. Thus, the time is ripe for a generalized evaluation of methodologies for improving expression. To do this, we will first assemble a standard reference clone set for protein expression consisting of a random sample of 100 human protein-coding genes, as well as 100 cloned genes from other sources. From this standard reference material, we will generate standard reference data on protein expression, assaying the effect of various modifications that have been proposed to improve protein expression, using E. coli and Pichia as hosts.
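The random sample for the reference clone set could be drawn reproducibly along the following lines. This is only a minimal sketch: the function name, the fixed seed, and the placeholder gene identifiers are assumptions made for illustration, and the proposal does not specify how the sample will actually be drawn.

```python
import random


def sample_reference_set(gene_ids, n=100, seed=2007):
    """Draw n gene identifiers uniformly at random, without replacement.

    The seed is an arbitrary, hypothetical choice; fixing it lets anyone
    reproduce the same draw from the same candidate list.
    """
    candidates = sorted(set(gene_ids))  # stable order, duplicates removed
    if n > len(candidates):
        raise ValueError("sample size exceeds number of candidate genes")
    rng = random.Random(seed)
    return sorted(rng.sample(candidates, n))


# Placeholder identifiers standing in for a catalogue of human
# protein-coding genes; these are not real gene names.
candidates = [f"GENE{i:05d}" for i in range(20000)]
reference_set = sample_reference_set(candidates)
print(len(reference_set), reference_set[:3])
```

Sorting the candidates before sampling makes the draw independent of the order in which the catalogue happens to be listed.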

Integrative systems analysis

The standard experiments described above will characterize expression of the target protein for various clone-host combinations under various conditions. To improve our ability to understand, detect, and remediate mis-expression, these same standardized materials will be subjected to an integrative systems-based analysis, including:
  • Microarray analysis of host gene expression
  • Meso-scale imaging of cells expressing protein
  • { some non-disruptive way to detect protein state }
  • Bioinformatics analysis of protein family
  • Dynamic modeling of mis-expression and aggregation

Prior work with E. coli demonstrates the utility of microarrays for detecting mis-expression biomarkers, and the utility of bioinformatics on the target protein to identify statistical predictors of expression success such as native expression level, disordered content, and family evolutionary diversity. Dynamic models will integrate these predictors with actual data on mis-expression and aggregation for the first time.

What is the impact if successful?

Protein over-expression is a major bottleneck in strategies for structural characterization (e.g., of protein candidates for drug targets), generating antibodies, and producing protein therapeutics. The total yearly expenditure by US industry on protein production is ??? Most of this is spent on expression and purification. Much effort is also spent on proteins that cannot be over-expressed and that, for this reason, are side-lined from the path that leads to protein-based knowledge, reagents and therapeutics. Reports from structural genomics projects suggest that the fraction of problematic proteins is ???

If this project is successful, the standards, knowledge and technologies that we generate will have several impacts on the protein production process. First, the knowledge that we gain from the analysis of protein expressibility will allow biomanufacturers and researchers to allocate resources more effectively by targeting those proteins that are most likely to be expressible. Second, for any given protein that can be expressed, the outputs of this project will reduce the time and expense needed to find a successful expression procedure. Third, the outputs of this project will increase the fraction of proteins of interest that can be over-expressed. As part of this project, we will estimate the degree of these effects using the standard reference clone set.
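One way to estimate such effects from the reference clone set is to report the fraction of clones that express successfully together with a confidence interval. The sketch below uses a Wilson score interval; the choice of interval, the function name, and the example counts are assumptions made for illustration and are not results or methods stated in this proposal.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials
                         + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts only: clones out of 100 that express successfully
# under a baseline protocol and under a modified protocol.
for label, expressed in [("baseline", 40), ("modified", 55)]:
    low, high = wilson_interval(expressed, 100)
    print(f"{label}: {expressed / 100:.2f} ({low:.2f} to {high:.2f})")
```

With only 100 clones per set the intervals are wide, so a modest improvement may not be distinguishable from sampling noise.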

How will the program be organized?

This section not written. Some thoughts:
  • Leadership team.
  • Scientists, facility managers
  • Locations – CARB, NIST
  • Equipment – CARB II facilities

How will intermediate results be generated?

This section not written. Some ideas:
  • Easy to get early results on the standard reference clone set. This aspect of the project will keep generating papers with important practical results right from the start.

How will you measure progress?

This section not written. Some ideas:
  • Estimate benefits to industry using results from standards evaluation
  • Industry partners do cost accounting on protein expression before and after
  • Knowledge from systems analysis saves time and money in choosing targets
  • Biomarkers from systems analysis save time and money in choosing targets

What will it cost?

How much do you have?
