July 1, 2006 (Vol. 26, No. 13)
Applying Metabolomics and Multivariate Data Analysis Tools for Strain Improvement and Medium Optimization
Improving production processes to commercially viable levels is a prerequisite for any bioprocess. Large-scale industrial fermentation processes are continuously being improved, on average by 1–2% per year. Currently, production strains are enhanced using a combination of random and targeted approaches.
In targeted strain improvement, potential bottlenecks, feedback inhibition, and side routes are removed by targeted overexpression or knockout of the gene(s) of interest. At best, these targets are selected on the basis of expert knowledge, but educated guesses and gut feeling also play a role. As a result, time, and thus money, is wasted on targets that later prove irrelevant or yield only minor improvements.
Moreover, current approaches often overlook biological processes that are not known to be involved in, or important for, the formation of a specific bioproduct (e.g., a metabolite or enzyme), and they cannot rank the relative importance of the postulated targets. By combining metabolomics technology with multivariate data analysis (MVDA), empirical target selection can be replaced with a more systematic, data-driven approach.
Metabolomics/MVDA Approach
Functional genomics technologies are revolutionizing research in the life sciences. Although these technologies were originally set up to elucidate the function of the many orphan genes that emerged from genome-sequencing projects, their true value lies in the methodological paradigm shift they have initiated: from a reductionist, one-biomolecule-at-a-time, hypothesis-driven approach toward a holistic, question-driven approach.
Whether or not it is used to study gene function, this genomics approach makes it possible to identify, without bias, the biomolecules that are important for specific biological processes. The strength of functional genomics technologies is that they are unbiased (they simply measure everything), so new and unintuitive insights into cellular behavior can be gained.
Functional Genomics Tool
Metabolomics, the comparative, non-targeted analysis of the complete set of metabolites in a cell, has recently emerged as the newest functional genomics tool. Because the metabolome is the biochemical level closest to the actual functioning of a cell, metabolomics is an effective way to understand biological function. Validated metabolomics technology platforms are now available, including TNO's (www.tno.nl) inert and robust platform.
These platforms essentially rest on three pillars: multimetabolite analytics, data preprocessing, and data analysis. They allow more than 1,000 metabolites to be quantified simultaneously. However, the key issue in metabolomics is not generating reliable data but translating differences in metabolome composition into differences in the bioproduction-related phenotypes of the cells from which those metabolomes were derived (Figure 1).
To this end, TNO has applied MVDA tools to identify and rank targets based on the strength of the correlation between each metabolite and the bioproduction-related phenotype of interest, such as yield or productivity.
Target Selection for Strain Improvement
To demonstrate that the combined metabolomics/MVDA approach yields targets that matter for strain improvement, we studied phenylalanine production by E. coli. We obtained a patented phenylalanine-producing strain (ATCC 31884) that had already been optimized through at least eight steps of rational design. This strain was cultivated in a batch fermentor under different environmental conditions to produce large variations in the amount of phenylalanine formed.
Samples were taken from these controlled fermentations at different time points, quenched to halt cellular metabolism, and worked up for metabolome analysis by GC and LC-MS. The raw GC and LC-MS output files were then preprocessed using in-house software. The resulting clean data set was analyzed by Partial Least Squares (PLS) regression, which yields a model that predicts the phenylalanine titer ([P]) from all measured metabolites (A, B, C, …):
$$[P] = b_1 A + b_2 B + b_3 C + \cdots$$
The relative statistical importance of each metabolite for the phenylalanine titer is given by its weight factor (regression coefficient b1, b2, b3, …). When the metabolites are then ordered by the absolute value of their regression coefficients in the PLS model, those that contribute most to the phenylalanine titer can be identified and ranked.
Results showed that approximately half of the metabolites that correlated strongly with phenylalanine titer were intermediates or side products of the phenylalanine biosynthesis route. Biological interpretation of the results then allowed us to identify genes that should be knocked out or overexpressed to achieve higher phenylalanine titers (Figure 2). Several of these leads were validated, resulting in phenylalanine titer increases of up to 50% (Figure 3). The entire improvement effort took less than nine months.
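The PLS regression and ranking step described above can be sketched in a few lines. The article does not state which PLS algorithm, data scaling, or number of components was used, so this minimal sketch assumes PLS1 fitted with the NIPALS algorithm on autoscaled (mean-centered, unit-variance) metabolite data; the function names `pls1` and `rank_metabolites` and the toy data are hypothetical. It uses plain Python so that no particular statistics package is assumed.

```python
from math import sqrt
from statistics import mean, pstdev


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the small square system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def pls1(X, y, n_components):
    """PLS1 via NIPALS (assumed variant). X: samples x metabolites, y: titer.
    Returns one coefficient per metabolite, for autoscaled X and centered y."""
    cols = list(zip(*X))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    Xd = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]
    y_mean = mean(y)
    yd = [v - y_mean for v in y]
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [dot(col, yd) for col in zip(*Xd)]  # weights: X^T y
        norm = sqrt(dot(w, w))
        if norm < 1e-12:  # y already fully explained; stop adding components
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xd]  # scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*Xd)]  # X loadings
        q = dot(yd, t) / tt  # y loading
        # Deflate X and y by the part explained by this component.
        Xd = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(Xd, t)]
        yd = [v - q * ti for v, ti in zip(yd, t)]
        W.append(w)
        P.append(p)
        Q.append(q)
    # Regression coefficients b = W (P^T W)^-1 q
    ptw = [[dot(P[i], W[j]) for j in range(len(W))] for i in range(len(P))]
    c = solve(ptw, Q)
    return [sum(W[a][j] * c[a] for a in range(len(W))) for j in range(len(X[0]))]


def rank_metabolites(names, X, y, n_components=2):
    """Order metabolites by |b|, the ranking criterion described in the text."""
    b = pls1(X, y, n_components)
    return sorted(zip(names, b), key=lambda nb: abs(nb[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 8 samples; titer driven mostly by A, weakly by B.
    A = [1, 2, 3, 4, 5, 6, 7, 8]
    B = [3, 1, 4, 1, 5, 9, 2, 6]
    C = [2, 7, 1, 8, 2, 8, 1, 8]
    X = [list(r) for r in zip(A, B, C)]
    y = [5 * a + 0.5 * b for a, b in zip(A, B)]
    for name, coef in rank_metabolites(["A", "B", "C"], X, y, n_components=3):
        print(f"{name}: {coef:+.3f}")
```

With as many components as there are independent metabolite columns, PLS1 reduces to ordinary least squares. In a real metabolomics data set, with far more metabolites than samples, only a few components would be retained (typically chosen by cross-validation), which is what makes PLS usable where ordinary regression is not.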
Medium Optimization
Similarly, the combined metabolomics/MVDA approach can be applied to medium optimization. Currently, medium optimization is mainly an empirical process, although experimental design approaches have recently been introduced to speed it up. However, both are black-box approaches that offer no real understanding of what happens to the various medium components and metabolites in the extracellular environment.
By applying metabolomics to the analysis of medium components (the exo-metabolome approach) in combination with MVDA, critical medium components can be identified. This knowledge enables more directed medium optimization, in which the concentrations of these critical components are specifically increased or decreased.
The combined metabolomics/MVDA approach was applied to a metabolite-producing process that had been continuously optimized for more than seven years. All of the highest-ranking medium components correlated positively with the specific productivity for this metabolite. Moreover, most of them were relatively cheap, simple, commercially available compounds, which made it economically attractive to add them to the medium. When 10 mM of these compounds was subsequently added to the original medium, the titer of this metabolite increased by up to 12% (Figure 4).
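The exo-metabolome screen described above can be illustrated with a deliberately simpler stand-in. Instead of a full multivariate model, this sketch ranks each medium component by its univariate Pearson correlation with specific productivity and uses the sign to suggest whether its concentration should be raised or lowered. The function names and component labels are hypothetical, and a univariate screen ignores correlations between components, which is why a PLS model such as the one described for strain improvement would normally be preferred.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def critical_components(medium, productivity, top=3):
    """Rank medium components by |r| against specific productivity.

    medium maps a (hypothetical) component name to its measured level in each
    sample. The sign of r suggests the direction to adjust the concentration.
    """
    scored = [(name, pearson(levels, productivity)) for name, levels in medium.items()]
    scored.sort(key=lambda nr: abs(nr[1]), reverse=True)  # stable for ties
    result = []
    for name, r in scored[:top]:
        action = "increase" if r > 0 else "decrease" if r < 0 else "no change"
        result.append((name, r, action))
    return result


if __name__ == "__main__":
    productivity = [1, 2, 3, 4, 5]
    medium = {
        "comp_1": [2, 4, 6, 8, 10],
        "comp_2": [5, 4, 3, 2, 1],
        "comp_3": [4, 4, 4, 4, 4],
    }
    for name, r, action in critical_components(medium, productivity):
        print(f"{name}: r = {r:+.2f} -> {action}")
```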
Conclusions
These studies demonstrate that the combined metabolomics/MVDA approach can be applied to speed up bioprocess optimization. Instead of the usual 1–2% improvement per year, improvements of 12–50% were achieved in a single round of process optimization. This approach not only results in more economical bioprocesses but also reduces time-to-market for novel bioprocesses.
MVDA tools proved useful for the unbiased identification and ranking of targets from large metabolome data sets. The identified targets for strain improvement and medium optimization were not only logical ones; they also yielded new insights into cellular functioning, thereby generating IP opportunities. Overall, the combined metabolomics/MVDA approach worked like a navigator, showing users the quickest route and opening up the black boxes of cellular metabolism and complex media.
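To put the 1–2% per year and 12–50% per round figures in perspective, assume (as a simplification not made in the original) that the usual annual gains compound. The number of years n needed to reach a cumulative gain g at an annual rate r then follows from (1 + r)^n = 1 + g:

$$
n = \frac{\ln(1+g)}{\ln(1+r)}, \qquad
n_{g=12\%,\ r=2\%} = \frac{\ln 1.12}{\ln 1.02} \approx 5.7, \qquad
n_{g=50\%,\ r=1\%} = \frac{\ln 1.50}{\ln 1.01} \approx 40.7
$$

Under this assumption, the gains achieved in a single optimization round correspond to roughly 6 to 41 years of conventional incremental improvement.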
Other applications of the combined metabolomics/MVDA approach include developing chemically defined media, monitoring lot-to-lot variation and predicting the effect of these differences on the bioprocess performance, monitoring the shelf life of complex media, identifying bioactives in complex mixtures, characterizing mutant strains, assigning functions to orphan genes, and identifying metabolite-dependent regulatory interactions.