October 1, 2011 (Vol. 31, No. 17)

Elaine R. Mardis, Ph.D., Professor of Genetics, Washington University School of Medicine

Thriving Sector Expected to Remain on Its Upward Trajectory

When asked about the daily reality of developing DNA sequencing technology for a high-throughput sequencing center—a job that I’ve enjoyed every minute for the past 20 years—I typically respond, “Never a dull moment!”

When I find time to reflect upon the amazing trajectory of innovation that has brought the field to the current state, I marvel at what has been accomplished in a relatively short time period. Think about it.

Fred Sanger and colleagues first began publishing DNA sequencing methods in the mid-1970s, culminating in their description of the dideoxynucleotide chain termination method in PNAS in 1977.1 I learned DNA sequencing in the mid-1980s from Bruce Roe, my Ph.D. mentor, who learned from Fred Sanger and Alan Coulson.

At that time, the state of the art included labeling four separate reactions for each template with 32P-dATP and the Klenow fragment of E. coli DNA polymerase, separation on hand-assembled and poured polyacrylamide-urea gels, and autoradiography. Sequence data was entered by one hand into the computer (thankfully, the QWERTY keyboard contains the A, C, G and T keys on the same half!), while peering at the gel on a fluorescent lightbox and keeping track with the other hand.

Sounds like ancient history, but that was less than 30 years ago. In retrospect, the dramatic changes in DNA sequencing, in its scale of execution, in the interdisciplinary efforts required to produce, analyze, and interpret the data, and in its expanding impact on biological research and, ultimately, on our economy, have compounded at an amazing rate in that time frame.

This acceleration seems quite likely to continue, as the pace of sequencing and its reach across the biological sciences show no sign of slowing. Research activities in several areas that have been transformed by next-generation sequencing technology may be illustrative of this trajectory and its predicted acceleration.


Expertise in computational biology and bioinformatics is now an absolute given for researchers seeking to analyze and make sense of the large datasets obtained from DNA sequencing projects.

Metagenomics

In human metagenomics studies, for example, the sampling of bacterial, viral, and eukaryotic organisms taken from a specific area of the human body by gross sampling methods and DNA isolation has been greatly accelerated by next-generation sequencing (NGS) technology.

Generating NGS data from these metagenomic isolates addressed the two major issues—cost and ease of data generation—that kept early metagenomic studies from achieving their full potential.

With the decreased cost of NGS reads, significantly deeper sampling of each population isolate can be achieved and hence minor species detected. The digital nature of these reads further enables a measure of the relative proportions of each population member.
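As a rough illustration of these two points, the Python sketch below normalizes per-taxon read counts into relative proportions and estimates the chance of seeing at least one read from a rare species at a given sequencing depth. The taxon labels and counts are invented for illustration, and the sketch assumes that every read has already been assigned to one taxon and that reads are sampled independently; real analyses must also account for genome size, marker copy number, and unclassified reads.

```python
from collections import Counter


def relative_abundance(read_assignments):
    """Return each taxon's share of the classified reads."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


def detection_probability(fraction, depth):
    """Chance of at least one read from a species making up `fraction`
    of the sample when `depth` reads are drawn independently."""
    return 1.0 - (1.0 - fraction) ** depth


# Hypothetical per-read taxon labels (not real data)
reads = ["Bacteroides"] * 700 + ["Faecalibacterium"] * 250 + ["Akkermansia"] * 50
abundance = relative_abundance(reads)
for taxon, share in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(f"{taxon}: {share:.1%}")

# A species at 0.1% of the population, sampled at two depths
for depth in (1_000, 10_000):
    print(depth, round(detection_probability(0.001, depth), 3))
```

Under these assumptions, a tenfold increase in depth moves a species present at 0.1% from being missed in a sizable share of samples to being detected almost every time, which is the sense in which cheaper reads expose minor species.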

The simplicity of NGS library preparation, the ability to make these libraries from tiny input amounts of DNA, and the elimination of a bacterial cloning intermediate have been invaluable to improving the representation of all species present in the populations sampled and to expanding the types of samples we can address.

The development of data-analysis approaches that can handle the growing sequence datasets produced by deep NGS sampling, and that can accurately mine information from them, has re-engaged computational biologists and has resulted in a staggering amount of innovation.

Resulting “big science” projects such as the NIH’s Human Microbiome Project, Europe’s MetaHit, and other international projects, as well as the research of independent investigators, have begun to define bacterial diversity in human health and disease.2-3

Interestingly, we now have a bacterial census of the intestines of cats and dogs4, pigs5, and the octopus.6 Novel pathogenic viruses have been discovered by mining metagenomic datasets, and etiologic agents have been identified in disease outbreaks.7-8

The study of RNA isolates from various sources, converted to cDNA, has given rise to a new experimental approach called “metatranscriptomics,” which can be applied to characterize the metabolic potential of each population.

Metagenomics also has been applied to characterize environments unrelated to human health, such as soils9-10, lakes11, and thermal springs in Russia12, among myriad others. In fact, a quick search of PubMed with “metagenomic” reveals 1,382 references. Most have been published since 2005, across an incredible breadth of topics that reflect the explosion of this scientific endeavor in basic biological discovery, data analysis, data mining, and methods development.

As the transformation of metagenomics by NGS and advanced analytical approaches continues, it will be interesting to see its impact on diverse areas such as food safety monitoring (witness the need for continuing vigilance as yet another E. coli strain impacts human health in Europe), and pharmaceutical product development and quality control (such as in vaccines or other live-cell products).

Another possible use will be diagnosis for optimal antibiotic treatment in patients affected with pathogens known to harbor a spectrum of antibiotic resistance such as methicillin-resistant Staphylococcus aureus (MRSA). There are numerous other applications.

Crop Improvement

It is interesting to posit that work to sequence important crop genomes and their subsequent genetic engineering to obtain desirable traits that increase the world food supply may ultimately have a greater impact on human health than has the sequencing of the human genome.

This possibility seems more likely when one considers that, while most crop genomes sequenced to date have used conventional methods, the use of NGS to sequence and identify loci contributing to desirable or “domesticated” traits such as drought or salinity tolerance, resistance to pathogens, or reduced time to maturity, will markedly accelerate these efforts.

In particular, with a high-quality reference sequence in hand, short-read technologies can be utilized to sequence and then align reads from a strain exhibiting desirable traits onto the reference genome.13

Proper analysis reveals genes and regulatory regions that differ from the reference, and a secondary or “interpretational” analysis promotes those variants with the most likely phenotype-altering impact for downstream functional studies. Each suspect variant then can be biologically evaluated to determine its contribution to the trait using the wide variety of transformation techniques available.
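The “interpretational” step can be pictured as a ranking problem. The Python sketch below is only an assumption-laden illustration: the effect categories, their numeric ranks, the `Variant` record, and the gene names are all hypothetical, and it presumes that variants have already been called against the reference and annotated with a predicted effect.

```python
from dataclasses import dataclass

# Hypothetical ranks: a higher value means a variant is assumed
# more likely to alter the phenotype.
IMPACT_RANK = {"nonsense": 3, "missense": 2, "regulatory": 1, "synonymous": 0}


@dataclass(frozen=True)
class Variant:
    gene: str
    position: int
    effect: str  # predicted effect category from an upstream annotator


def prioritize(variants, min_rank=1):
    """Drop low-impact variants and order the rest, highest impact first."""
    def rank(v):
        return IMPACT_RANK.get(v.effect, 0)

    kept = [v for v in variants if rank(v) >= min_rank]
    return sorted(kept, key=rank, reverse=True)


# Invented example calls from a strain with a desirable trait
calls = [
    Variant("geneA", 1200, "synonymous"),
    Variant("geneB", 530, "missense"),
    Variant("geneC", 88, "regulatory"),
    Variant("geneD", 4410, "nonsense"),
]
ranked = prioritize(calls)
for v in ranked:
    print(v.gene, v.position, v.effect)
```

The variants at the top of such a list would be the first candidates for the functional evaluation described above.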

Once identified, engineering of the crop species for the desired new trait can begin.14 Quality reference genomes are now available for rice15, maize16, and grape17, and are well under way for tomato, potato, wheat, and cassava, among others.

Cancer Genomics

One significant impact of NGS has been to accelerate efforts in cancer genomics, i.e., identifying the differences in DNA sequence variation, RNA expression, or methylation (or all three) in matched tumor and normal tissues from the same patient.
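At its simplest, the tumor-normal comparison for DNA sequence variation amounts to keeping the variants seen in the tumor but not in the matched normal tissue. The Python sketch below shows only that core idea; the function name and the coordinates are invented for illustration, and it assumes both samples have already been reduced to clean lists of calls. Real somatic callers instead use statistical models that weigh coverage, sequencing error, and tumor purity.

```python
def somatic_candidates(tumor_calls, normal_calls):
    """Return variants present in the tumor but absent from the matched normal.

    Each call is a (chromosome, position, reference_base, alternate_base) tuple.
    """
    return sorted(set(tumor_calls) - set(normal_calls))


# Invented calls for one hypothetical patient
tumor = [("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr7", 42, "G", "A")]
normal = [("chr1", 1000, "A", "G")]  # inherited variant, shared by both tissues

candidates = somatic_candidates(tumor, normal)
for chrom, pos, ref, alt in candidates:
    print(f"{chrom}:{pos} {ref}>{alt}")
```

The shared call on chr1 is filtered out as inherited, leaving the two tumor-only calls as candidate somatic mutations.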

The large capacity of NGS instrumentation and the emphasis on cancer genomics as a fundamental discovery mechanism have enabled large cooperative projects, such as The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and the Pediatric Cancer Genome Project (PCGP), to aim at characterizing these differences in hundreds of cancer cases.

Furthermore, the comprehensive scope and digital sensitivity of NGS methods have allowed genome-wide comparisons of tumor-normal pairs18-21, a re-examination of hypotheses about tumor evolution22-23, and whole-genome sequencing for therapeutic options in patients.24-25

The incorporation of NGS into the clinical trial setting to characterize patient samples prior to determining the trial arm to which each patient is best assigned and into the prognostic/diagnostic setting for cancer care will combine to demonstrate that NGS-based methods can improve clinical efficacy, an important step toward transforming the standard of cancer care in the near term.

Meanwhile, the discipline of pathology26 is being radically altered, with training in genomic tests and their interpretation, to provide the medical expertise required to support the use of DNA-based tests in the clinical setting.

In addition to cancer genomics, NGS has now been successfully applied to uncover causes of pediatric genetic disease27-28, demonstrating clinical efficacy in diagnosis and offering valuable insights into de novo genetic disease.

The Future

This brief survey of the impact of next-generation sequencing and its transformative power is not meant to be comprehensive but rather exemplary. The remarkable reach of NGS into disparate scientific endeavors, both commercial and research-oriented, is revitalizing associated aspects of science, computation, and the economy.

The challenges and innovation required to analyze and properly interpret large sequence datasets have effectively breathed new life into the disciplines of computational biology and bioinformatics. Similarly, the resulting NGS-driven computational infrastructure demands have increased both the need to build data centers and the subscription to large server farms in the grid/cloud environment. These demands not only increase hardware sales and drive innovation but also create jobs across the spectrum from advertising to engineering, administration to construction.

Sequencing instrumentation and associated reagent sales represent an almost uniquely American enterprise at present, and the competition for market share is at a feverish pitch. This has escalated of late, as a new wave of so-called third-generation instruments is being introduced to the market. In this class of instrumentation, run times are measured in a few hours rather than days, and although the data volume per instrument is much lower than that of NGS instruments, so is the cost of reagents and consumables per run.

Using a combination of speed and economy, these new instruments will likely ease the introduction of massively parallel DNA sequencing into the clinical setting, facilitate food and pharmaceutical product safety testing, revolutionize agricultural genomics, and expand even further the effective reach of DNA sequencing technology into our daily lives.

However, threats to this very desirable trajectory exist. They clearly include the likely dramatic decreases in government funding for scientific research and, just as importantly, the downward spiral in our emphasis as a country on the importance of education.

Only our complacency as a nation in these areas will allow this current area of scientific innovation to slip away.

References

1 Sanger, F., Nicklen, S. & Coulson, A.R. Proc Natl Acad Sci U S A 74, 5463-5467 (1977).

2 Arumugam, M., et al. Nature 473, 174-180 (2011).

3 Gupta, S.S., et al. Gut Pathog 3, 7 (2011).

4 Suchodolski, J.S. Vet Clin North Am Small Anim Pract 41, 261-272 (2011).

5 Lamendella, R., Santo Domingo, J.W., Ghosh, S., Martinson, J. & Oerther, D.B. BMC Microbiol 11, 103 (2011).

6 de la Cruz-Leyva, M.C., Zamudio-Maya, M., Corona-Cruz, A.I., Gonzalez-de la Cruz, J.U. & Rojas-Herrera, R. Lett Appl Microbiol (2011).

7 Presti, R.M., et al. J Virol 83, 11599-11606 (2009).

8 Loh, J., et al. J Virol 83, 13019-13025 (2009).

9 Parsley, L.C., et al. FEMS Microbiol Ecol (2011).

10 Tao, W., Lee, M.H., Wu, J., Kim, N.H. & Lee, S.W. J Microbiol 49, 178-185 (2011).

11 Chistoserdova, L. Appl Environ Microbiol (2011).

12 Mardanov, A.V., et al. Extremophiles 15, 365-372 (2011).

13 Ossowski, S., et al. Genome Res 18, 2024-2033 (2008).

14 Yamamoto, T., Yonemaru, J. & Yano, M. DNA Res 16, 141-154 (2009).

15 The map-based sequence of the rice genome. Nature 436, 793-800 (2005).

16 Schnable, P.S., et al. Science 326, 1112-1115 (2009).

17 Velasco, R., et al. PLoS One 2, e1326 (2007).

18 Ley, T.J., et al. Nature 456, 66-72 (2008).

19 Mardis, E.R., et al. N Engl J Med 361, 1058-1066 (2009).

20 Pleasance, E.D., et al. Nature 463, 184-190 (2010).

21 Pleasance, E.D., et al. Nature 463, 191-196 (2010).

22 Shah, S.P., et al. Nature 461, 809-813 (2009).

23 Ding, L., et al. Nature 464, 999-1005 (2010).

24 Welch, J.S., et al. JAMA 305, 1577-1584 (2011).

25 Jones, S.J., et al. Genome Biol 11, R82 (2010).

26 Tonellato, P.J., Crawford, J.M., Boguski, M.S. & Saffitz, J.E. Am J Clin Pathol 135, 668-672 (2011).

27 Ng, S.B., et al. Nat Genet 42, 30-35 (2010).

28 Worthey, E.A., et al. Genet Med 13, 255-262 (2011).

Elaine R. Mardis, Ph.D., is an associate professor of genetics and molecular microbiology and co-director of The Genome Institute at Washington University School of Medicine.
