Recent reports in Science illustrate the range of applications enabled by sequencing technologies pioneered by Roche (www.roche.com) and 454 Life Sciences (www.454.com), now a Roche company. In one study (Science 2007; 317:1927-1930), researchers performed whole genome shotgun sequencing of mitochondrial DNA extracted from the ancient Siberian mammoth to examine population diversity.
Each single-stranded DNA template was captured on a bead and amplified within an individual emulsion droplet. The beads were collected on a PicoTiterPlate™ at a density of 1.9 million beads per plate and sequenced on Roche’s Genome Sequencer (GS) FLX instrument.
In another report (Science DOI: 10.1126/science.1149504), researchers used the GS FLX to detect structural variations in the human genome caused by additional, lost, or rearranged chunks of DNA. This type of genetic analysis, combined with SNP data, will help researchers paint a broader picture of the newly discovered variability present in the human genome.
Structural variation appears to be more prevalent than previously thought, and examples of evolutionary conservation based on comparisons with early human genomes will contribute to an understanding of its role.
Tim Harkins, marketing manager for the 454 Sequencer at Roche Applied Science, says that these papers, together with a report linking gene expression with maternal behavior in the wasp (Science DOI: 10.1126/science.1149504), “demonstrate the flexibility of the platform” and the diversity of applications enabled by high-throughput sequencing, as each paper targets a different biological question.
“There is quite a bit of hype in the market regarding instrument specifications for next-generation sequencing and how much data you can get from an instrument run—total data versus usable data,” observes Harkins.
During parallel sequencing some reads will go awry and generate spurious data, which the Roche system filters out using a variety of quality control metrics, according to Harkins. Users should compare instrument systems based on usable data figures and understand “coverage models,” Harkins emphasizes. “Coverage model refers to how efficiently a sequencing read can be used either to detect a genetic variant or assembly of a genome—is one read needed or 100 reads?”
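The distinction Harkins draws between total and usable data, and the idea of fold coverage, can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: the `Read` record, the `min_quality` threshold, and the sample values are hypothetical and do not describe the quality control metrics the Roche system actually applies.

```python
from dataclasses import dataclass


@dataclass
class Read:
    length_bp: int       # number of bases in the read
    mean_quality: float  # hypothetical per-read quality score


def usable_bases(reads, min_quality=20.0):
    """Sum the bases in reads that pass a (hypothetical) quality filter."""
    return sum(r.length_bp for r in reads if r.mean_quality >= min_quality)


def fold_coverage(bases, target_size_bp):
    """Average number of times each position of the target is covered."""
    return bases / target_size_bp


# Made-up reads: two pass the filter, two are discarded as spurious.
reads = [Read(250, 30.0), Read(250, 12.0), Read(200, 25.0), Read(100, 8.0)]

total = sum(r.length_bp for r in reads)
usable = usable_bases(reads)

print(f"total bases:  {total}")
print(f"usable bases: {usable}")
print(f"coverage of a 1,000 bp target: {fold_coverage(usable, 1000):.2f}x")
```

Comparing instruments on `usable` rather than `total` is the point of the example: the same run looks quite different once filtered reads are excluded.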
The first-generation 454 system was able to sequence 20 Mb/run. Improved versions of the sequencer now output 100 Mb/run in less than eight hours, Harkins reports, and by mid-2008 the company expects to introduce upgrades that would push that figure to 1 Gb/run in less than 24 hours using the same instrument system and improved reagents.
Read lengths, initially about 100 bp, now average about 250 bp, and Harkins expects published read lengths to surpass 400 bp by mid-2008.
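Dividing per-run output by read length gives a rough sense of how many reads each generation of the instrument produces. The sketch below reads the article's per-run figures as megabases and gigabases and pairs each throughput figure with the read length quoted for the same period; that pairing is an assumption, since the two sets of numbers are reported separately.

```python
MB = 10**6  # one megabase
GB = 10**9  # one gigabase


def reads_per_run(throughput_bp, read_length_bp):
    """Approximate number of reads needed to produce the stated output."""
    return throughput_bp // read_length_bp


# (label, bases per run, assumed average read length in bp)
scenarios = [
    ("first generation", 20 * MB, 100),
    ("current", 100 * MB, 250),
    ("projected for mid-2008", 1 * GB, 400),
]

for label, throughput, read_length in scenarios:
    print(f"{label}: about {reads_per_run(throughput, read_length):,} reads/run")
```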
The first iterations of next-generation sequencing platforms now available from companies such as 454 Life Sciences, Illumina (www.illumina.com), and Applied Biosystems (ABI; www.appliedbiosystems.com) represent the tip of the iceberg in terms of potential for future improvements in sequencing throughput and cost.
In addition to stepwise improvements that will accompany the evolution of individual technologies, there will be “disruptive spurts of improvement in either cost or throughput, or both,” in the view of Susan H. Hardin, Ph.D., president and CEO of VisiGen Biotechnologies (www.visigenbio.com).
Single Molecule Sequencing
Single-molecule sequencing approaches will represent one such burst, asserts Dr. Hardin, describing VisiGen’s platform as a combination of real-time fluorescence signaling and single-molecule detection in massively parallel arrays.
“We have engineered the DNA polymerase and nucleotides to act as direct molecular sensors of base identity during nucleotide incorporation,” says Dr. Hardin. VisiGen’s strategy relies on fluorescence resonance energy transfer (FRET): a donor fluorophore is attached to the polymerase and an acceptor fluorophore to the gamma phosphate of each nucleotide, allowing real-time detection of each nucleotide addition.
During the extension reaction, as each nucleotide is added to the growing DNA strand, energy transfers from the polymerase to the nucleotide resulting in a base-specific fluorescent signal. The acceptor signal can be detected when the fluorophore is in close proximity to the donor fluor in the immobilized polymerase-DNA-nucleotide complex. When the beta and gamma phosphates are cleaved off, the acceptor fluor is released from the active site of the polymerase and its signal disappears, allowing the donor signal to reappear, “acting as a punctuation mark” between synthesis steps, explains Dr. Hardin.
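The role of the donor signal as a “punctuation mark” can be shown with a toy decoder. This is only a sketch of the idea, not VisiGen’s software: it assumes an idealized, noise-free trace in which each time frame has already been classified either as donor-dominated (written `-`) or as one of four base-specific acceptor colors. The frame encoding and the function name are hypothetical.

```python
from itertools import groupby

DONOR = "-"  # frame in which only the donor signal is visible


def call_bases(frames):
    """Collapse runs of identical frames; each acceptor run is one incorporation.

    Donor frames separate incorporation events, so two consecutive
    incorporations of the same base remain distinguishable.
    """
    return "".join(signal for signal, _ in groupby(frames) if signal != DONOR)


# Two separate 'A' events are split by a donor frame and called as "AA".
trace = list("--AAA--AA-CCC---GG-T--")
print(call_bases(trace))  # AACGT
```

Without the donor frames between events, the two adjacent A incorporations would merge into a single run, which is why the reappearing donor signal matters for calling repeated bases.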
“The $1,000 genome is reasonable using a parallel single-molecule approach,” Dr. Hardin says. Equally important, she adds, will be the ability to sequence a human genome in a single day using VisiGen’s nanosequencing machines. She envisions read lengths in the 1 kb range and possibly higher. The company intends to introduce its technology by launching a sequencing service before making the instrument available for sale.
The initial version of the HeliScope™ instrument system, soon to be launched by Helicos BioSciences (www.helicosbio.com), will reportedly generate approximately 25–90 Mb/hour, or about 10⁹ bases/day. The instrument will have an imaging capacity of four billion bases/hour, and future improvements in reagents will enable throughputs approaching 10⁹ bases/hour.
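A quick conversion shows that the quoted hourly range brackets roughly one billion bases per day. The sketch assumes the instrument runs continuously for 24 hours, which the article does not state, and reads the hourly figures as megabases.

```python
HOURS_PER_DAY = 24


def bases_per_day(bases_per_hour):
    """Daily output, assuming uninterrupted operation (an assumption)."""
    return bases_per_hour * HOURS_PER_DAY


low_rate = 25_000_000      # 25 megabases per hour
high_rate = 90_000_000     # 90 megabases per hour
future_rate = 10**9        # throughput projected with improved reagents

print(f"initial range: {bases_per_day(low_rate):.2e} to "
      f"{bases_per_day(high_rate):.2e} bases/day")
print(f"projected:     {bases_per_day(future_rate):.2e} bases/day")
```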
Future kits, with enhanced reagents and chemistries and an improved flow cell, will be available as upgrades for the instrument system and will “get you to the $1,000 genome,” predicts Stephen Lombardi, president and COO at Helicos.
A key advantage of Helicos’ True Single Molecule Sequencing technology is the ability to sequence DNA without the need for amplification, thus simplifying the workflow and eliminating the bias and risk of error that PCR-based amplification can introduce, Lombardi explains.
Transforming Medical Genomics
As next-generation sequencing technologies lead the evolution from capillary-based Sanger sequencing methods to novel chemical and electrophysical strategies, the number of samples processed in a single run has increased so dramatically that it is transforming genomic research and medical resequencing efforts, according to Lombardi.
“Single molecule technology offers the promise of interrogating 10⁹ samples, three billion individual strands, at the same time,” he says.
With this level of throughput, researchers performing disease-association studies and pursuing personalized medicine applications will be able to use DNA sequencing as a multifunctional genetic analysis tool. Whole genome sequencing could eventually replace SNP analysis, detect emerging measures of genomic variation such as copy number variation (CNV), support digital gene expression analysis, and distinguish between alleles of a gene and between different transcripts derived from a single gene sequence.
With these platforms, “we are able to do molecular biology at the single-molecule level,” says Lombardi.
Perennial innovators in the high-throughput sequencing arena include Applied Biosystems and Illumina.
Illumina’s Solexa Sequencing employs the company’s clonal single-molecular-array technology and reversible terminator-based sequencing chemistry to achieve parallel sequencing of millions of fragments of genomic DNA. The fragments are attached to a planar, optically transparent surface and undergo solid-phase amplification, yielding approximately 40 million clusters, each containing approximately 1,000 copies of the template. A fluorescence-based sequencing-by-synthesis strategy using reversible terminators generates the sequence of each template, and the resulting reads are then aligned against a reference genome.
A recent application of the Solexa technology focused on whole genome scanning to identify DNA-binding sites. ChIP-Seq combines chromatin immunoprecipitation and single molecule sequencing to sequence DNA regions that bind to a protein of interest.
ABI recently launched a software initiative aimed at supporting the development and commercialization of bioinformatics applications for next-generation DNA-sequencing platforms. As part of this initiative, the company announced plans to expand its Software Development Community to include sample data sets, data file formats, and data conversion tools for its SOLiD™ high-throughput sequencing platform.
SOLiD, an acronym for sequencing by oligonucleotide ligation and detection, relies on a stepwise ligation method that can generate more than 1 Gb of DNA sequence per run and allows for sample multiplexing, ABI reports. The technology combines microbead sample preparation and fluorescently tagged nucleotide detection. By interrogating two adjacent bases simultaneously, the method has intrinsic redundancy and can be used to distinguish between SNPs and random sequencing errors.
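Why interrogating two adjacent bases helps separate SNPs from random errors can be seen in a toy model. The sketch below assumes the commonly described four-color, two-base code, in which each color is the XOR of 2-bit codes for neighboring bases; the article does not give the mapping, so it and the function names are assumptions. In such a code a true single-base change alters two adjacent colors in a constrained way, while an isolated miscall alters only one.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit base codes


def to_colors(seq):
    """Encode each pair of adjacent bases as one of four colors (0-3)."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def classify_mismatches(ref_colors, read_colors):
    """Label color mismatches as candidate SNPs or likely measurement errors."""
    mismatches = [i for i, (r, q) in enumerate(zip(ref_colors, read_colors))
                  if r != q]
    calls = []
    i = 0
    while i < len(mismatches):
        pos = mismatches[i]
        adjacent = i + 1 < len(mismatches) and mismatches[i + 1] == pos + 1
        # A single-base change leaves the XOR of the two flanking colors intact.
        if adjacent and (ref_colors[pos] ^ ref_colors[pos + 1]
                         == read_colors[pos] ^ read_colors[pos + 1]):
            calls.append((pos, "candidate SNP"))
            i += 2
        else:
            calls.append((pos, "likely error"))
            i += 1
    return calls


reference = to_colors("ACGTACGT")
snp_read = to_colors("ACGAACGT")         # T changed to A at one position
noisy_read = reference[:5] + [0] + reference[6:]  # one miscalled color

print(classify_mismatches(reference, snp_read))    # [(2, 'candidate SNP')]
print(classify_mismatches(reference, noisy_read))  # [(5, 'likely error')]
```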
When Sadig M. Faris, Ph.D., founder, chairman, and CEO of Reveo (www.reveo.com), approached the problem of how to increase the speed and reduce the cost of DNA sequencing while ensuring 100% accuracy, he did so from the perspective of a physicist, engineer, and materials scientist, and drew on acquired expertise in photonics, quantum mechanics, and nanotechnology.
His goal was to design a system capable of sequencing a human genome in less than one minute, which translates to no more than a nanosecond to detect each base. With this limiting factor in mind, he turned to a technology called quantum-mechanical tunneling, for which nanosecond measurements are relatively slow.
The sequencing process involves stretching each chromosome in the genome across a substrate, onto which the DNA is then immobilized. Using an array of nanofingers positioned along the DNA strand with piezoelectric motion control, each chromosome is scanned and the bases identified based on the frequency with which they vibrate when excited by electrons accelerated through a tunnel-like device on the nanofingers.
Dr. Faris describes each nanofinger as resembling a knife-edge, with a linear contact surface rather than a point tip to minimize errors. The nanofingers, designed to detect each of the four bases/colors and arrayed in sets of 64, are interconnected, with each of the 64 channels scanning an individual DNA strand across its entire length.
“If 50 or more of the 64 nanofingers say it [a base] is an ‘A,’ then it is reported as an A,” explains Dr. Faris. If there is no consensus, the instrument will automatically go back and reread a particular nucleotide, making this method virtually error-free. For applications in personalized medicine, “if you have one single error and it correlates to a disease, that cannot be tolerated,” Dr. Faris says.
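The voting rule Dr. Faris describes can be written down in a few lines. The threshold of 50 out of 64 channels comes from his description; everything else in this sketch is hypothetical, including the function names, the `max_attempts` limit on rereads, and the simulated readings, and it is not Reveo’s implementation.

```python
from collections import Counter


def consensus_call(votes, threshold=50):
    """Return the base reported by at least `threshold` channels, else None."""
    if not votes:
        return None
    base, count = Counter(votes).most_common(1)[0]
    return base if count >= threshold else None


def call_with_reread(read_fn, threshold=50, max_attempts=3):
    """Reread a position until consensus is reached or attempts run out.

    `read_fn` stands in for one scan of the position by all channels;
    `max_attempts` is an assumed limit not given in the article.
    """
    for _ in range(max_attempts):
        call = consensus_call(read_fn(), threshold)
        if call is not None:
            return call
    return None


print(consensus_call(["A"] * 52 + ["G"] * 12))  # A
print(consensus_call(["A"] * 40 + ["G"] * 24))  # None

# Simulated position: the first scan is ambiguous, the reread is not.
scans = iter([["A"] * 40 + ["G"] * 24, ["A"] * 60 + ["G"] * 4])
print(call_with_reread(lambda: next(scans)))    # A
```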
The company’s electro-optic sequencer is in early development. It will be marketed by an offshoot of Reveo to be called Revase. The commercial potential for Reveo’s sequencing system extends beyond basic genome analysis. The same device could also be used to determine the amino acid sequence of a protein or to analyze the epigenome by detecting the presence of methyl groups. It will be able to simultaneously characterize the genome and the epigenome, to superimpose one on the other to aid in understanding gene regulation, and to interrogate the effects of environmental and pharmaceutical factors on gene expression and function.
Toward the $1,000 Genome
NABsys (www.nabsys.com) is developing its hybridization-assisted nanopore sequencing system, based on technology acquired from Brown University, to enable personalized medicine. Barrett Bready, M.D., CEO of NABsys, sees the future of high-throughput DNA sequencing and the push toward the $1,000 genome in a clinical context.
A whole genome sequencing system needs to produce “clinically relevant data at a clinically affordable price,” Dr. Bready asserts. With existing, post-Sanger sequencing technology there is a trade-off between cost and read length from a performance perspective, and it is “still orders-of-magnitude too costly for clinical use,” Dr. Bready says.
NABsys’ approach combines the advantages of nanopore sequencing and sequencing by hybridization. Nanopore sequencing involves creating small pores in a silicon chip. DNA fragments with probes bound translocate through the pores and are detected by current fluctuations. Detection is therefore electronic rather than optical.
By combining these two technologies, NABsys is able to overcome the scalability issues associated with traditional sequencing-by-hybridization methods, explains Dr. Bready. Furthermore, “there is no inherent limit on read length, and you don’t need single-base resolution from the nanopore readout.”
Dr. Bready anticipates that the company will complete development of an early access instrument in about three years.
Intelligent Bio-Systems (www.intelligentbiosystems.com), a two-year-old company based on sequencing-by-synthesis technology licensed from Columbia University, exploits PCR-based amplification of genomic fragments that are attached to a DNA sequence primer, arrayed on a high-density chip, and exposed to engineered DNA bases containing a removable fluorescent dye and an end-cap. Repetitive cycles of strand extension, array scanning to detect the fluorescent signal emitted by the label on the terminal base, and cleavage of the label and end-cap culminate when the fragments have been sequenced.
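The repeated cycle of extension, scanning, and cleavage can be sketched as a simple loop. This is an idealized model built only from the description above: it assumes error-free chemistry, ignores strand orientation, and treats a whole spot as a single template. The function name and the `capped` flag are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def run_cycles(template, n_cycles):
    """Simulate idealized sequencing-by-synthesis cycles on one template."""
    strand = []     # growing complementary strand
    capped = False  # True while the end-cap blocks further extension
    calls = []
    for _ in range(n_cycles):
        if len(strand) == len(template):
            break  # fragment fully sequenced
        # 1. Extension: exactly one labeled, end-capped base is added.
        if capped:
            raise RuntimeError("cannot extend a capped strand")
        base = COMPLEMENT[template[len(strand)]]
        strand.append(base)
        capped = True
        # 2. Scan: the dye on the terminal base identifies it.
        calls.append(base)
        # 3. Cleavage: remove the dye and end-cap so the next cycle can extend.
        capped = False
    return "".join(calls)


print(run_cycles("TACGGA", 10))  # ATGCCT
print(run_cycles("TACGGA", 3))   # ATG
```

The end-cap is what limits each cycle to a single base, so read length in this model is simply the number of completed cycles.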
“Each spot on the chip has many identical fragments on it, resulting in increased accuracy and less sensitivity to errors than other high-throughput sequencing methods,” says Steven Gordon, Ph.D., CEO of Intelligent Bio-Systems.
A comparison of the error rate from single-molecule strategies versus a PCR-amplification-based process demonstrated a two to three order-of-magnitude decrease in errors using the amplification method, even with the errors from amplification, according to Dr. Gordon.
Dr. Gordon estimates that the company’s Pinpoint™ sequencer, in development, will be about half the price of comparable systems (in the range of $250,000 to $300,000) and will offer higher throughput, with the high-density configuration of the chip and the speed of the chemistry and imaging steps contributing to a predicted capacity of several Gb/day. The company is working to increase read lengths to facilitate de novo sequencing. It has shipped its first early access system to a genome center partner and is looking to place the system in additional research centers, with plans to begin selling the instrument in 2008.