Over the past decade, DNA sequencing throughput has increased more than 50-fold. Advances in DNA sequencing have enabled exponential growth in the number of data points and in the breadth of coverage of an individual genome. High-throughput sequencing holds great promise for population-wide analysis that may influence the treatment of human diseases, the development of prognostic genetic biomarkers, the elucidation of cancer-causing somatic mutations, and the understanding of viral drug resistance.
However, considerable collection, processing, and analysis costs remain and still impede studies involving multiple samples. If the cost of a single genome analysis were lowered without jeopardizing performance standards, many previously impractical studies could become an everyday reality. The key challenges in achieving this goal are miniaturizing the reactions and increasing their number and density. The same volume previously occupied by a single reaction now contains millions of reactions, proportionally decreasing per-reaction cost.
New technologies aim to achieve price-performance points that bring the cost of processing a human genome on par with that of a bacterial genome, ultimately aiming for the coveted goal of $1,000 per genome.
Most of the current applications are based on comparison with already sequenced genomes. Resequencing of an entire genome can be successfully accomplished with short reads of about 30 bp, as long as the coverage is between 25x and 30x. This paradigm shift gave a boost to short-read technologies. The companies highlighted here are capitalizing on the new trends by providing high-throughput sequencing products that are able to support a multitude of applications at a fraction of the cost.
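The relationship between read length, coverage, and read count can be made concrete with a short calculation. The sketch below assumes a round human genome size of 3 billion bp (a figure not given in the article) and treats coverage simply as total sequenced bases divided by genome size; the function name is illustrative only.

```python
import math

HUMAN_GENOME_BP = 3_000_000_000  # assumed round figure, not from the article


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: int) -> int:
    """Smallest read count such that reads * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    for cov in (25, 30):
        n = reads_needed(HUMAN_GENOME_BP, 30, cov)
        print(f"{cov}x coverage with 30-bp reads: {n:,} reads")
```

Under these assumptions the answer is in the billions of reads, which is why short-read resequencing depends on running millions of reactions in parallel. The estimate is raw fold coverage only and ignores reads that fail to map.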
Individual Molecule Sequencing
Helicos Biosciences (www.helicosbio.com) miniaturized the sequencing reaction to the level of individual DNA molecules. In its system, each DNA molecule from a sample occupies a distinct position in a flow cell, enabling simultaneous sequencing of billions of DNA strands.
“Because we employ a True Single Molecule Sequencing (tSMS™) approach, we have been able to develop an exceptionally simple sample preparation process,” says Steve Lombardi, senior vp of marketing at Helicos Biosciences. “We do not need to use amplification. Thus we do not bias our samples, nor do we have to deal with amplification errors.”
After genomic DNA is sheared, poly-A tails with a terminal Cy3-labeled nucleotide are added to both ends of the DNA fragments. The Cy3 label is used to localize the templates once they are bound to immobilized oligo-dT primers on the surface of the flow cell. Next, the polymerase and Cy5-labeled nucleotides are added, and incorporated bases are detected. The label is cleaved, and the process is repeated for multiple cycles to generate strand lengths required for specific applications.
“Our first-generation instrument is expected to deliver 10^9 bases per day, and in the future, we aim to deliver close to 10^9 bases per hour,” adds Lombardi. “The combination of throughput and a scalable workflow from our simple sample prep will enable experiments that involve extensive coverage of the genome and require hundreds or thousands of samples. These experiments are simply not possible today.”
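To put the quoted throughput figures in perspective, the following back-of-the-envelope sketch estimates instrument time for one deeply covered human genome. The genome size (3 billion bp) and the choice of 30x coverage are assumptions made for illustration, not figures supplied by Helicos.

```python
GENOME_BP = 3_000_000_000  # assumed round size of a human genome
COVERAGE = 30              # assumed target, the upper end of a 25-30x range


def run_time(genome_bp: int, coverage: int, bases_per_unit_time: float) -> float:
    """Time units needed to sequence genome_bp * coverage bases at a given rate."""
    return genome_bp * coverage / bases_per_unit_time


if __name__ == "__main__":
    days = run_time(GENOME_BP, COVERAGE, 1e9)   # 10^9 bases per day
    hours = run_time(GENOME_BP, COVERAGE, 1e9)  # 10^9 bases per hour
    print(f"At 10^9 bases/day:  about {days:.0f} days")
    print(f"At 10^9 bases/hour: about {hours:.0f} hours ({hours / 24:.2f} days)")
```

Under these assumptions, the move from a daily to an hourly rate of 10^9 bases shortens a single-genome run from roughly three months to under four days.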
The company emphasizes that its technology can produce such diverse information as expression analysis, digital karyotyping, or genome-wide methylation status. Even though resequencing is its main focus, the technology is also capable of de novo sequencing.
An antisense copy of the single-molecule template is produced enzymatically using the immobilized oligo-dT base as a primer. The antisense strand is sequenced from the 5´ end, after which a predetermined number of cold or unlabeled nucleotides are added to form a continuous, complementary strand. Next, another round of sequencing is initiated using the single molecule process. This combination can be reiterated with various amounts of cold nucleotides on long templates, generating sequence information from the same template for contig assembly. Helicos has established an early access partnership with the Institute for Systems Biology and is actively seeking expert opinions on the applicability of the system to different types of experiments.
Complete DNA Structures
“Whole genome sequencing is on the way to becoming a common research tool,” says David Bentley, Ph.D., chief scientist at Solexa (www.solexa.com). “There are many other applications beyond SNP identification. We are focusing on changes in DNA sequences that occur in drug-resistant pathogens and on subtle somatic DNA mutations leading to cancer. In the majority of cases, cancer arises due to aberrations in multiple genes. With a virtually unlimited DNA sequencing capacity, we can reveal the changes in the entire genome.”
The company aims to lower DNA sequencing costs by orders of magnitude, and it projects that as early as next year a complete human genome could be sequenced for $100,000. Solexa’s technology is based on clonal expansion of DNA fragments captured on a solid support. Each DNA strand carries forward and reverse adaptors and forms a loop when both ends bind to the immobilized, complementary primers. The loops become the substrate for the DNA polymerase, which amplifies them to form a 1,000-copy cluster. A 100-micrometer square of the flow cell may contain as many as 1,000 clusters.
All four proprietary fluorescent terminator nucleotides are added at the same time. High efficiency of incorporation is achieved by using a DNA polymerase specifically evolved for this purpose. The fluorophores and 3´ termination are cleaved, and the reaction is repeated. The technology has been validated in projects as large as successful resequencing of the human X chromosome. According to the company, its sequencing results average Q18 in quality over 35 bases on both BAC and human genomic DNA.
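For readers unfamiliar with quality scores, the sketch below converts Q18 into an error rate. It assumes the reported value is a Phred-scaled score, Q = -10 log10(P), where P is the per-base error probability; the article does not say which scale the company uses, so treat the result as an approximation.

```python
def phred_to_error_prob(q: float) -> float:
    """Per-base error probability for a Phred-scaled quality score (assumed scale)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    p = phred_to_error_prob(18)
    print(f"Q18 per-base error probability: {p:.4f}")
    print(f"Expected errors in a 35-base read: {35 * p:.2f}")
```

On this assumption, Q18 corresponds to an error rate of about 1.6% per base, or roughly one miscalled base in every two 35-base reads, which is one reason deep coverage matters for short-read resequencing.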
“Bacteria often incorporate new DNA sequences that are not found in reference DNA,” continues Dr. Bentley. “To decode these new loci, we developed the method for sequencing paired ends of the same fragment. The fragment is circularized to include an adaptor in between the two ends. A universal sequence and the intervening adaptor are used to initiate two separate sequencing reads. The two sequence datasets are mapped to the original template in order to extract pairing information.” In mid-2006 the company shipped first-generation instruments to its early access partners, which include the Broad Institute in Cambridge and the Genome Sequencing Center (GSC) at the Washington University School of Medicine in St. Louis.
In July, Applied Biosystems (ABI; www.appliedbiosystems.com) entered the realm of high-throughput sequencing with the acquisition of Agencourt Personal Genomics (APG). “We have carefully evaluated over 40 types of high-throughput sequencing technologies, from those in embryonic stages to those that are well developed,” says Andy Watson, Ph.D., senior director of market development in the genetic analysis department of the molecular and cell biology division.
“APG’s technology is complementary to current ABI platforms and applicable to many experiments. We chose cancer gene resequencing, high-throughput gene expression, and resequencing of bacterial genomes as our first target applications.”
APG’s sequencing chemistry, Sequencing by Oligonucleotide Ligation and Detection (SOLiD™), has a unique, built-in ability to interrogate two bases at the same time, Dr. Watson reports, resulting in extremely accurate readings. The SOLiD method utilizes random 8-mer primers with a fifth position containing A, G, T, or C. Each primer is labeled with one of four fluorescent dyes, and the color of the dye indicates the base in the fifth position. A random primer is ligated to the template only when the labeled nucleotide complements the fifth nucleotide on the template, counting from the end of the previously ligated primer.
After the color is visualized, the fluorescent tag is removed by cleaving the primer between the fifth and sixth positions. The process is repeated and every fifth position is recorded. Next, the system is reset to generate the recording for every n-1, n-2, n-3, and n-4 positions.
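The way five staggered rounds of ligation add up to a contiguous read can be sketched in a few lines. This is an illustrative model of the position bookkeeping only, not APG's software: it assumes each ligation advances the read by five bases and that each reset shifts the starting point back by one base. The function names are hypothetical.

```python
def interrogated_positions(read_length: int, offset: int = 0, step: int = 5) -> list:
    """1-based template positions recorded in one round.

    offset=0 is the first round (positions 5, 10, 15, ...); offset=k models
    the reset to n-k, which shifts every recorded position back by k bases.
    """
    return list(range(step - offset, read_length + 1, step))


def all_rounds(read_length: int, step: int = 5) -> dict:
    """Positions recorded in each of the `step` rounds, keyed by offset."""
    return {k: interrogated_positions(read_length, k, step) for k in range(step)}


if __name__ == "__main__":
    rounds = all_rounds(25)
    for k, positions in rounds.items():
        print(f"round n-{k}: {positions}")
    covered = sorted(p for positions in rounds.values() for p in positions)
    print("every position read exactly once:", covered == list(range(1, 26)))
```

In this model, a 25-base read needs five rounds of five ligations each, and the five interleaved sets of positions together cover every base once.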
“Sequencing by synthesis starts generating noise after 25 nucleotides. Our technology demonstrated feasibility up to 50 bases,” adds Dr. Watson.
“We have also developed two-step encoding with known nucleotides in every fourth and fifth position of the primer. This is a unique method offering outstanding base-calling accuracy for SNP detection and is being used successfully in our cancer-resequencing work with Victor Velculescu, M.D., Ph.D., at Johns Hopkins University.” The company maintains that APG would not be in competition with ABI’s line of microcapillary sequencers based on the Sanger method. “Our primary competition is really microarrays,” states Dr. Watson.
Visigen Biotechnologies (www.visigenbio.com) utilizes a fluorescently tagged polymerase and color-coded fluorescent nucleotides to detect the addition of a particular base to a growing DNA chain, thus determining DNA sequence. The polymerase is modified with a donor fluorophore and immobilized on a glass slide. When an acceptor-labeled nucleotide is incorporated into the growing polymer, energy transfers from the polymerase to the nucleotide, stimulating emission of a base-specific signal. Because nucleotides are modified with the acceptor fluorophore on the γ-phosphate, the fluorescent moiety is released in a pyrophosphate complex.
“Our technology results in native nucleotides incorporated into the growing DNA strand and immediate detection of sequence information,” says Susan Hardin, Ph.D., CEO of Visigen. “Moreover, we are observing and detecting the DNA synthesis in real time. We expect to achieve the sequencing rate of 1 megabase per second.”
To detect the emission of an individual nucleotide on each individual strand, the company developed an extremely sensitive detection system. Visigen demonstrated proof-of-principle by detecting interactions between the polymerase and 30 sequential acceptor-labeled nucleotides. The company plans to enter the market with custom sequencing services. “We believe, however, that our system will become a solution for any DNA-sequencing project,” adds Dr. Hardin. Although Visigen technology is still in development, it has attracted attention from ABI, which completed an equity investment and entered into a scientific collaboration agreement with the company last year.
“Technologies utilizing fluorescent nucleotides carry an inherent problem of detection,” comments Jonathan Rothberg, Ph.D., founder of 454 Life Sciences (www.454.com). “Not only do you have to spend time on detecting your fluorophore, you also have to cleave it after each cycle. It all takes time. We are able to sequence within hours samples that other companies take days to complete.”
454 Life Sciences utilizes miniaturized pyrosequencing reactions to detect the addition of a nucleotide by the conversion of luciferin into oxyluciferin. “Most importantly, we fine-tuned our sample-preparation method, the emPCR™, to where we can detect attomoles of DNA,” continues Dr. Rothberg. “This is especially critical for de novo sequencing of complex genomes, such as the 38,000-year-old Neandertal genome, where 95% of DNA is closer to ancient bacteria than to modern human.”
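Pyrosequencing reads a sequence as a series of light signals rather than as individual labeled bases: one nucleotide type is flowed across the reaction at a time, and the light produced scales with the number of bases incorporated in that flow. The sketch below is a minimal, noise-free model of that idea. The T, A, C, G flow order and the function names are assumptions made for illustration and are not taken from the article.

```python
def simulate_flowgram(read: str, flow_order: str = "TACG") -> list:
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    `read` is the strand being synthesized; the signal for each flow is the
    number of consecutive bases incorporated during that flow.
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not covered by flow order: {sorted(unknown)}")
    flows, pos, i = [], 0, 0
    while pos < len(read):
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))  # a signal of 0 means nothing was incorporated
        i += 1
    return flows


def call_bases(flows: list) -> str:
    """Rebuild the sequence by repeating each nucleotide `signal` times."""
    return "".join(base * signal for base, signal in flows)


if __name__ == "__main__":
    flows = simulate_flowgram("TTAGC")
    print(flows)
    print(call_bases(flows))
```

Because no label has to be cleaved between flows, each cycle is short. The trade-off in a real instrument is that runs of identical bases must be inferred from signal intensity, which this idealized model treats as exact.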
According to Dr. Rothberg, 25-bp reads are not sufficient for complete resequencing or de novo assembly of a complex genome. “We have presented data on 500-bp reads and routinely achieve 100-bp reads. Also with longer reads, we can sequence an entire exon in one go. With 25-bp reads, the sequencing of PCR fragments becomes too complex and too expensive. PCR fragments are used in a wide variety of applications: cancer genome studies, genetic studies using pools of samples, and HIV research.”
The company reported success in detecting low-frequency mutations in heterogeneous samples such as tumors or HIV. By detecting individual mutated viruses among the wild-type in the same blood sample, HIV drug-resistance could be predicted before its onset.
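Whether a rare variant is seen at all depends on how many reads cover the site. The following sketch uses a simple binomial model to estimate the chance of observing a variant in at least a given number of reads. It assumes independent reads and ignores sequencing errors, so it is an optimistic bound and not a description of the company's analysis. The function name and example numbers are hypothetical.

```python
from math import comb


def prob_detect(variant_freq: float, depth: int, min_reads: int = 1) -> float:
    """P(at least `min_reads` of `depth` reads carry the variant), binomial model."""
    p_fewer = sum(
        comb(depth, i) * variant_freq**i * (1 - variant_freq) ** (depth - i)
        for i in range(min_reads)
    )
    return 1 - p_fewer


if __name__ == "__main__":
    for depth in (100, 500, 1000):
        p = prob_detect(0.01, depth, min_reads=3)
        print(f"1% variant, depth {depth}: P(seen in >= 3 reads) = {p:.3f}")
```

The model shows why deep sampling of each amplified fragment matters: at low depth a 1% variant is often missed, while at high depth it is seen in several reads with near certainty.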
A collaborative study between the Dana-Farber Cancer Institute and 454 Life Sciences developed a method for the detection of cancer mutations present at low levels in heterogeneous samples from patients with lung cancer. Tumor biopsies present a complex mix of cells that may carry different somatic mutations, some of which could be responsible for sensitizing tumor cells to chemotherapy, while other mutations render tumor cells completely resistant to drug treatments. Identification of these mutations may potentially enable the personalization of targeted therapies.
deltaDOT (www.deltadot.com) is perfecting a technology that also does not utilize fluorescent nucleotides to detect biological molecules. The company is working on applying its core technology, LFII™, to sequencing and short tandem repeat (STR) analysis.
DNA is processed using a standard cycle-sequencing kit excluding fluorescent nucleotides. As the terminated strands of A, G, T, or C reactions separate in the microcapillary, DNA is detected by a photodiode array. Each of the 512 pixels of the photodiode collects information independently. The signal information is then deconvoluted, recovering signals at signal-to-noise ratios as low as 1/10,000.
“Our system generates unprecedented separation of signal from background,” comments Stuart Hassard, Ph.D., co-founder and head of biology for deltaDOT. “The data quality is approaching that of mass spec. Our technology combines dramatic increases in resolution with excellent reproducibility, with a standard deviation of less than 1%.
“We are applying principles of particle physics to molecular biology,” continues Dr. Hassard. “Just like particle colliders use multiple detectors to track individual events, we can use multiple detection to monitor overlapping reactions. Several overlapping, unlabeled sequence reactions can be detected in the same capillary while maintaining the distinction between each reaction. Therefore, we can serially inject A, T, G, and C reactions into the same capillary and collect data from all four simultaneously.”
The company’s beta testing sequencing system, Merlin, is a benchtop instrument employing LFII and serial injection technology. SNP analysis, QA/QC of new constructs, and STR analysis can be run on the same platform. “We follow a personal PC model. If current core facilities are mainframes, then our system is a laptop. Eventually, Merlins will be as common as PCR machines. With our technology, you can get a 200-base sequence in about one hour using a well-known cycle-sequencing procedure. This means that you are not dependent on the schedule and throughput capabilities of a centralized location,” adds Dr. Hassard.
The instrument does not require fluorescent consumables and is fully automated. The company plans to increase the throughput of the system by incorporating microfluidics and multiwell sample loading at the front end. Serial, multiplexed injection will allow at least a 10-fold increase in throughput. The company is also looking to develop a portable system for DNA genotyping, with specific emphasis on STRs.