With the advent and development of high-throughput technologies, DNA sequencing has taken center stage as a powerful tool for decoding genetic information. However, the goal of sequencing many genomes in parallel can only be achieved through a considerable reduction in sequencing costs. A price tag of $1,000 per three gigabases—the size of the human genome—has become the theoretical benchmark.
The current challenge is to reach a price-performance point that would enable previously impossible genome-wide studies. Making use of genomic information will require extensive validation of correlations between mutations and phenotype. This can be done through large-scale association studies, bioinformatic predictions, or biological methods. Only then can genetic information be used to explain many facets of human life and society.
At the same time, rapid advances in sequencing methodologies are far ahead of current discussions of the ethical, social, and legal implications of genetic revelations. Anticipated benefits of whole genome sequencing are just starting to be weighed against real issues of prejudice, discrimination, and even potential efforts to create a “better human.”
Why Is It Important?
Although any two people are 99.9% identical at the genetic level, understanding the one-tenth of 1% difference is important as it helps explain differences in disease susceptibility and response to therapeutic drugs, toxic substances, and environmental factors.
Phase I of the International HapMap—a multiyear project aimed at mapping common human mutations by genotyping—identified one million SNPs. In Phase II, the HapMap will test another 4.5 million SNPs. Although a success in itself, the project served to re-emphasize the inherent limitations of genotyping in discovering critical variations in the human genome. Genotyping can find disease-causing alleles only when they occur in the diseased population at a frequency greater than 5%.
However, when the frequency of an allele falls below 5%, genotyping is not enough. “Genotyping rests on the hypothesis that common alleles contribute to common diseases,” says Jonathan Rothberg, founder and chairman of 454 Life Sciences (www.454.com). “What if very uncommon alleles contribute to common diseases? Only deep sequencing would be able to answer this question. The deeper the sequencing, the less frequent a variant you can find. You need deep coverage to ensure the statistical likelihood of finding rare mutations.”
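Rothberg’s point about depth can be illustrated with a toy binomial model. This is a sketch only, not any vendor’s actual variant-calling pipeline; the assumption that a variant must be seen in at least three reads to be called is a hypothetical threshold chosen for illustration.

```python
from math import comb

def detection_probability(depth, allele_freq, min_reads=3):
    """Probability that a variant present at `allele_freq` in the sample
    is observed in at least `min_reads` of `depth` independent reads
    covering the site (simple binomial model, no sequencing errors)."""
    p_miss = sum(
        comb(depth, k) * allele_freq**k * (1 - allele_freq)**(depth - k)
        for k in range(min_reads)
    )
    return 1 - p_miss

# A variant at 5% frequency in the sample: 20-fold coverage rarely
# yields three supporting reads, while 100-fold coverage usually does.
shallow = detection_probability(20, 0.05)
deep = detection_probability(100, 0.05)
```

Under this model, the rarer the variant in the sample, the more reads are needed before the chance of catching it enough times becomes acceptable—which is exactly why rare-allele discovery demands deep coverage.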
“Some rare variations will be important in understanding human diseases,” adds Jane Peterson, Ph.D., NHGRI associate director for the division of extramural research. “And whole genome analysis would be able to detect these rare changes in the whole genome.”
Some low frequency mutations are associated with Mendelian diseases (one gene, one disease) and affect about 1/10,000 to 1/100,000 of individuals (or less than 0.05%). These mutations are often heterogeneous and unique to specific families, thus requiring DNA sequencing for diagnosis.
Some of the less rare variations (0.05–5% frequency) are already known to contribute to complex multigene disorders such as diabetes, cholesterol dysregulation, and cancer. “To find such rare mutations, we do not even need to study a diseased population. We can find them by sequencing coding regions of a normal diverse human population, independent of disease or phenotype, if the techniques used can give the deep coverage,” adds Richard Gibbs, Ph.D., director of the Human Genome Sequencing Center at Baylor College of Medicine (www.bcm.edu).
Baylor’s Human Genome Sequencing Center is promoting an ambitious program of sequencing all exons of 20,000 human genes in 2,000 individuals. Such sampling would offer a good chance of finding mutations present at the 0.05% level in the general population. These data could then be used by others in studies of specific diseases.
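A quick back-of-the-envelope check of that sampling claim: 2,000 diploid individuals carry 4,000 chromosomes, so an allele at 0.05% population frequency has a high chance of appearing at least once in the sample (assuming independent sampling, a deliberate simplification):

```python
n_individuals = 2_000
chromosomes = 2 * n_individuals   # diploid genome: two copies per person
freq = 0.0005                     # 0.05% allele frequency

# Probability the allele shows up on at least one sampled chromosome.
p_seen = 1 - (1 - freq) ** chromosomes
```

The result is roughly 86%, so a 2,000-person cohort does indeed give "a good chance" of catching alleles at the 0.05% level.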
Dr. Gibbs plans to reduce costs and time by sequencing pools of several individuals using the sequencing technology from 454 Life Sciences and its marketing partner and co-developer of high-throughput sequencing products, Roche Applied Science (www.roche-applied-science.com). “With the Sanger method you lose low-frequency mutations if you pool DNA because your sequencing sample represents an average of all amplified strands. Because with the 454 technology we sequence an individual DNA strand in each well, and because of the increased coverage, this method has the ability to detect very low frequency variations in heterogeneous samples,” says Dr. Gibbs.
What Does It Take to Get There?
The ultimate goal continues to be a de novo sequence assembled with 99.9% accuracy, or no more than one error per 10,000 nucleotides, with essentially no gaps. The Sanger method remains the gold standard of sequencing, generating read lengths and data quality reportedly exceeding those of its competitors. Over the past three years the cost of Sanger sequencing has decreased to less than $0.60 per 1,000 bases, but this efficiency is enjoyed only by large genome centers.
Most of the new approaches use methods other than Sanger sequencing. Direct methods determine each base of DNA individually. Indirect methods assemble DNA sequences based on the experimental determination of oligonucleotide content.
“Any upcoming technology has to demonstrate that it is cheaper and better than Sanger or has unique advantages,” adds Chad Nusbaum, Ph.D., co-director of the genome sequencing and analysis program at the Broad Institute (www.genome.wi.mit.edu). “Datatypes coming from new-generation sequencers vary quite significantly. Read length as well as quality varies dramatically, and because reads are short, assembly is a challenge. 454, for example, requires 20-fold coverage or more of their 100–200 bp reads to be able to assemble contigs of useful size.
Shorter reads require even deeper coverage to assemble. By comparison, a typical draft assembly of 700-base reads (as generated by an Applied Biosystems instrument) requires only sevenfold coverage. On the other hand, one advantage of 454 is that with a single instrument you can set up a small-scale genome center with minimal investment in infrastructure and personnel.” According to the company, the estimated cost of 454 sequencing is about $0.50 per 1,000 bases.
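The coverage figures Dr. Nusbaum cites can be explored with the classic Lander-Waterman model. This is an idealized model—it ignores repeats and non-uniform sampling, so it understates the real difficulty of short-read assembly—but it captures why shorter reads fragment into more contigs at the same fold coverage:

```python
from math import exp

def lander_waterman(genome_size, read_len, coverage):
    """Lander-Waterman estimates under ideal assumptions (uniform random
    sampling, no repeats): returns the expected uncovered fraction of
    the genome and the expected number of contigs."""
    n_reads = coverage * genome_size / read_len
    uncovered = exp(-coverage)            # expected gap fraction
    n_contigs = n_reads * exp(-coverage)  # expected number of contigs
    return uncovered, n_contigs

# Same sevenfold coverage of a 1-Mb genome with 100-bp vs. 700-bp reads:
# short reads mean many more reads, hence many more contig breaks.
gap_short, contigs_short = lander_waterman(1_000_000, 100, 7)
gap_long, contigs_long = lander_waterman(1_000_000, 700, 7)
```

At equal coverage the uncovered fraction is identical, but the 100-bp reads leave roughly seven times as many contigs as the 700-bp reads—one reason short-read platforms compensate with much deeper coverage.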
“It is good to have different technologies, and eventually we may see some specialization according to the advantages of each technology,” emphasizes Dr. Peterson. “Sanger sequencing is able to read the homopolymeric regions that are still problematic with other methods. However, if DNA is not clonable then Sanger cannot be used. It is important to continue improving the Sanger process.”
The installed base of Sanger-based sequencers is estimated to be close to 10,000 instruments. According to the Association of Biomolecular Resource Facilities (www.abrf.org), the majority of the installed base resides at the core labs and university sequencing centers. At the same time, perhaps only 25% of all Sanger reactions are run at these facilities.
The cost of core sequencing has progressively decreased, and in 2005 was estimated to be approximately $7 per combined reaction and run (2006 General Survey of DNA Sequencing Facilities).
“This market is not going to disappear for a long time,” says Stevan Jovanovich, Ph.D., president/CEO of Microchip Biotechnologies (MBI; www.microchipbiotech.com). “Core facilities run a range of projects from simple clone verification to large-scale sequencing, and they are not going to spend millions of dollars to replace the existing infrastructure. However, they would look favorably at decreasing costs while continuing to use a well-established Sanger method.”
MBI is developing a miniaturized version of cycle sequencing and integrated clean-up on a reusable chip. Eventually, the cycle-sequencing reaction would take place in a total volume of 25 nL. Microfluidic circuits for DNA amplification and purification would be operated by microrobotic valves and pumps, also integrated into the chip. The resulting reaction could be run on the existing instrumentation.
“If core facilities were able to use 1/1,000 of the current materials, their sequencing cost could be less than a cent per base, bringing them to par with genome centers,” says Dr. Jovanovich. In the future MBI plans to etch the actual capillary array onto the chip.
Many recognize the value of inexpensive technologies for high-quality resequencing. Sequencing-by-hybridization (SBH) relies on high-throughput miniaturized hybridization to microarrays of specially designed overlapping oligos. SBH does not require determination of nucleotide positions experimentally, but instead derives the sequence information indirectly from oligonucleotide content.
Other advantages include high parallelism, long reads, and the ability to sequence heterogeneous mixtures. While it is costly to make an initial array, the sequencing process itself is inexpensive and could be efficiently used for repeated interrogation of the same DNA region.
“One of the areas where SBH would be very effective is the analysis of isolates of pathogenic bacteria,” offers Dr. Nusbaum. “With SBH you can efficiently sequence thousands of almost identical isolates. Genetic results combined with the data on virulence would give an insight into the molecular mechanism of pathogenicity.”
“However, SBH could run into problems if a target is substantially different from the reference,” adds Dr. Gibbs. “SBH also continues to struggle with signal-to-noise ratio. Adequate discrimination of all sequences under the same hybridization conditions is still a problem.”
Sequence information can also be extracted from single DNA molecules without amplifying the DNA or incorporating labels. The advantages of this method include high sensitivity, minimal use of reagents, and high parallelism. Nanopore sequencing is the most familiar model for this approach. “Theoretically, we can sequence at the speed of one million bases per second. This means that the whole human genome could be sequenced in less than one hour,” says Scott Collins, Ph.D., professor, University of Maine.
“In the electronic world such speeds are actually very low. A computer microprocessor operates at one hundred times higher speed.” Dr. Collins’ group is developing a silicon-based inorganic nanopore. As a DNA strand is threaded through the nanopore, four microscopic electrodes are used to detect the differences in tunneling current of each individual nucleotide.
“Our major obstacle lies in machining components in the nanometer range—1.8-nm nanopore and electrodes. At this scale the device components are only 6–7 atoms wide,” says Dr. Collins. Eventually all necessary electronic equipment could fit on a 1 mm x 1 mm disposable chip, which may cost only a few hundred dollars. Only a connection to a computer would be required to read the entire genome.
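The arithmetic behind Dr. Collins’ one-hour claim is straightforward:

```python
genome_bases = 3_000_000_000  # roughly three gigabases per human genome
rate = 1_000_000              # theoretical nanopore speed: 1 M bases/s

seconds = genome_bases / rate  # 3,000 seconds
minutes = seconds / 60         # 50 minutes: under an hour, as claimed
```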
While nanopore technology is still in the future, a single-molecule sequencing technology developed by Helicos Biosciences (www.helicosbio.com) is close to commercialization. According to the company, its first-generation system, available in the second half of 2007, will be able to provide a 1,000-fold (10³) price advantage over current Sanger sequencing methods and will produce 10⁹ bases per day using sequencing-by-synthesis.
With reads of only 25 bases, the assembly of contigs is still a challenge, at least initially limiting the scope of possible applications. But because each DNA strand is analyzed as a separate sequence, Helicos was able to develop a paired-end read strategy that involves sequencing the initial 25 base pairs of the template, followed by an extension with a predetermined number of dark (unlabeled) nucleotides. The extended fragment, still bound to the template, is then subjected to another round of sequencing-by-synthesis. This process is the focus of a recently awarded NHGRI grant.
What Do We Do Once We Get There?
It is realistic to expect that within the next ten years, rapid low-cost sequencing of the human genome will become a reality. “Nearly 70% of expensive medical decisions are made literally on the spot. Rapid sequencing could provide essential information about individual genomes that could immediately affect bedside care,” says Rothberg.
“Sequence-directed choice of care will be especially critical for heterogeneous diseases like cancer. For insurance purposes, sequencing should decrease overall patient-related costs, including days in the hospital or number of procedures. In this case, the cost of diagnostic sequencing could be substantial, perhaps several thousand dollars, and still provide financial benefit for the insurers.”
Undoubtedly, genetic information could provide great benefits to patients. At the same time, issues of fairness in the use of this information loom. “Consequences of the abuse of genetic information are important to consider,” says Dr. Peterson. The U.S. DOE and the NIH devote 3–5% of their annual Human Genome Project budgets toward studying the ethical, legal, and social issues surrounding availability of genetic information.
At present, no federal legislation prevents discrimination against individuals on the basis of their genetic makeup.
Data on correlation of mutations and psychological and behavioral traits may lead to stigmatization and discrimination against the carriers, even though environmental factors play as much a role in the development of an individual as genetics. Comprehensive fetal genetic testing could contribute to increased embryo-selection decisions.
Because so few SNPs are needed to identify an individual, current views of privacy and anonymity may undergo drastic changes. A more immediate concern is that insurers might use the genetic information to deny or cancel coverage. If the concern about insurance availability is realized, universal insurance by the federal government may become the only solution.
“Science is moving way ahead of the ethics,” concludes Dr. Nusbaum. “We can’t stop the technological advancements but the gap keeps widening. It is our responsibility to understand the implications of our work and educate the public and elected officials so that a proper dialog can take place.”