With the tidal wave of next-generation sequencing sweeping through the landscape of biomedical sciences and technology, researchers are inundated with a flood of data.
Yet the data will only be as useful as the quality of the libraries from which it is generated, and can be no more informative than the use to which it is put.
Researchers at the International Plant & Animal Genome (PAG) conference presented their insights and opinions on such issues as how to prepare a DNA library from material that is limited, degraded, or even overabundant, as well as how best to piece together the information once it’s obtained.
When there is a lot of high-quality DNA to start with, it’s relatively easy to make a good sequencing library. Yet in too many cases only a limited quantity is available. New England Biolabs (NEB) has optimized its new NEBNext® Ultra DNA Library Prep Kit to address this issue—“you can use it with as little as 5 ng of human genomic DNA to make a library, and the quality of the library is very good,” said Pingfang Liu, Ph.D., application and product development scientist.
Another advantage of this kit is that it can be used with both intact and fragmented DNA, such as FFPE-preserved biopsies or circulating plasma DNA, she said, adding that other NGS library preparation kits on the market claiming facility with low DNA input require the DNA to be intact. “We have a lot of customers with precious samples, and they have only one shot to make a library.”
As a company that specializes in enzymes, NEB focused on optimizing the enzymes and conditions in the library prep kit. For example, it developed a polymerase for the PCR amplification step that works efficiently with all types of amplicons—including GC-rich and AT-rich—and boasts about 100-fold greater fidelity than Taq. The cleanup steps between enzyme reactions were also minimized, “because with a small amount of DNA, if you need to transfer from one tube to another tube you tend to lose a lot of your substrate,” Dr. Liu said.
The ligase master mix has been specially optimized to ligate the types of ends present in Illumina library prep methods, and at this point the kit is supplied with adapters for Illumina library prep. It is also compatible with the workflows of other sequencing platforms that use single-base TA overhangs, such as the Applied Biosystems 5500 Series SOLiD and Roche 454, “although the sequences of the actual adapters are different between the different platforms,” she said.
Dr. Liu discussed results from a collaboration on exome capture with Cynthia Hendrickson, Ph.D., of the Genomic Services Lab at HudsonAlpha Institute for Biotechnology. Using the NimbleGen SeqCap EZ Human Exome v3, Dr. Hendrickson obtained virtually the same results starting with only 100 ng of genomic DNA and the NEBNext Ultra DNA Library Prep Kit as with 2.5 µg of genomic DNA and a traditional protocol.
This is all the more impressive when keeping in mind that “you will lose 99% of your library because the human exome is only 1% of the whole genome,” Dr. Liu remarked.
Divide, Skim, and Conquer
Ending up with only 1% of the genome has the obvious advantage of needing to sequence and analyze that much less DNA.
David Edwards, Ph.D., professor at the University of Queensland, used different means to address the challenges of the large and complex wheat genome, which is six times larger than the human genome. “It’s 80–90% repetitive. It contains three genomes—it’s hexaploid (as opposed to humans, who are diploid). You can imagine the challenge of trying to sequence something like that!” he exclaimed.
One of Dr. Edwards’ collaborators isolates individual arms of chromosomes in microgram quantities, dissecting this complex genome into manageable pieces, reducing the complexity of assembling sequence data. “We’ve sequenced both arms of chromosome 7 from each of the three genomes now, and each one is the size of a rice genome,” he said.
Another way they reduce complexity is by using a skim-based genotyping-by-sequencing method—that is, sequencing at very low coverage and calling SNPs where reads match known polymorphisms in the reference genome. “The advantage is that it’s essentially dial-able, you only need a very small amount of sequence data if you’re doing trait association,” he explained. “You only need very low coverage and it’s very cheap.”
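The core of a skim genotyping call can be sketched as follows. This is not Dr. Edwards’ actual pipeline—the SNP positions, alleles, and read pileups are illustrative—but it shows why restricting calls to known polymorphic sites makes even a handful of reads per site informative:

```python
# Minimal sketch of skim genotyping-by-sequencing: genotypes are called
# only at known polymorphic sites, so very low coverage still yields
# usable calls. All positions, alleles, and pileups are illustrative.

known_snps = {            # position -> (reference allele, alternate allele)
    101: ("A", "G"),
    250: ("C", "T"),
    412: ("G", "A"),
}

def call_genotype(pileup, ref, alt):
    """Call a genotype from the bases observed at one known SNP site."""
    ref_n = pileup.count(ref)
    alt_n = pileup.count(alt)
    if ref_n + alt_n == 0:
        return "./."            # no informative reads at this site
    if alt_n == 0:
        return "0/0"            # only reference allele seen
    if ref_n == 0:
        return "1/1"            # only alternate allele seen
    return "0/1"                # both alleles seen -> heterozygous

# Toy pileups from a low-coverage ("skim") sequencing run
pileups = {101: ["A", "A"], 250: ["T"], 412: ["G", "A", "A"]}

calls = {pos: call_genotype(pileups.get(pos, []), ref, alt)
         for pos, (ref, alt) in known_snps.items()}
print(calls)   # {101: '0/0', 250: '1/1', 412: '0/1'}
```

In practice such calls come with confidence caveats—a single read cannot truly distinguish a homozygote from a heterozygote—which is why the method is best suited to the population-scale trait-association work Dr. Edwards describes, where many lines are genotyped cheaply and errors average out.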
Increasing the amount of data generated, on the other hand, yields a very high density of SNP genotypes, more than 3 million SNPs on chromosome 7 alone, which allows the group to examine the reference genome itself and compare haplotype blocks. If the haplotype block breaks down, part of the genome may have been mis-assembled or rearranged.
“So we’re using this high-density skim sequencing—that’s almost a contradiction in terms—to go through and validate and to fix genome assemblies,” Dr. Edwards said. “We’ve also used it for trait mapping, which is very straightforward and provides a physical rather than a genetic location, useful to identify candidate genes.”
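The assembly-validation idea can be sketched in a few lines. On a correctly assembled contig, adjacent SNPs should show correlated allele patterns across the lines being genotyped; a sharp breakdown in that correlation around one position suggests a mis-join. The genotype matrix and concordance threshold below are illustrative assumptions, not Dr. Edwards’ actual method:

```python
# Sketch of haplotype-block validation of a genome assembly: adjacent
# SNP columns that lose their correlation across lines flag a possible
# mis-assembly. The genotype matrix and 0.9 threshold are illustrative.

# Rows = wheat lines, columns = ordered SNP sites (0 = ref, 1 = alt).
# Column 3 breaks the haplotype pattern shared by its neighbors.
genotypes = [
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
]

def adjacent_concordance(matrix, i, j):
    """Fraction of lines where SNP columns i and j agree in phase."""
    col_i = [row[i] for row in matrix]
    col_j = [row[j] for row in matrix]
    same = sum(a == b for a, b in zip(col_i, col_j))
    # Columns in perfect anti-phase are equally well linked
    return max(same, len(col_i) - same) / len(col_i)

# Flag adjacent SNP pairs whose linkage breaks down
suspect_joins = [
    (i, i + 1)
    for i in range(len(genotypes[0]) - 1)
    if adjacent_concordance(genotypes, i, i + 1) < 0.9
]
print(suspect_joins)   # [(2, 3), (3, 4)] -> breakdown around SNP 3
```

Both flagged pairs straddle SNP 3, localizing the suspect region—the same logic, applied at the density of millions of SNPs, lets discordant positions pinpoint where a reference sequence may have been mis-assembled or rearranged.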
They looked at the relationship among the three genomes (termed A, B, and D) to determine the impact that early farmers had on bread wheat. Most genes are conserved among the three genomes, and the differential gene loss that was found supported current theories of wheat’s evolution and domestication. The A and B genomes came together about 50,000 years ago—“the gene networks that are lost relate to it being grown in the wild,” Dr. Edwards explained.
The tetraploid wheat was then domesticated and dispersed by migration up through the Middle East into what is now southern Turkey, where it came into contact with D genome wheat about 10,000 years ago. He said, “This was presumably growing in the same field as the domesticated tetraploid, and that formed a hybrid hexaploid wheat that became the bread wheat we eat today. The types of genes and the gene networks that are lost are really quite different, and this tells us that the new bread wheat was under very different selective pressure.”