GEN Roundup: The Long and Short of NGS Sample Preparation

September 29, 2017

October 1, 2017 (Vol. 37, No. 17)

GEN’s expert panel explains the challenges that encompass Next-Generation Sequencing

Ever since next-generation sequencing (NGS) was introduced, its main advantage has been its ability to outpace Sanger sequencing. NGS, unlike Sanger sequencing, is an array-based technique. It can process millions of reactions in parallel, dramatically increasing sequencing throughput.

To gain this advantage in throughput, however, NGS had to sacrifice some degree of accuracy. NGS is less accurate than Sanger sequencing mainly because it relies on shorter read lengths. Basically, during sample preparation, NGS breaks genomic DNA into smaller fragments, providing the flexibility needed to realize various array-based approaches, but introducing certain complications. For example, the smaller fragments typical of NGS can be harder to align with reference sequences. Also, when smaller fragments are used, repetitive genomic regions are especially hard to analyze.
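The difficulty with repetitive regions can be shown with a toy example. The sketch below is an editorial illustration, not a real aligner: the reference string, the reads, and the `find_all` helper are all made up for the purpose, and matching is exact-string only. It shows a short read that falls entirely inside a tandem repeat matching at several positions, while a longer read that spans the repeat and its unique flanks matches at only one.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


# Hypothetical reference: unique flank, six copies of a CAG repeat, unique flank.
reference = "GATTACA" + "CAG" * 6 + "TTGACCT"

short_read = "CAGCAG"                    # lies entirely inside the repeat
long_read = "TACA" + "CAG" * 6 + "TTG"   # spans the repeat plus both flanks

short_hits = find_all(reference, short_read)
long_hits = find_all(reference, long_read)

print("short read placements:", short_hits)
print("long read placements:", long_hits)
```

The short read has several equally good placements, so its true origin cannot be determined from sequence alone; the long read is anchored by the unique sequence on either side.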

At first, the trade-off between speed and accuracy was easy to accept. Speedier NGS reduced costs, brought DNA sequencing to more investigators, and facilitated the development of new sequencing applications. Of late, however, the old trade-off is starting to try the patience of investigators who are pushing the limits of NGS.

Increasingly, NGS is being asked to handle samples of diverse origin, samples of dubious quality, and samples of vanishingly small size. And NGS is still being asked, as always, to process more material at lower cost, while delivering higher resolution and greater accuracy. To meet these demands, NGS is optimizing anything and everything it can. NGS is particularly busy optimizing various aspects of sample preparation, a multistep process that must skirt numerous pitfalls.

Some NGS platforms are being set up to accommodate longer reads, prompting modifications to sample preparation protocols. Shorter reads, however, remain optimal for many applications. Whether reads are long or short, NGS finds itself grappling with similar sample preparation challenges. As our panel of experts explains, these challenges encompass DNA preservation and repair strategies, target-enrichment approaches, suppression of reagent-induced artifacts, and data analysis.

GEN’s expert panel

GEN: When samples are prepared for NGS, what challenges arise? How are these challenges being addressed by the life sciences industry?

Dr. Sismour: The biggest challenge facing NGS sample preparation is not associated with sample prep itself, but rather the rate at which sequencing technology advances. As sequencing technologies and their applications advance, the life sciences industry must quickly evolve to mitigate the new challenges and bottlenecks.

NGS sample prep tends to be a laborious, multistep process, and increases in the number of samples and in the precision needed for each step become problematic. As the cost of sequencing drops and throughput increases, usability and automation become important. As NGS technologies are increasingly adopted in the clinic, precision and reproducibility become important. Industry is tackling its biggest challenge yet: generating end-to-end solutions for NGS sample preparation.

Dr. Corney: One of the main challenges in genomics is assaying more targets with less starting material. This problem was first addressed with the invention of multiplex PCR. The search for solutions continued apace with the development of microarrays.

Today, NGS technology can analyze entire genomes and transcriptomes of a single cell. Many developments in NGS approaches have resulted in this capability, including production of highly efficient ligases and polymerases, novel barcoding strategies with unique molecular indices (UMIs), and microfluidics/microdroplet-encapsulation technologies.

Despite this progress, many challenges remain. Many NGS approaches can only be applied to certain types of samples, have intricate upstream processing procedures, are costly, and require significant expertise to carry out.

Dr. Dimalanta: Perhaps the biggest challenge investigators face in preparing samples for NGS is providing the highest quality sequencing data from increasingly diminishing sample amounts, and from samples that are potentially damaged or degraded.

To address this challenge, developers have optimized reagent chemistries to maximize the efficiency with which DNA or RNA molecules are converted into sequencer-ready libraries. Recently introduced library preparation kits can produce high-quality, high-diversity libraries from extremely low amounts of input nucleic acids.

Additionally, products have been developed that can repair damaged DNA and reduce the error-inducing impact of processes used during sample storage and downstream NGS workflows.

Dr. Kong: Suboptimal sample treatment, storage, extraction, and preparation can lead to poor downstream sequencing results. Eliminating artefacts while faithfully preserving the sample’s genetic profile remains one of the biggest challenges in the industry.

For example, formalin-fixed paraffin-embedded (FFPE) treatment is known to cause cytosine-to-uracil deamination, making the use of a uracil-DNA glycosylase digest essential to eliminate such artefacts. In addition, all amplicon-based target-enrichment approaches are subject to PCR errors, which can be minimized by the addition of UMIs.

For liquid biopsy samples, where detection at extremely high levels of sensitivity is often required, preserving DNA integrity (without introducing interfering substances such as EDTA) is critical. Only careful deployment of a combination of such strategies will ensure the highest NGS data quality.
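The way UMIs help suppress PCR errors can be sketched in a few lines. The example below is an editorial illustration under simplifying assumptions, not any vendor's pipeline: reads are hypothetical `(start, umi, sequence)` tuples, all reads in a family are assumed to have the same length, and the `umi_consensus` helper is invented for this sketch. Reads that share a start position and UMI are treated as copies of one original molecule, and a per-base majority vote is taken across the copies.

```python
from collections import Counter, defaultdict


def umi_consensus(reads):
    """Collapse reads sharing (start, UMI) into one consensus sequence per family.

    reads: iterable of (start, umi, sequence) tuples. Sequences within a
    family are assumed to be the same length (a simplification).
    """
    families = defaultdict(list)
    for start, umi, seq in reads:
        families[(start, umi)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        # Majority vote at each base position. An error carried by a minority
        # of copies is outvoted; ties go to the base seen first.
        consensus[key] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Hypothetical reads: three copies of one molecule, one copy of another.
reads = [
    (100, "AACG", "ACGTAC"),
    (100, "AACG", "ACGTAC"),
    (100, "AACG", "ACTTAC"),  # one copy carries a PCR error (G>T)
    (100, "GGTA", "ACGTAC"),
]
result = umi_consensus(reads)
print(result)
```

An error introduced in the very first amplification cycle would be present in most copies of a family and would survive this vote, which is one reason UMIs reduce PCR errors rather than eliminate them.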

Dr. Cunningham: Investigators are always pushing the limits of sequencing by using increasingly lower quantities of DNA, including DNA from fresh or archived tissue specimens. Oftentimes, these limited DNA samples can be degraded or of lesser quality, which poses an additional challenge. Having samples that vary in both quantity and quality can make it difficult to produce highly diverse libraries to maximize the depth and uniformity of sequencing coverage.

To address this challenge, library preparation kits must efficiently repair and adapt DNA. We have optimized our library kits for subnanogram or damaged samples to retain more DNA sample during the preparation process, which helps minimize duplicates and adapter dimers. As a result, researchers have highly complex libraries with more useful sequencing data from less sample.

GEN: Do NGS sample-preparation techniques significantly differ depending on the sequencing technology, that is, short-read platforms vs. long-read platforms? How does this affect the sample prep process?

Dr. Sismour: NGS library preparation methods are correlated with the sequencing technology, but more importantly, with the type of ‘feature’ that is being sequenced. The goal is obviously to have prep-less sequencing, wherein one sequences the nucleic acids directly from a sample without any intermediate manipulation. At present, nanopore sequencing, which sequences a single molecule via physical measurement of the DNA, is the only commercially available technology that offers these capabilities, and they are still somewhat limited.

In single-molecule sequencing, the molecule itself is what is being sequenced. In most other technologies, it is a collection of molecules (bridged PCR, emulsion PCR on beads, DNA nanoballs) that is being sequenced, which requires addition of adaptors to generate the ‘feature.’ Any manipulation of the sample nucleic acids generates bias, so we must manage these biases until technology allows for robust sequencing of single molecules requiring no library prep.

Dr. Corney: Short- and long-read platforms have significantly different sample prep workflows because of their very different sample quality requirements. Standard or ‘dirty’ nucleic acid preparations, even from degraded samples, are typically suitable for short-read sequencing. However, the bar is far higher for single-molecule, long-read technologies. Seemingly minor, and often overlooked, variables such as the growth phase of the source material or pipetting technique can dramatically affect the success of long-read sequencing.

Furthermore, whereas success of short-read sequencing can be reliably predicted based on spectrophotometry and measurements of DNA integrity, no technology exists to adequately identify inhibitors of single-molecule sequencing. Consequently, long-read sequencing can, at times, be more of an art than a science.

Dr. Dimalanta: NGS sample preparations do differ somewhat relative to the sequencing platform being used, with some variation in the process for adding a universal sequence to a diversity of molecules. The true variation is tied more to the specific application for which the sequencing technology is being used, which can differ greatly based on the type of sequencer.

For example, a long-read platform will likely be used for applications including de novo assembly and detection of structural variation, whereas short-read sequencing is more applicable to resequencing applications for variant discovery. The plethora of new applications for which sequencing data is being used continues to drive the diversity of NGS sample preparation methods.

Dr. Kong: Different sequencing platforms require different library preparation processes, but these differences are minor relative to other steps of the workflow in their impact on final data quality. For example, obtaining high-quality DNA or RNA sequencing templates may require extraction from different starting materials and hence, different sample preparation methods. With its inherent instability, RNA is more susceptible to sample storage and fixation conditions. Maintaining RNA integrity can be critical in obtaining high-quality RNA for sequencing.

In addition, the mechanical, chemical, or enzymatic shearing needed for DNA fragmentation also has advantages and limitations. Template length, uniformity, cost, and speed are key factors in considering each approach.

Above all, a fully integrated and well-optimized bioinformatics pipeline is key to obtaining meaningful sequencing results, irrespective of sequencing platform.

Dr. Cunningham: Although the fundamental goal of NGS sample preparation is to efficiently convert DNA fragments into functional library molecules, each different sequencing technology has unique considerations.

For example, short-read sequencing libraries are similar in size to adapter dimers, which can therefore be challenging to remove prior to sequencing. By contrast, long-read sequencing libraries are easily distinguished from adapter dimers, but they must be handled as little as possible, and with minimal damage, to maximize the length of library inserts. In either case, library preparations should maximize the amount of retained DNA while providing a quick and easy workflow.

By providing library preparation kits available for both short- and long-read sequencing technologies, we try to help researchers minimize hands-on time while still providing high-complexity libraries for high depth of coverage and uniform representation of the genome.

GEN: How important are target-enrichment strategies to overall sequencing analysis results?

Dr. Sismour: Any kind of sample manipulation generates bias, and target enrichment is no exception. GC bias, molecular conversion rates, binding capacity and/or efficiency, and many other factors influence sequencing results. Bias is never good, so the question becomes, “Why are we doing targeted sequencing?”

Sequencing costs have been decreasing faster than Moore’s law for the past decade, so we must look at applications for the answer. A genome needs to be sequenced only one time, and costs are decreasing. Some are looking toward sequencing whole genomes or exomes regardless of the current utility of the information. Because the one-time cost for gaining information is so low, some feel that utility can wait.

When we look at routine sequencing applications for monitoring health (liquid biopsy, etc.), however, we see that targeted sequencing is necessary to obtain the appropriate throughput to serve customer and patient needs. Thus, variation within target enrichment techniques is becoming more relevant.
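GC bias, one of the factors named above, is commonly checked by comparing coverage across targets of differing GC content. The sketch below is an editorial illustration only: the target sequences and depth values are invented, the 20-point bin size is an arbitrary choice, and the helper names `gc_percent` and `mean_depth_by_gc_bin` are hypothetical. Sequences are assumed to be non-empty.

```python
from collections import defaultdict


def gc_percent(seq: str) -> int:
    """Integer GC percentage of a non-empty sequence, rounded down."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100 * gc // len(seq)


def mean_depth_by_gc_bin(targets, bin_size: int = 20):
    """Average depth per GC bin.

    targets: iterable of (sequence, mean_depth) pairs.
    Returns {bin_start_percent: mean depth}; 100% GC falls in the top bin.
    """
    depths = defaultdict(list)
    for seq, depth in targets:
        bin_start = min(gc_percent(seq) // bin_size * bin_size, 100 - bin_size)
        depths[bin_start].append(depth)
    return {b: sum(d) / len(d) for b, d in sorted(depths.items())}


# Invented targets: (sequence, observed mean depth).
targets = [
    ("ATATATTAAT", 40),    # 0% GC
    ("ATGCATTAGA", 95),    # 30% GC
    ("ATGCGCTAGA", 105),   # 50% GC
    ("GCGCGGCCGC", 30),    # 100% GC
]
summary = mean_depth_by_gc_bin(targets)
print(summary)
```

In this made-up data set, depth drops at both GC extremes, which is the general shape an analyst would look for when assessing GC bias in real coverage data.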

Dr. Corney: It is well known that the cost of sequencing has decreased faster than Moore’s law, enabling investigators to sequence genomes at a fraction of the cost of just a decade ago. However, population-scale whole-genome sequencing is still too costly for many investigators. For this reason, target enrichment offers a good compromise.

Deeper targeted sequencing allows for detection of rare cellular subpopulations, such as those found in many cancers, and permits longitudinal monitoring of residual disease. Two main types of target capture exist: multiplexed PCR and probe capture. These approaches help define target region size, starting material requirements, time spent preparing libraries, types of variants detected, and of course, cost.
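Why depth matters for rare subpopulations can be made concrete with a simple binomial model. The sketch below is an editorial illustration, not a validated detection model: it assumes each read independently samples the variant allele with probability equal to the allele fraction, and it ignores sequencing error, duplicates, and capture bias. The 1% allele fraction, the five-read calling threshold, and the three depths are arbitrary illustrative values, and `detection_probability` is a hypothetical helper.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) for X ~ Binomial(depth, allele_fraction)."""
    # Sum the probabilities of seeing fewer variant reads than the threshold,
    # then take the complement.
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Illustrative depths: a whole-genome-like depth and two targeted-panel-like depths.
for depth in (30, 500, 5000):
    p = detection_probability(depth, allele_fraction=0.01, min_alt_reads=5)
    print(f"depth {depth}: probability of >= 5 variant reads = {p:.4f}")
```

Under these assumptions, the chance of collecting enough variant-supporting reads rises sharply with depth, which is the basic argument for concentrating sequencing capacity on a limited set of targets.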

Dr. Dimalanta: Target-enrichment strategies can impact the quality of sequencing analysis greatly, and this is driven primarily by the particular mechanism of enrichment used. PCR-based strategies, where enrichment is driven solely through the amplification of targeted regions relative to the remaining genome, can produce seemingly high depths of coverage, but much of the coverage is the result of multiple duplicate copies of the same starting molecules, which does not necessarily help variant calling algorithms. For hybridization-based strategies, the specificity of enrichment is difficult to maintain as the amount of targeted territory decreases, which can impact the ability to generate even depths of coverage required for uniform variant calling.
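The gap between apparent depth and independent evidence can be illustrated with a small calculation. The sketch below is an editorial illustration with invented numbers: each read over a target is assumed to carry a tag identifying its starting molecule (for example, a UMI), and the hypothetical `coverage_summary` helper compares raw read depth with the number of distinct starting molecules.

```python
from collections import Counter


def coverage_summary(molecule_ids):
    """Return (raw_depth, unique_depth, duplication_rate) for one target.

    molecule_ids: one entry per read, naming the starting molecule the read
    was copied from (assumed to be known, e.g. from a UMI).
    """
    counts = Counter(molecule_ids)
    raw_depth = sum(counts.values())
    unique_depth = len(counts)
    duplication_rate = 1 - unique_depth / raw_depth if raw_depth else 0.0
    return raw_depth, unique_depth, duplication_rate


# Invented example: 1,000 reads that all derive from only three molecules.
reads = ["m1"] * 400 + ["m2"] * 350 + ["m3"] * 250
raw, unique, dup = coverage_summary(reads)
print(f"raw depth {raw}, unique molecules {unique}, duplication rate {dup:.3f}")
```

Here a nominal depth of 1,000 reads represents just three independent observations, so a variant caller gains far less statistical support than the raw depth suggests.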

Dr. Kong: A well-designed gene panel that targets only the genomic regions relevant to a specific research case can be invaluable to generating meaningful insights from NGS.

There are two types of approach to target enrichment: hybridization-based and amplicon-based. The latter is often favored for its simpler workflow, faster TAT (turnaround time), and higher sensitivity. However, the main challenge lies in the design of strategically placed primer sets to capture desired variant positions, while ensuring uniform amplifiability and minimal dimerization.

Primer design is especially important in the detection of fusion genes with multiple possible sites for translocation. In such cases, thorough primer design covering all potential loci, followed by rigorous testing, is required to ensure performance and minimize false-negative results.

Dr. Cunningham: Target-enrichment strategies are very important because they enable cost-effective high-throughput sequencing for many applications. Whether performing low-frequency variant detection or CpG island-methylation analysis, enrichment methods are a key component of genetic discovery. Currently, sequencing and analyzing entire genomes remains prohibitively costly and exceeds throughput limits.

Many researchers want to develop more disease-focused panels and begin screening larger samples (thousands per study) for clinically significant gene regions or known variants. They also want to increase biological or statistical replicates per sample for more comprehensive analysis.

Our technology is compatible with multiple enrichment methods, allowing researchers to pursue both whole-genome and targeted approaches. Additionally, our new 768-plex indexed adapters enable higher multiplex sequencing to both further reduce ‘per sample’ costs and increase throughput per run.
