January 1, 2018 (Vol. 38, No. 1)
Sequencing Enters the Realm of Copy Number Variations
Although array comparative genomic hybridization (aCGH) is currently established as the gold standard for copy number variation (CNV) detection, next-generation sequencing (NGS) is increasingly being used to identify CNVs and single nucleotide variants (SNVs) simultaneously. Using NGS to identify both CNVs and SNVs could be of great benefit to laboratories, saving time and reducing costs while creating a more comprehensive picture of genomic variation with a single assay.
Rapid and cost-effective sequencing of whole genomes, whole exomes, or targeted regions is currently widely adopted. Studies are gradually yielding more robust simultaneous detection of CNVs and SNVs, particularly in targeted NGS panels using hybridization capture approaches. Such panels offer excellent coverage across targeted regions due to highly optimized bait designs, and are more cost-effective than whole-genome approaches.
Here we discuss various methods and their suitability for reliable CNV detection in clinical research. We also take a look at some of the studies yielding reliable CNV calling from NGS data.
How to Reliably Detect CNVs?
Historically, CNVs were studied via cytogenetics techniques such as fluorescence in situ hybridization and karyotyping, but these methods only allow the identification of large CNVs (>5 Mbp). As newer techniques such as aCGH evolved, finer analysis of CNVs became possible, uncovering both the presence and relevance of smaller CNVs.
Previously the detection of these small CNVs—known to be relevant in a range of Mendelian diseases—relied on methods such as Sanger sequencing or multiplex ligation-dependent probe amplification (MLPA), which are limited in the number of loci they can target. Today, developments in aCGH technology have enabled detection of single- and multi-exon aberrations, down to as small as 250 bp, across a wide range of genes with a single assay—hence its use as the current gold standard for CNV detection.
As aCGH has evolved, NGS has also emerged as a technology capable of detecting both CNVs and SNVs in a single assay. However, CNV analysis via NGS has not been straightforward, and aCGH is still regularly run alongside it in clinical settings.
Reliable CNV calls from NGS data depend on high depth and uniformity of coverage across all target sites—something that is not always easily achievable in a cost- and time-effective manner. Additionally, a robust bioinformatics approach is required that can scale with the size and complexity of the data set. Numerous factors therefore need to be considered when choosing an appropriate NGS method for CNV detection.
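To make "depth and uniformity of coverage" concrete, the following minimal Python sketch summarizes per-target depth. The function name, the input format (one mean depth per target), and the 0.2 × mean cutoff for a "well covered" target are assumptions chosen for illustration, not a standard taken from this article.

```python
from statistics import mean

def coverage_summary(depths, min_fraction=0.2):
    """Summarize per-target read depth.

    depths: mean read depth of each target region (hypothetical input).
    min_fraction: a target counts as well covered when its depth is at
        least this fraction of the overall mean (assumed cutoff).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    avg = mean(depths)
    if avg == 0:
        raise ValueError("mean depth is zero")
    well_covered = sum(1 for d in depths if d >= min_fraction * avg)
    return {
        "mean_depth": avg,
        "uniformity": well_covered / len(depths),
    }

# Illustrative numbers only, not data from any study.
even = coverage_summary([480, 510, 495, 515])
uneven = coverage_summary([900, 950, 100, 50])
```

Both example panels have the same mean depth, but the second leaves one target below the cutoff. A mean depth figure alone can therefore hide exactly the kind of dropout that makes CNV calls unreliable.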
Choice of NGS Method
The size and complexity of the genomic regions of interest will largely determine which NGS method should be used. Whole-genome sequencing (WGS) offers the potential to capture all genetic variation including CNVs, SNVs, and structural changes such as inversions and translocations in a single assay. WGS has the benefit of producing a comprehensive, unbiased, and uniform picture, making it comparatively simple to call CNVs from the data.
Accurate CNV detection, however, requires a high depth of coverage, and for WGS, a sequencing depth of 40–50× is recommended for confidence in results. As most WGS is performed at 15–30×, this represents a significant increase in costs. Additionally, significant bioinformatics challenges arise from the sheer quantity and complexity of data produced. Such cost and accessibility barriers still need to be overcome before WGS can be adopted as a routine assay for CNV detection.
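The scale of that increase can be estimated with simple arithmetic, since the amount of sequence required grows linearly with mean depth. The sketch below assumes a human genome size of roughly 3.1 Gbp and ignores duplicates, unmapped reads, and pricing tiers, so it indicates data volume rather than actual cost.

```python
GENOME_SIZE_BP = 3.1e9  # approximate human genome size (assumption)

def bases_required(depth, target_size_bp=GENOME_SIZE_BP):
    """Raw bases needed to reach a given mean depth over a target."""
    return depth * target_size_bp

def fold_increase(current_depth, required_depth):
    """How many times more sequence is needed to raise the mean depth."""
    return required_depth / current_depth

# Best case: a lab already at 30x moves to 40x.
low = fold_increase(30, 40)
# Worst case: a lab at 15x moves to 50x.
high = fold_increase(15, 50)
```

Under these assumptions, moving from typical WGS depths to the recommended range requires between roughly 1.3 and 3.3 times as much sequence per sample.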
In contrast to WGS, targeting all exons with whole-exome sequencing (WES) considerably reduces costs and complexity of data analysis. WES is primarily used for analysis of SNVs and indels (CNVs ~1–100 bp), but approaches are steadily improving to provide data suitable for larger CNVs.
Problems with WES primarily arise from differences in probe hybridization and efficiency, which introduce bias and noise, affecting the uniformity and consistency of coverage across targets. As WES covers many different regions, it can be particularly tricky to optimize for uniformity of coverage, and increasing depth of coverage can considerably increase sequencing costs.
Targeted NGS panels focus on individual genes or specific regions of interest. This method supports the detection of known and novel variants within targeted regions, but unlike WES and WGS, it requires previous knowledge of relevant areas of the genome.
Advanced bait designs in some commercially available hybridization-based NGS panels offer high uniformity of coverage of targeted regions, and the targeted nature results in lower costs when increasing depth, opening up the possibility for reliable CNV calling. Panels can be customized and optimized for different regions and sample types, allowing determination of CNVs and SNVs from NGS in a more cost-effective manner.
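One common way to turn evenly covered panel data into copy number calls is to compare per-target read depth in a sample against a reference and take the log2 ratio. The Python sketch below illustrates that general idea only; it is not the method used by any specific panel or study discussed here. The function names, the median normalization, and the cutoffs of -0.5 and 0.4 are assumptions, and the example presumes diploid targets with a matched reference.

```python
from math import log2
from statistics import median

def log2_copy_ratios(sample_depths, reference_depths):
    """Per-target log2(sample/reference), centered on the median ratio."""
    if len(sample_depths) != len(reference_depths):
        raise ValueError("sample and reference must cover the same targets")
    if any(r <= 0 for r in reference_depths):
        raise ValueError("reference depths must be positive")
    raw = [s / r for s, r in zip(sample_depths, reference_depths)]
    centre = median(raw)
    if centre <= 0:
        raise ValueError("median ratio must be positive")
    # A target with zero sample depth has no finite log ratio.
    return [log2(x / centre) if x > 0 else float("-inf") for x in raw]

def call_cnv(ratios, loss_cutoff=-0.5, gain_cutoff=0.4):
    """Label each target as loss, gain, or neutral (assumed cutoffs)."""
    calls = []
    for value in ratios:
        if value <= loss_cutoff:
            calls.append("loss")
        elif value >= gain_cutoff:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls

# Illustrative depths only: target 3 at half depth, target 5 at 1.5x.
reference = [500, 500, 500, 500, 500, 500]
sample = [510, 490, 250, 505, 750, 495]
calls = call_cnv(log2_copy_ratios(sample, reference))
```

In a diploid region, a heterozygous deletion halves the expected depth (log2 ratio near -1) and a single-copy gain raises it by half (log2 ratio near 0.58). The signal is only separable from noise when coverage is deep and uniform, which is why bait design matters so much for this kind of calling.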
Targeted Approach
The following studies demonstrate that combined with an appropriate bioinformatics approach, targeted NGS panels are capable of robust calling of both CNVs and SNVs for a range of genetic disorders—confirmed by and concordant with gold-standard aCGH.
CNVs across any of the 79 exons of the dystrophin (DMD) gene can cause Duchenne muscular dystrophy, an X-linked recessive disorder characterized by severe muscle wasting. In one study, 50 samples with confirmed copy number status, including 29 known CNVs, were analyzed by aCGH and a targeted NGS panel. Using similar approaches to aCGH design, hybridization baits were optimized to give even coverage across all exons. All known CNVs within the DMD gene for the sample set were detected by the NGS panel, ranging from single exons to larger regions covering multiple exons, with 100% concordance with aCGH results (Figure 1, A & B).
Monogenic Mendelian autosomal dominant mutations in genes involved in low-density lipoprotein cholesterol (LDL-C) uptake and degradation, primarily LDLR, PCSK9, LDLRAP1, and APOB, can result in familial hypercholesterolemia (FH). The condition is characterized by elevated blood cholesterol increasing the risk of cardiovascular disease. Multiple point mutations are involved in FH, and in ~10% of cases, intragenic CNVs are routinely detected by MLPA. Variants in related genes and SNPs can also be indicative of polygenic hypercholesterolemia, and these can all be targeted simultaneously by NGS.
In a study of 48 samples known to include five CNVs confirmed by MLPA, NGS panels were designed using hybridization-based enrichment to detect CNVs as well as SNVs in a single procedure, covering LDLR, PCSK9, LDLRAP1, and APOB.
The panel provided excellent depth and uniformity of coverage across the targeted region, greater than 500×, detecting all the CNVs and clearly identifying a mid-exon breakpoint (Figure 1, C–E).
Conclusion
The ability to determine CNVs from NGS data enables laboratories to alleviate the burden of multiple assays—painting a clearer picture of genetic variation in a sample earlier in the analysis process. It is clear that while WGS and WES are powerful techniques, limitations exist for their routine use. Moreover, it has been demonstrated that accurate CNV calling from NGS data can be carried out with commercially available hybridization-based targeted panels.
Producing accurate, reliable data depends on high depth of coverage, consistent uniformity of coverage, an appropriate bioinformatics approach, and accessibility in terms of cost and speed of testing. Combining CNV detection via targeted NGS with aCGH can provide a powerful and comprehensive tool for genetic analysis.
John Cousin is product manager for cytogenetics and rare disease at Oxford Gene Technology.