Dan Koboldt
An incomplete list of scientific endeavors that can be performed with the current WGS technology.
As 2014 draws to a close, I can’t help but speculate about the face of next-gen sequencing, genetics, and genomics in 2015. Illumina announced its plans for the HiSeq X Ten “factory installation” sequencing system way back in January. It has taken some time for the early adopters of this new technology to get it up and running, but it seems reasonable to expect that several Illumina X Ten systems will be operational in 2015. Each one of those has the capacity to sequence 18,000 human genomes per year. As I wrote about recently, the transition to large-scale whole genome sequencing will bring many challenges.
Let’s set the difficulties aside for now and ask a more interesting question. What kind of scientific endeavors could we undertake with this new capability? Here are a few ideas.
1. Newborn and Pediatric Disease
Newborn intensive care units and children’s hospitals see many patients with severe, sometimes fatal diseases that have a genetic basis. Some of these are known genetic disorders, correctly diagnosed and confirmed by clinical genetic testing. A considerable number, however, resemble known diseases but affect patients with negative genetic test results. Numerous pilot programs, like the NIH’s Undiagnosed Diseases Network, are applying exome sequencing to cases like these. On average, exome sequencing uncovers a pathogenic mutation in 25-30% of cases.
Whole-genome sequencing is the natural next step: it can survey exonic regions that are poorly captured and detect structural variants. Now, with the X Ten system, whole genome sequencing might be the logical first step. It has a faster turnaround time, requires no hybridization, and surveys everything from single nucleotide variants to large deletions. Ideally, sequencing would be performed on the patient, both parents, and a sibling (if available).
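One reason to sequence both parents alongside the patient is that it lets you look for de novo variants: alleles the child carries that neither parent does. Here is a minimal sketch of that comparison in Python. The function name and the genotype representation (tuples of allele indices, with 0 as the reference allele) are my own assumptions for illustration, not part of any particular pipeline, and a real analysis would also weigh read depth, genotype quality, and inherited (for example, recessive) models.

```python
def is_candidate_de_novo(child, mother, father):
    """Flag a site where the child carries a non-reference allele seen in neither parent.

    Each genotype is a tuple of allele indices, e.g. (0, 1); 0 is the reference
    allele and None marks a missing call. Hypothetical helper for illustration.
    """
    # Without a call in all three family members we cannot judge the site.
    if None in child + mother + father:
        return False
    parental_alleles = set(mother) | set(father)
    # A candidate has at least one alternate allele absent from both parents.
    return any(allele != 0 and allele not in parental_alleles for allele in child)


# Made-up example sites: (child, mother, father) genotypes.
sites = {
    "site_A": ((0, 1), (0, 0), (0, 0)),        # alternate allele only in the child
    "site_B": ((0, 1), (0, 1), (0, 0)),        # inherited from the mother
    "site_C": ((0, 1), (None, None), (0, 0)),  # mother not called
}
candidates = [name for name, trio in sites.items() if is_candidate_de_novo(*trio)]
```

With these toy genotypes only site_A survives the filter. A sibling, when available, adds a further check on segregation.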
2. Drug Trials and Pharmacogenomics
One of the great promises of genomic research is personalized medicine: tailoring disease treatments to an individual’s genetic makeup. Getting there will require studying the genetic variation underlying disease prognosis and pharmaceutical response. Many such pharmacogenomics projects are under way, though most are employing SNP arrays or targeted sequencing. Whole genome sequencing would better empower these efforts, since it would capture a much broader scope of variation that might contribute to the response.
WGS might even provide a useful front-end tool for clinical trials, where it could be used to stratify patients based on their likely response to the drug being studied.
3. Regulatory variation and eQTLs
One of the many payoffs of the International HapMap Project was that it characterized genetic variation in lymphoblastoid cell lines that could be ordered from Coriell for subsequent experiments. With all of the SNP genotypes in hand, researchers could assess gene expression (initially with microarrays, and later with RNA-seq) and then correlate it with genetic variation. These types of studies yielded thousands of expression quantitative trait loci (eQTLs), along with insights into how genetic variation influences transcription.
Imagine a state-of-the-art study involving RNA-seq and WGS from the same tissue sample (the RNA-seq would have to be done on another platform, like the HiSeq 2000, since the X Ten can only be used for WGS). Studies from the ENCODE Project Consortium and other groups have revealed just how pervasively transcribed the genome appears to be. Undoubtedly there is sequence variation that influences gene expression but isn’t well-captured by SNP arrays.
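At its core, an eQTL test asks whether expression of a gene tracks with genotype at a variant. A minimal sketch of that idea, assuming genotypes are coded as alternate-allele dosage (0, 1, or 2) and expression is a single normalized value per sample, might look like the following. The function name and the toy numbers are invented for illustration; real eQTL studies use dedicated tools, adjust for covariates, and correct for the enormous number of tests.

```python
from math import sqrt


def eqtl_association(dosages, expression):
    """Simple linear regression of expression on genotype dosage.

    Returns (slope, pearson_r). Hypothetical helper for illustration only.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    slope = sxy / sxx            # expression change per alternate allele
    r = sxy / sqrt(sxx * syy)    # strength of the linear relationship
    return slope, r


# Made-up toy data: six samples, one variant, one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
slope, r = eqtl_association(dosages, expression)
```

With WGS, the same test could in principle be run at every variant near a gene rather than only at the SNPs present on an array.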
4. Rare Tumor Types
Large-scale cancer sequencing efforts such as TCGA and ICGC have catalogued somatic mutations in a variety of common cancer types. Most of these projects had both an exome sequencing and a whole-genome sequencing component, but due to the cost, the majority of cases got exome sequencing. Even so, these studies have been incredibly useful for identifying recurrently mutated genes and pathways.
Notably, however, these efforts have targeted primarily common cancer types. There are many good reasons for this, but with low-cost whole genome sequencing I think that we can explore the whole genomes of rare tumor types as well. With TCGA, ICGC, and other datasets as a framework for comparison, we can undoubtedly learn a great deal about the somatic changes underlying rare tumor types. This could not only help the patients affected, but also give insights into what must be unique biology.
WGS is the right tool to study these kinds of tumors because we know less about them: it will capture the full spectrum of mutations, from single base changes to large chromosomal rearrangements, in a single experiment. Then again, we’ve always been proponents of WGS for cancer, so this suggestion shouldn’t surprise anyone.
5. Clan Genomics: Family Disease Pedigrees
This may sound similar to application #1 (newborn/pediatric sequencing) but it’s a different kind of study that taps into a unique resource: multiplex pedigrees from families affected by genetic disorders. Family-based studies seemed to fall out of fashion a little bit with the rise of the case-control study, but they’re making a huge comeback now for a variety of reasons. Obviously there’s considerable power to detect variants contributing to disease in a family with segregating alleles (rather than unrelated individuals).
Also, WGS remains too expensive for case-control studies at the scale required to pick up low-effect and/or rare associated variants. With a large family pedigree, you can do linkage analysis but usually still need sequencing to pinpoint the causal mutation. WGS is attractive here, because it enables you to look at noncoding and structural variants in linkage regions, rather than taking a gene-centric approach. This is absolutely necessary: just ask any gene hunter to tell you about that huge linkage peak they have in a region without any annotated genes. There are countless examples.
6. Large Cohorts with Extensive Phenotyping
Samples from large, well-phenotyped cohorts have always been in high demand for genetic studies. Many of them have been surveyed with SNP arrays and more recently exome sequencing. Over time, many cohorts grow both in the number of participants and the amount of phenotype data collected. Large-scale, longitudinal studies of complex traits are essential for pinpointing the underlying genetics.
Even with the HiSeq X Ten, WGS remains too costly to be applied to every 10,000-sample cohort. Yet a pilot study of 200, 500, or 1,000 samples may be feasible, and may uncover results that can be replicated in the larger cohort. If it were up to me, I’d select the subset of samples with the most extensive phenotype data: biomarkers, clinical measurements, RNA-seq, health records, etc. Deep phenotypes combined with WGS seem like a very powerful combination indeed.
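Picking that subset could be as simple as ranking samples by how many phenotype fields are actually filled in. The sketch below assumes a hypothetical cohort table held as a dictionary of sample IDs mapped to phenotype fields, with None marking missing data. The field names and the scoring rule are my own illustration, and a real selection would likely weight some data types (RNA-seq, say) more heavily than others.

```python
def pick_pilot_samples(phenotypes, k):
    """Return the k sample IDs with the most non-missing phenotype fields.

    phenotypes maps sample ID -> {field name: value or None}.
    Ties are broken by sample ID so the result is reproducible.
    """
    def completeness(fields):
        return sum(1 for value in fields.values() if value is not None)

    ranked = sorted(
        phenotypes.items(),
        key=lambda item: (-completeness(item[1]), item[0]),
    )
    return [sample_id for sample_id, _ in ranked[:k]]


# Made-up cohort records with invented field names.
cohort = {
    "S1": {"bmi": 24.1, "ldl": 110, "rna_seq": True, "ehr": True},
    "S2": {"bmi": 31.0, "ldl": None, "rna_seq": None, "ehr": True},
    "S3": {"bmi": 27.5, "ldl": 140, "rna_seq": None, "ehr": True},
}
pilot = pick_pilot_samples(cohort, 2)
```

Here S1 and S3 would be chosen for the pilot, since they have the fewest gaps.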
How Would You Apply WGS?
I’ve offered a few suggestions here, but there are undoubtedly other applications of WGS that should be considered in the light of the new X Ten system. What kinds of studies would you apply it to? Please leave me a comment and let me know. By the way, one of those Illumina HiSeq X Ten installations is here at WashU.
So if you have a cohort and are looking for whole-genome sequencing, we should talk.
This article previously appeared on Dan Koboldt’s Massgenomics blog. Dan leads the human genetics analysis group of the Genome Institute at Washington University. He started the Massgenomics blog in 2008 to write about next-generation sequencing and medical genomics in the post-genome era. Website: www.massgenomics.org.