
Feature Articles : Oct 1, 2013 (Vol. 33, No. 17)

CNV Strategies Get a Rethink

  • Richard A. Stein, M.D., Ph.D.

The discovery that 20,000–25,000 protein-encoding genes exist in the haploid human genome (a key finding of the Human Genome Project) fueled initiatives to map gene interactions and characterize the genetic circuitry in cells.

Radiation hybrid mapping, a strategy in which high-dose X-rays randomly introduce chromosomal breaks that shatter the DNA into tiny fragments, was initially used to generate high-resolution physical maps of the human genome.

The strength of this strategy is its ability to identify genetic markers that are close to each other. The closer two genetic markers are to each other on the chromosome, the more likely it is that they will be located on the same chromosomal fragment. Moreover, the frequency of breakage between markers can be used to reveal their order on the chromosome.

“We realized that we could ask a different question from the one that radiation panels had previously explored,” says Desmond J. Smith, M.D., Ph.D., professor of molecular and medical pharmacology at University of California, Los Angeles. Dr. Smith and colleagues proposed that radiation hybrid mapping data from mammalian cells could be used to delineate genetic survival networks for proliferation.

Central to this endeavor was the concept that if an extra copy of one gene is toxic enough to cause cell death, this toxic effect could be blocked by an additional copy of another, distant gene. “And if that was true, we could expect to see the two genes co-inherited more often than expected by chance in the panel of radiation hybrid cells,” says Dr. Smith.

By looking at all potential pair-wise interactions between all the genes from the genome, investigators in Dr. Smith’s lab delineated an unbiased network of interactions involved in cell proliferation and survival, and subsequently applied this knowledge to address a question relevant to genetic circuits that underlie malignancies. Some cancers have a survival advantage as a result of copy number changes, such as the amplification of genes that increase cell proliferation and the deletion of genes that block proliferation.
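The co-inheritance reasoning above can be illustrated with a pairwise test. The following is a minimal sketch, not the method used in Dr. Smith’s lab: it assumes each marker is reduced to a 0/1 retention call per hybrid clone, and it uses a one-sided Fisher exact test, chosen here only for simplicity. The function name `coretention_pvalue` is hypothetical.

```python
from math import comb


def coretention_pvalue(marker_a, marker_b):
    """One-sided Fisher exact p-value for two markers being retained
    together more often than chance predicts across a hybrid panel.

    marker_a, marker_b: equal-length lists of 0/1 retention calls,
    one entry per hybrid clone (an assumed, simplified encoding).
    """
    if len(marker_a) != len(marker_b):
        raise ValueError("markers must be scored on the same clones")
    n = len(marker_a)
    total_a = sum(1 for a in marker_a if a)
    total_b = sum(1 for b in marker_b if b)
    both = sum(1 for a, b in zip(marker_a, marker_b) if a and b)
    # Hypergeometric upper tail: P(co-retained count >= observed).
    upper = min(total_a, total_b)
    tail = sum(
        comb(total_a, k) * comb(n - total_a, total_b - k)
        for k in range(both, upper + 1)
    )
    return tail / comb(n, total_b)


# Two markers always retained together across eight clones.
linked = coretention_pvalue([1, 1, 1, 1, 0, 0, 0, 0],
                            [1, 1, 1, 1, 0, 0, 0, 0])
# Two markers retained independently of each other.
unlinked = coretention_pvalue([1, 1, 0, 0], [1, 0, 1, 0])
```

A genome-wide scan would repeat such a test for every pair of distant genes and would need a correction for multiple testing, which this sketch leaves out.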

“Most investigators examined these copy number variations in isolation,” says Dr. Smith. The question that investigators in Dr. Smith’s lab addressed extended beyond the simple characterization of isolated copy number variations (CNVs) in cancer. “We wanted to know whether the amplification of a gene affected in cancer is accompanied by the amplification of a distant gene somewhere else in the genome,” says Dr. Smith.

This strategy helped characterize the survival network of cancer cells, which is a subset of the survival network that is unveiled by the radiation hybrid data. In addition, this strategy bypassed one of the most significant challenges in characterizing copy number changes, the frequent involvement of multiple genes, which historically made it difficult to point toward the specific genes contributing to the resulting phenotypes.

“This is not the case for radiation mapping hybrid panels, where X-rays fragment the DNA and the resolution is very high,” says Dr. Smith. Overlapping the cancer interaction network with the radiation hybrid network provides opportunities to better understand, at single-gene resolution, the involvement of specific genes in disease. “We hope that these networks can ultimately be exploited for cancer treatment,” says Dr. Smith.

Detecting CNVs via NGS

In recent years, array comparative genomic hybridization (CGH) has been recommended by several professional societies as a first-line test for the prenatal detection of CNV. “But this approach suffers from low resolution, and … it misses balanced chromosomal structural variants, such as translocations and inversions,” says Yu-Ping Wang, Ph.D., associate professor of biomedical engineering at Tulane University.

With the emergence of next-generation sequencing, investigators in Dr. Wang’s group turned their attention toward this approach as a way to improve the accuracy of CNV detection. “Next-generation platforms can detect CNVs with a higher resolution, which is unattainable with other approaches, such as array CGH,” says Dr. Wang. Dr. Wang and colleagues developed CNV-TV (total variation), a tool that analyzes next-generation sequencing data with a total-variation-penalized least-squares model, marking the first time this statistical approach was used to analyze CNVs. “We found this tool to provide higher accuracy and robustness for CNV detection,” says Dr. Wang.
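To give a sense of what a total-variation-penalized least-squares fit does, the sketch below denoises a one-dimensional read-depth signal by minimizing 0.5 * sum((y - x)^2) + lam * sum(|x[i+1] - x[i]|). This is a generic textbook formulation solved by projected gradient steps on the dual problem, not the CNV-TV implementation; the function name `tv_denoise`, the solver, and all parameter values are assumptions made for illustration.

```python
def tv_denoise(y, lam, n_iter=2000, step=0.25):
    """Generic 1-D total-variation denoising (illustrative sketch).

    Minimizes 0.5 * sum((y - x)**2) + lam * sum(|x[i+1] - x[i]|)
    by projected gradient ascent on the dual variable z.
    """
    n = len(y)

    def primal(z):
        # x = y - D^T z, where (D x)[j] = x[j+1] - x[j].
        return [
            y[i]
            - (z[i - 1] if i > 0 else 0.0)
            + (z[i] if i < n - 1 else 0.0)
            for i in range(n)
        ]

    z = [0.0] * (n - 1)  # one dual value per adjacent pair
    for _ in range(n_iter):
        x = primal(z)
        # Gradient step, then clip each dual value to [-lam, lam].
        z = [
            max(-lam, min(lam, z[j] + step * (x[j + 1] - x[j])))
            for j in range(n - 1)
        ]
    return primal(z)


depth = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
mild = tv_denoise(depth, lam=0.1)   # keeps the step, shrinks it slightly
heavy = tv_denoise(depth, lam=3.0)  # penalty large enough to flatten it
```

The penalty weight `lam` controls the trade-off: small values preserve abrupt copy number changes, while large values merge neighboring segments into a single level.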

Several depth-of-coverage, next-generation sequencing strategies are currently available to detect CNVs. In a comprehensive survey comparing six publicly available platforms, Dr. Wang and colleagues revealed that each of them presents specific strengths and weaknesses. In addition, some are superior to others for specific applications, underscoring the need to integrate multiple approaches to more robustly capture genetic variation.

While most approaches focus on detecting CNVs from individual samples, or by comparing CNVs from patients with disease with those from controls, Dr. Wang and colleagues applied non-negative matrix factorization to detect recurrent CNVs within a population, and showed that two ethnic groups can be distinguished based on differences in their CNV pattern clustering. “Technically there is still room to further improve our ability to detect CNVs, and we are currently developing a strategy to detect CNVs from multiple samples,” says Dr. Wang.
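Non-negative matrix factorization approximates a samples-by-regions matrix V as the product of two non-negative matrices, W and H, so that each row of H acts as a recurrent pattern and each row of W says how strongly a sample carries it. The sketch below uses the standard multiplicative update rules in plain Python; it is not the procedure from Dr. Wang’s group, and the toy matrix, rank, and function names are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b)))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frob_error(v, w, h):
    """Squared Frobenius distance between V and W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2
        for i in range(len(v))
        for j in range(len(v[0]))
    )


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x rank) and H (rank x m), all >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy copy-number matrix: rows are samples, columns are genomic regions.
copy_number = [
    [3, 3, 2, 2],
    [3, 3, 2, 2],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
w_start, h_start = nmf(copy_number, rank=2, n_iter=0)
w_fit, h_fit = nmf(copy_number, rank=2)
```

Clustering samples by their rows in W is one way such a factorization could separate groups with different recurrent CNV patterns.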

CNVs as Evolutionary Clues

“Structural variants in the human genome have mostly been studied for their clinical implications, but not a lot has been done to understand the evolutionary implications of these genomic segments,” says Charles Lee, Ph.D., scientific director of the newly created Jackson Laboratory for Genomic Medicine.

Work in Dr. Lee’s lab showed that the copy number of the amylase-encoding gene, AMY1, is under positive evolutionary selection. Fewer copies were found in populations consuming little starch, but the copy number increased to as many as 15 per cell in individuals from populations having a high intake of this carbohydrate. In contrast to humans, only two copies of the AMY1 gene were found in cells from the chimpanzee, a species that consumes very little starch.

More recent work in Dr. Lee’s lab, at the interface between structural chromosomal variants and evolution, identified an approximately 36 kb noncoding locus in the human genome containing transcribed and putatively regulatory sequences, which exhibits a copy number variant that is thought to predate the divergence, over 500,000 years ago, of the modern human and Neanderthal lineages.

While most CNVs, like single-nucleotide polymorphisms (SNPs), appear to be bi-allelic in a given population, CNVs have the potential of “mutating” faster, making their analysis more challenging. “They are also often embedded in complicated regions of the human genome, enriched for repetitive DNA and other genomic rearrangements, making them difficult to accurately genotype and thereby hindering their incorporation in most genetic studies,” says Dr. Lee.

While array CGH and next-generation sequencing continue to help unveil and characterize structural variants, insights into their evolution and mutation rates are accompanied by technical hurdles. “To understand how they arose, we need to accurately characterize the boundaries and content of each of these structural genomic variants at the nucleotide sequence level. We’ve gotten better at this over the years, especially for deletions, but we still have a ways to go for other structural variant types,” says Dr. Lee.

CNVs as Risk Factors

“In psychiatry, people have been interested in CNVs because they represent important risk factors,” says Judith L. Rapoport, M.D., senior investigator and chief of the Child Psychiatry Branch at the National Institute of Mental Health. While studying a very rare, childhood-onset form of schizophrenia, Dr. Rapoport and colleagues found that the prevalence of the 22q11 microdeletion in this pediatric group, approximately 5%, is higher than in any adult-onset schizophrenia population studied previously.

The finding that the same 22q11 microdeletion is also present in children with other neurodevelopmental conditions promises to make the interpretation of sequencing data more challenging. “Approximately 80% of the children born with this deletion have some kind of severe disorder, whether it is a mood disorder, autism, obsessive-compulsive disorder, or delayed language development, and the possibility of conducting prenatal screening has enormous implications,” says Dr. Rapoport.

The 22q11 microdeletion, which involves an approximately 3 Mb chromosomal region, contains many genes. The large number of genes, together with the microdeletion’s involvement in several neuropsychiatric conditions, makes the identification of the microdeletion’s specific role in disease, and the elucidation of the molecular basis of pathogenesis, more challenging. “There are investigators currently exploring the effects that CNVs from our patients have on neurons grown by cellular reprogramming,” says Dr. Rapoport.

The identification and characterization of pathological CNVs has ramifications in terms of prenatal screening, particularly since many common structural variants are found in patients who appear to be clinically unaffected. While approximately 25% of the individuals harboring the 22q11 microdeletion will develop schizophrenia, a disorder for which few risk factors are known, approximately 75% of these individuals will not develop this condition. “But having an identical twin with schizophrenia confers a 50% risk, and this deletion represents, therefore, the second largest risk factor that currently exists,” explains Dr. Rapoport.

Pathogenic CNVs

“We developed an algorithm that helps determine the probability that a CNV is pathogenic,” says Ian D. Krantz, M.D., professor of pediatrics at the Children’s Hospital of Philadelphia. One of the most challenging aspects related to interpreting CNVs is that in many instances, they occur in apparently healthy individuals.

PECONPI software identifies pathogenic CNVs based on their gene content and frequency, and researchers in Dr. Krantz’s lab tested its ability to identify pathogenic CNVs on two genetically heterogeneous cohorts, one with sensorineural hearing loss and the other one with congenital heart defects.

“This adaptable software allows investigators to incorporate the parameters they want to use, and comb through hundreds of CNVs as they study complex traits or rare individual traits in birth defects,” says Dr. Krantz. Based on the variable parameters that investigators can select, the algorithm automatically ranks CNVs by priority.

As part of this work, Dr. Krantz and colleagues explored the possibility of evaluating recessive disorders by combining the analysis of pathological CNVs on one allele with next-generation sequencing of the other allele. “After finding a CNV, we are trying to incorporate an additional step to sequence the other allele and search for point mutations that are unmasked in recessive conditions,” says Dr. Krantz.

Probabilistic Methods

“Our background is in mathematics, and after learning from collaborators about the challenges in this area of biology, we wanted to apply probabilistic methods to improve the precision of CNV detection,” says Saman K. Halgamuge, Ph.D., professor and associate dean of the School of Engineering at the University of Melbourne. “Applying expertise from mathematics, computer sciences, and engineering would benefit many areas of biology, and researchers from these fields need to increasingly provide their input in helping solve problems, as this would exert a huge impact on society.”

A new method developed by Dr. Halgamuge and colleagues in collaboration with Dr. Jason Li of the Peter MacCallum Cancer Research Institute estimates CNV from whole exome sequencing datasets, based on the ratio of average read depths from tumor and normal tissue samples collected from patients. “Exome analysis provides a more specific way to differentiate between normal and tumor samples, and the data is more targeted,” says Dr. Halgamuge.
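The read-depth ratio at the heart of this approach can be sketched in a few lines. The example below computes a per-exon log2 ratio of tumor to normal average read depth; the library-size scaling, the pseudocount, and the function name `log2_depth_ratios` are assumptions added for illustration and are not taken from the published method.

```python
from math import log2


def log2_depth_ratios(tumor_depths, normal_depths, pseudocount=0.5):
    """Per-exon log2(tumor / normal) read-depth ratios (sketch).

    tumor_depths, normal_depths: average read depth per exon, listed
    in the same exon order. Tumor depths are rescaled so that both
    samples have the same total depth (an assumed normalization).
    """
    if len(tumor_depths) != len(normal_depths):
        raise ValueError("both samples must cover the same exons")
    tumor_total = sum(tumor_depths)
    if tumor_total == 0:
        raise ValueError("tumor sample has no coverage")
    scale = sum(normal_depths) / tumor_total
    # The pseudocount avoids division by zero and log2(0) on
    # exons with no reads.
    return [
        log2((t * scale + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor_depths, normal_depths)
    ]


normal = [100, 100, 100, 100]
tumor = [100, 100, 200, 100]  # third exon carries extra coverage
ratios = log2_depth_ratios(tumor, normal)
```

Values near zero suggest no copy number change, positive values suggest a gain, and negative values suggest a loss, although raw ratios are noisy and need smoothing and segmentation before being interpreted.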

A key feature of the method is the use of discrete wavelet transform smoothing to reduce experimental noise from the sequencing data. “After this step, the Hidden Markov Model, a probability-based tool used in mathematics, is applied to detect copy number gains and losses,” explains Dr. Halgamuge.
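The two steps described above, wavelet smoothing followed by a Hidden Markov Model, can be sketched as follows. This is an illustration under stated assumptions rather than the published method: it uses the Haar wavelet with soft thresholding, a three-state model (loss, neutral, gain) with Gaussian emissions, and Viterbi decoding. The state means, standard deviation, transition probability, and function names are all hypothetical.

```python
from math import copysign, log, sqrt


def haar_smooth(signal, threshold, levels=2):
    """Haar wavelet denoising with soft-thresholded detail coefficients.

    len(signal) must be divisible by 2 ** levels.
    """
    root2 = sqrt(2.0)
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / root2 for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / root2 for i in pairs]
        details.append(detail)
    for detail in reversed(details):
        shrunk = [copysign(max(abs(c) - threshold, 0.0), c) for c in detail]
        rebuilt = []
        for a, c in zip(approx, shrunk):
            rebuilt.extend([(a + c) / root2, (a - c) / root2])
        approx = rebuilt
    return approx


def viterbi_cn_states(log_ratios, means=(-1.0, 0.0, 0.58), sd=0.3, stay=0.95):
    """Most likely state path; 0 = loss, 1 = neutral, 2 = gain."""
    k = len(means)
    states = range(k)
    switch = (1.0 - stay) / (k - 1)
    log_trans = [[log(stay if i == j else switch) for j in states]
                 for i in states]

    def log_emit(x, s):
        # Gaussian log-density with state-independent constants dropped.
        return -0.5 * ((x - means[s]) / sd) ** 2

    score = [log(1.0 / k) + log_emit(log_ratios[0], s) for s in states]
    back = []
    for x in log_ratios[1:]:
        prev, score, pointers = score, [], []
        for j in states:
            best = max(states, key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j] + log_emit(x, j))
        back.append(pointers)
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


clean = [0.0] * 8 + [0.58] * 8 + [-1.0] * 8
noisy = [v + (0.05 if i % 2 == 0 else -0.05) for i, v in enumerate(clean)]
smoothed = haar_smooth(noisy, threshold=0.1)
calls = viterbi_cn_states(smoothed)
```

In this toy input the smoothing step removes the alternating noise and the decoder then labels the three segments as neutral, gain, and loss. Real data would require parameters estimated from the samples themselves.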

A comparison between the proposed method and several existing methods revealed that it outperforms them in terms of precision, but one of its shortcomings is that it can mistake small CNVs for noise. Addressing this shortcoming will require additional improvements. “Another challenge, and the next step in our efforts, will be to distinguish driver structural variations, which are more dangerous, from passenger ones, which are not causing the cancer,” says Kaushalya C. Amarasinghe, doctoral student and first author of the study.

Improved CNV Detection and Analysis

As significant contributors to genomic diversity, CNVs are thought to collectively account for up to three times more base pairs than all single nucleotide polymorphisms combined. The recent expansion in experimental genome-scanning technologies, along with the development of novel computational and biotechnological tools, promises to more reliably unveil and characterize structural variation in the human genome. These advances are marking a transformative moment in research and clinical medicine, and are bound to fill important gaps in our understanding of development, disease, and evolution.