
Feature Articles : Aug 1, 2013 (Vol. 33, No. 14)

Diving Deep with Array CGH

  • Richard A. Stein, M.D., Ph.D.

While single-nucleotide polymorphisms were an initial focus, the subsequent discovery of copy-number variation unveiled a new dimension of genomic diversity, and made apparent the need to develop more refined technologies to capture chromosomal rearrangements.

Comparative genomic hybridization (CGH) provided the opportunity to detect chromosomal imbalances in a high-throughput manner, at the genome-wide scale, and with higher resolution than with previously available methods, such as G-banding and fluorescence in situ hybridization. At the same time, the wealth of CGH data generated gave rise to several challenges.

“Identifying the genomic changes that are significant is a challenging aspect of CGH array data analysis and interpretation,” says Kenneth J. Craddock, M.D., a cytogeneticist at Toronto General Hospital.

While copy-number abnormalities can be found in nearly all medical conditions, their significance must be explored individually, particularly because some variants simply reflect normal inter-individual variation, while others may be of unknown significance. “At this time, the findings for which we know the significance are a minority, and this constitutes another huge challenge in the field,” Dr. Craddock says.

With the generation of more complex datasets, the availability of computing power is becoming increasingly important in analyzing and understanding the significance of structural changes in the chromosome.

“Support from informaticians and computer programmers is not very prominent in genetics labs that are testing for various diseases, including cancer, but with the newly emerging technologies, these aspects will need to be addressed,” Dr. Craddock says.

Importantly, this support will be crucial in helping differentiate physiological structural variants from the ones that are pathologically significant. “This will also help connect the data to existing databases and increase automation, something for which most hospitals currently do not have the necessary resources,” he adds.

“While we have a great capacity to sequence genomes in a high-throughput manner at lower costs, we still lack accurate computational algorithms to detect copy-number variations from sequencing data,” according to Santhosh Girirajan, Ph.D., assistant professor of biochemistry and molecular biology at Penn State University. Dr. Girirajan and his colleagues used array CGH to visualize chromosomal rearrangements in several developmental disorders.

For a recent effort, they examined a cohort of more than 2,300 children who had copy-number variants associated with intellectual disability, finding that harboring two large copy-number variants of unknown significance is associated with an over eightfold higher likelihood of developmental delay. This finding suggested that multiple copy-number variants may interact with one another to shape the clinical presentation in complex diseases, and explains previous reports of phenotypic heterogeneity, when dissimilar clinical presentations were described in individuals harboring identical chromosomal abnormalities.

More recently, Dr. Girirajan and colleagues examined the global load of chromosomal deletions and duplications in autism, a heterogeneous disorder that, based on 2013 estimates by the Centers for Disease Control and Prevention, affects one in 50 schoolchildren in the U.S.

This study revealed an approximately sevenfold increase in duplications and a twofold increase in deletions in children with autism, and pointed toward a relationship between an increased genomic load of copy-number variations, particularly duplications, and the risk of developing this condition. “This also points toward the need to understand the impact of deletions or duplications of chromosomal regions harboring tens of genes, as opposed to the more simple genetic mutations that we often talk about in human genetics,” Dr. Girirajan says.

“CGH, a very well understood and accepted test, provides a cost-effective way to find copy-number variants, but interpreting the results and figuring out what the variants mean is currently the major challenge,” says Robert L. Nussbaum, M.D., professor of medicine and chief of the division of medical genetics at the University of California, San Francisco.

Complex Disorders

One of the challenges stems from the fact that some copy-number variations, even the large ones, might not have a particular impact on the phenotype. On the other hand, some symptomatic patients were shown to harbor copy-number variants that are present in parents who might not be affected by the disorder.

“That might suggest either that the copy-number variation does not have anything to do with the condition, or that there is an additional factor that interacts with it, such as a mutation, which may be in another gene or on another chromosome,” Dr. Nussbaum says. The challenges are more pronounced when testing is performed in a prenatal setting, due to the significant difficulties in predicting the severity of a specific disease whose onset may be years or decades in the future.

Array CGH is reshaping many biomedical areas. One of these is preimplantation genetic diagnosis, which entails a small biopsy that is performed to remove one of the 6-8 cells of a three-day embryo formed after in vitro fertilization, prior to its implantation into the uterus. This widely performed procedure revolutionized assisted reproductive technologies but, like any medical procedure, is not devoid of risks.

“For the first time, we were able to obtain DNA from the embryo and perform genetic analyses without removing cells from the blastomere,” says Simone Palini, Ph.D., senior clinical embryologist of the research group directed by Carlo Bulletti, M.D., in the physiopathology of reproduction unit at Cervesi General Hospital in Cattolica, Italy.

Together with Luca Galluzzi, Ph.D., professor of recombinant and molecular biotechnology, and Mauro Magnani, Ph.D., professor in biochemistry in the department of biomolecular science at the University of Urbino, the investigators demonstrated that fluid removed from the blastocoel cavity of a five-day-old human blastocyst can provide sufficient DNA to perform genetic analyses.

Genomic DNA was found in approximately 90% of the blastocyst fluid samples that were collected during the vitrification procedure, as part of the cryopreservation of the embryos obtained by in vitro fertilization.

“In this approach, the embryologist places the needle only 7 microns inside the blastocyst, which is 250–300 micrometers in diameter, and preimplantation genetic testing is performed without disturbing the embryo, in a procedure that is comparable to the intracytoplasmic sperm injection,” says Dr. Palini.

The 0.3–0.5 nanoliters of fluid that are retrieved during this process contain a median of 9.9 pg DNA, which can be used to perform various types of tests. “For most labs, a thermocycler is like a coffee machine, and virtually all labs can afford to perform PCR and look for mutations in specific genes of interest,” say Dr. Palini and Dr. Galluzzi.

In a proof-of-principle experiment, Dr. Bulletti and Dr. Magnani’s groups used the DNA isolated from the blastocyst fluid for whole-genome amplification and array CGH analyses, showing that it is possible to detect several aneuploidies and to determine the sex of the embryos.

“CGH, a very important tool for reproductive medicine, will most likely see further improvements in the future, and it assumes an important role in preventive medicine, even if more studies should be performed to validate the protocol,” Dr. Palini says.

Evolutionary Applications

Gene duplication is a major mechanism behind evolutionary change. With this in mind, Dr. Sikela, in collaboration with Dr. Jonathan Pollack at Stanford University, focused on finding genes that have been highly duplicated along specific primate lineages in several species, including humans.

“We used gene-based array CGH genome-wide to identify genes that are important for human and primate evolution,” says James M. Sikela, Ph.D., professor of biochemistry and molecular genetics at the University of Colorado Denver.

This marked the first time genome-wide array CGH analysis was applied to perform a cross-species comparison of humans and other primates. “In the analysis that we performed, cDNAs corresponding to virtually all human genes had been deposited on the arrays, so this was a cDNA array, as compared to the typical oligonucleotide array. This had the advantage that we could survey the copy number of virtually all human genes in each experiment,” Dr. Sikela explains.

This strategy revealed that approximately 140 genes were specifically changed in the human lineage, being present in more or fewer copies in humans as compared with nonhuman primates.

One of the strongest signals was from a gene that encoded multiple copies of DUF1220, a protein of unknown function. Further analysis revealed that DUF1220 was present in many more copies in the human genome than in the genome of any other primate. Subsequently, Dr. Sikela and colleagues found that, in brain, DUF1220 shows a neuron-specific preferential expression in cerebellar Purkinje cell bodies and dendrites, in neurons from the cortical layers of the hippocampus, and in the neocortex, the region responsible for higher cognitive functions.

More recent work from Dr. Sikela’s lab unveiled strong associations between the DUF1220 copy number and brain size across primates, and showed that more copies are associated with a larger brain size, also implicating DUF1220 in certain human conditions characterized by brain-size pathology. Thus, DUF1220 copy-number variation has emerged as an important evolutionary factor and, additionally, it contributes to physiological and pathological brain size variation.

“The DUF1220 story all began with our applying array CGH in a novel way. Array CGH certainly played a key role in revealing these findings,” says Dr. Sikela.

Experimental Design

While array CGH has found broad applicability in clinical and research laboratories, the statistical models of experimental design and data analysis remain an area that has received relatively limited attention.

One of the most frequently used protocols, the two-color labeling approach, involves separately labeling a sample of interest and a reference sample with two different fluorescent dyes. After the labeled samples are mixed and hybridized on the array, copy-number differences are calculated as the fluorescence ratio of the sample of interest to the reference sample. The additional observation that probe-specific dye biases may be a source of artifacts also created the need for the dye-swap design, in which each sample is separately labeled with each of the two dyes. A significant drawback of the reference design is that half of the samples are used for measurements that ultimately do not present biological interest.
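To make the arithmetic behind the two-color ratio and the dye swap concrete, here is a minimal sketch in Python. It assumes a simple multiplicative model in which one dye reads brighter than the other by a fixed factor for a given probe; the intensity values, the function names, and the dye labels "A" and "B" are all hypothetical and are not taken from any of the studies described here.

```python
import math

def log2_ratio(sample_intensity, reference_intensity):
    """Log2 fluorescence ratio of sample to reference for one probe."""
    return math.log2(sample_intensity / reference_intensity)

def dye_swap_log2_ratio(forward, swapped):
    """Average the log2 ratios from a forward and a dye-swapped hybridization.

    Each argument is a (sample_intensity, reference_intensity) pair.
    Under a multiplicative dye-bias model, the bias enters the two
    hybridizations with opposite sign and cancels in the average.
    """
    return (log2_ratio(*forward) + log2_ratio(*swapped)) / 2

# Hypothetical probe: 3 copies in the sample, 2 in the reference,
# 1000 intensity units per copy, and dye A reading 1.3x brighter than dye B.
dye_a_gain = 1.3
forward = (3 * 1000 * dye_a_gain, 2 * 1000)   # sample in dye A, reference in dye B
swapped = (3 * 1000, 2 * 1000 * dye_a_gain)   # sample in dye B, reference in dye A

biased = log2_ratio(*forward)                    # inflated by the dye bias
corrected = dye_swap_log2_ratio(forward, swapped)
expected = math.log2(3 / 2)                      # true copy-number ratio
```

In this toy case the single forward hybridization overstates the gain, while the dye-swap average recovers the true ratio of 3 to 2. Real analyses also involve normalization and segmentation steps that this sketch leaves out.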

“This approach also unnecessarily increases the cost of the study,” says Jeanette E. Eckel-Passow, Ph.D., associate professor of biostatistics at the Mayo Clinic. Dr. Eckel-Passow and colleagues recently provided experimental evidence supporting the use of an off-chip reference sample to provide a more cost-effective experimental design. This allows the sample size to be doubled, a particularly attractive option for genome-wide association studies, for which the statistical power is a crucial consideration.

“In research settings, it is really important to have as much statistical power as possible, and in order to do that, it is necessary to analyze more samples,” Dr. Eckel-Passow says. In this approach, an average of the reference sample that is labeled with both fluorescent dyes can be used for all clinical samples that are examined. “This would decrease the costs as well,” she adds.
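The idea of reusing one averaged reference across many clinical samples can be sketched as follows. This is only an illustration under stated assumptions: the reference is taken to have been hybridized separately with each dye, the two runs are averaged per probe on the log2 scale, and every value and name below is hypothetical rather than drawn from Dr. Eckel-Passow's published method.

```python
import math

def average_reference(ref_dye_a, ref_dye_b):
    """Per-probe mean of log2 reference intensities from the two dye labelings."""
    return [(math.log2(a) + math.log2(b)) / 2
            for a, b in zip(ref_dye_a, ref_dye_b)]

def log2_ratios(sample, ref_log2):
    """Log2 ratio of each sample probe to the shared off-chip reference."""
    return [math.log2(s) - r for s, r in zip(sample, ref_log2)]

# Hypothetical reference intensities for three probes, one run per dye.
ref_dye_a = [1000.0, 4000.0, 2000.0]
ref_dye_b = [4000.0, 1000.0, 2000.0]
off_chip_ref = average_reference(ref_dye_a, ref_dye_b)

# Hypothetical clinical samples, all compared with the same reference profile,
# so no array channel has to be spent on a reference hybridization.
samples = {
    "patient_1": [2000.0, 3000.0, 1000.0],
    "patient_2": [2000.0, 2000.0, 4000.0],
}
profiles = {name: log2_ratios(values, off_chip_ref)
            for name, values in samples.items()}
```

Because the reference profile is computed once and stored, both channels of each array are free for clinical samples, which is where the doubling of sample size comes from.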

Interpretation Infrastructure

“The infrastructure that we built enables laboratories to store information on genetic variants found in genes associated with disease, along with their detailed interpretation,” says Samuel J. Aronson, executive director of IT at Partners HealthCare Center for Personalized Genetic Medicine. The center’s IT platform, GeneInsight, provides the infrastructure to manage the interpretation of genetic information used in patient care, and report the data to healthcare providers.

Because of the dynamic nature of genetic knowledge, genetic variants of unknown significance at the time of testing may be discovered to be clinically relevant at a later point in time.

“It is important to ensure that new information reaches treating clinicians as soon as possible after variants are reclassified. GeneInsight is designed to help laboratories meet this challenge,” Aronson explains.

As part of the GeneInsight platform, treating clinicians receive alerts when previously identified and recorded genetic variants are updated in a manner that may be clinically meaningful. Historically, GeneInsight has focused on next-generation sequencing and single-nucleotide polymorphism data. “We are now deepening our support for structural and copy number variants,” Aronson says.

Compared with other forms of genetic variation, characterizing copy-number variants presents different challenges, some of which stem even from the manner in which these chromosomal changes are defined and described. “Information technology is critically important to support the expanded clinical use of these data,” he adds.

A major goal of GeneInsight is to facilitate testing processes that go beyond just capturing the state of knowledge at the moment a test is signed out. “There is no reason that clinicians should need to run a new physical test to get an updated interpretation for a previously conducted genetic assay,” Aronson says. “They should be automatically provided with an update whenever possible alerting them that our understanding of their patient’s genetic profile has evolved.”

Copy-number variations, which occur in at least 12% of the human genome, are estimated to account collectively for more genomic diversity than all single-nucleotide polymorphisms combined. The development of array CGH unveiled a new facet of chromosomal biology, and marked a shift in understanding its link to development, health, disease, and evolution. Addressing some of the existing difficulties, including the bioinformatics, computational, and statistical challenges, will be instrumental in enabling this already established and widely used approach to undergo further development and refinement.