March 15, 2017 (Vol. 37, No. 6)

Gene Expression’s Big Rethink

If the “One Gene, One Protein, One Function” Idea Were True, We Would Have Genomic Gridlock

  • Comparative Analyses

    At Mount Sinai’s Icahn School of Medicine, the Ma’ayan Laboratory has developed an open-source bioinformatics pipeline to extract knowledge from typical RNA-Seq studies and generate interactive principal component analysis (PCA) plots. The PCA plot shown here was generated using Gene Expression Omnibus/Sequence Read Archive data, which represents ~55,000 RNA-Seq human samples. Colors reflect the results of text searches on the metadata associated with each sample. [Alexander Lachmann, Ph.D.]

    “Right now, there is some confusion in the field about how to analyze RNA-Seq data,” says Avi Ma’ayan, Ph.D., professor of pharmacology and systems therapeutics at the Icahn School of Medicine at Mount Sinai. “But some order will emerge, and the right way to go about creating that order is to compare the various tools and pipelines for their quality and their ability to recover biological knowledge.”

    To facilitate the integrative analysis of gene-expression signatures extracted from the Gene Expression Omnibus (GEO), a large repository of gene expression data generated and deposited by individual research groups, Dr. Ma’ayan and colleagues recently developed GEN3VA (gene expression and enrichment vector analyzer), a web-based software application that allows the multilevel analysis of microarray profiles.

    “GEN3VA allows investigators to aggregate published studies and to extract and compare gene expression signatures,” explains Dr. Ma’ayan.

    To validate GEN3VA’s ability to uncover novel information, Dr. Ma’ayan and colleagues ran a case study that set out to dissect the pathway changes that occur during aging. They compared a collection of gene-expression signatures from old and young mammalian tissues: 244 human, mouse, and rat signatures that originated from 62 tissue and cell types across the three species.

    “We wanted to collect as many signatures as possible, regardless of the tissue and organism, to find the most common genes that are up- and downregulated, and perform enrichment analysis on those common genes to find small molecules that can reverse or mimic the aging signature,” details Dr. Ma’ayan.
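
    The idea of finding genes that are commonly up- or downregulated across many signatures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GEN3VA implementation: the function name `consensus_genes`, the `min_fraction` cutoff, the signature layout, and the placeholder gene names are all hypothetical.

```python
from collections import Counter


def consensus_genes(signatures, min_fraction=0.5):
    """Return genes that are up- or downregulated in at least
    min_fraction of the signatures.

    Each signature is assumed to be a dict with 'up' and 'down'
    sets of gene symbols (a hypothetical layout for illustration).
    """
    threshold = min_fraction * len(signatures)
    # Count in how many signatures each gene appears, per direction.
    up_counts = Counter(g for sig in signatures for g in set(sig["up"]))
    down_counts = Counter(g for sig in signatures for g in set(sig["down"]))
    up = sorted(g for g, c in up_counts.items() if c >= threshold)
    down = sorted(g for g, c in down_counts.items() if c >= threshold)
    return up, down


# Toy signatures with placeholder gene names (not real data).
signatures = [
    {"up": {"GENE_A", "GENE_B"}, "down": {"GENE_X"}},
    {"up": {"GENE_A", "GENE_C"}, "down": {"GENE_X", "GENE_Y"}},
    {"up": {"GENE_A"}, "down": {"GENE_Y"}},
]

up, down = consensus_genes(signatures, min_fraction=0.6)
print("commonly up:", up)
print("commonly down:", down)
```

    The resulting consensus lists are the kind of input one would then pass to an enrichment analysis; that step is not shown here because it depends on the reference library used.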

    This approach incorporates data collected for the Library of Integrated Network-based Cellular Signatures (LINCS) and has the power to discover new small molecules that can modulate gene expression. It can also assess reproducibility across datasets generated with different platforms, which is a topic that is attracting considerable interest.

    “We identified a conserved set of genes,” reports Dr. Ma’ayan. “Some of these genes have been known for a while, but others are novel. Also, we found that NF-κB is a critical transcription factor that can regulate genes and increase their expression in aging.”

    These results raise the possibility of using small molecules to modulate these pathways and potentially attenuate and even reverse aging. “The strength of this analysis,” insists Dr. Ma’ayan, “is that the data was sourced not from a single lab but from multiple labs. Also, the labs used different platforms.”

  • Splice-Sensitive Sequencing

    “Microarrays, RNA-Seq, epigenetic sequencing, and other types of sequencing will be major components of high-throughput analyses,” says Thomas C. Whisenant, Ph.D., research scientist in molecular and experimental medicine at The Scripps Research Institute. Collectively, these tools are ideally positioned to generate multi-omic panels of data related to nucleic acids.

    “The data will then be used as input into a software that will generate a profile to help investigators direct their research toward the most interesting targets based on the output,” adds Dr. Whisenant.

    One of the ongoing challenges in gene-expression analyses is that comparable approaches differ in how accurately they capture the same data. In a recent study, Dr. Whisenant and colleagues compared data generated on microarrays with data generated using next-generation sequencing to interrogate the same blood-based classifiers.

    “While the end result of the analysis was comparable, the overlap was in the 50–60% range at the end of each analysis,” observes Dr. Whisenant. “That is remarkably discordant for an analysis that uses the same samples that have been treated roughly the same way.”
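
    To make a figure such as “50–60% overlap” concrete, here is a minimal sketch of two common ways to score agreement between gene lists from two platforms. The article does not say which definition the study used, so both functions, their names, and the placeholder gene identifiers are assumptions for illustration only.

```python
def overlap_percent(list_a, list_b):
    """Shared genes as a percentage of the smaller list."""
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def jaccard_percent(list_a, list_b):
    """Shared genes as a percentage of all genes seen on either platform."""
    a, b = set(list_a), set(list_b)
    if not (a | b):
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


# Placeholder gene lists (not real data).
microarray_hits = ["G1", "G2", "G3", "G4", "G5"]
rnaseq_hits = ["G3", "G4", "G5", "G6", "G7"]

print(overlap_percent(microarray_hits, rnaseq_hits))
print(round(jaccard_percent(microarray_hits, rnaseq_hits), 1))
```

    The two scores can differ noticeably on the same lists, which is one reason a reported overlap percentage is hard to interpret without knowing its denominator.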

    One potential cause of discordant results across experiments is that assays are variable at the technical level. “The process of acquiring nucleic acids, amplifying the fragments, ligating the adaptors, and completing various other steps before finally getting a readout is so variable that it is difficult to get a good, repeatable estimate of expression from the same sample,” cautions Dr. Whisenant.

    While the biological relevance of splicing has been increasingly appreciated in recent years, capturing splicing variants by sequencing is still challenging due to several technical considerations. “Looking at splicing is difficult in general,” says Dr. Whisenant. “The amount of material that is needed to get reliable, consistent data is greater.”

    Dr. Whisenant and colleagues recently used RNA-Seq to examine gene expression and splicing changes that occur during T-cell activation. This study sought to identify the genes that are bound to the splicing factor U2AF2 during T-cell activation. Using splicing-sensitive microarrays, the investigators measured the impact on gene expression when some of these proteins were knocked down by means of RNA interference.

    Another topic of interest in sequencing technologies revolves around the need to perform more sensitive types of sequencing, which will generate information not only about populations of cells, but also about groups of cells and individual cells in a population. “This approach,” asserts Dr. Whisenant, “will help investigators resolve single-cell levels of expression, single-cell copy number at the genome level, and compartment-level expression data—for example, expression only in the nucleus or only in the cytoplasm.”

    Given the need to obtain reliable count data for every exon in a gene, the detection of splicing variants requires sequencing at higher depth and is associated with higher costs. Some microarrays contain probes that can hybridize to any of the isoforms in a sample, and their use is poised to decrease costs.

    “But the problem with microarrays is that if the spliceoform of interest is not on the array, one would never detect it,” advises Dr. Whisenant. “This opens a discovery problem.”
