June 1, 2005 (Vol. 25, No. 11)
A Case Study Using GeneDirector LIMS
In microarray gene expression experiments, the multiple conditions in each experiment (e.g., time points, doses, replicates) expand the data from tens of thousands of measurements for a single array to millions for the experiment as a whole.
Each experiment also generates a large amount of associated information: the protocols used, the biological materials (cell lines, tissues, RNA), the labeling dyes (Cy3, Cy5), and details of manufacturers, materials, and compounds.
As a result, microarray core facilities often face the challenges of (a) data integration and management, (b) cataloging and archiving all the information related to the data, (c) making use of all the information in the databases, (d) data analysis and knowledge discovery, (e) leveraging the historical information archived with the data, (f) maintaining and quickly retrieving the inherent relationships among diverse data points, and (g) sharing data securely with collaborators.
Addressing these challenges requires a system that is not only a data and information repository but also offers sophisticated analytical tools. GeneDirector, from BioDiscovery (El Segundo, CA), is a microarray data-management system that combines laboratory information management system (LIMS) and data analysis capabilities.
GeneDirector is compatible with arrays, arrayers, and scanners from many different manufacturers. It is built on an embedded Oracle relational database management system (RDBMS) and manages data, projects, arrays, samples, and results. It also offers three computational modules: CloneTracker (microarray design and management), ImaGene (microarray image analysis), and GeneSight (microarray data analysis).
Background and Objective of Case Study
A maize microarray dataset was obtained from www.plantgenomics.iastate.edu/microarray/data (reference 1). Using laser capture microdissection (LCM), the authors collected epidermal cells and vascular tissues of maize, extracted RNA from these samples and from whole coleoptiles, and identified genes differentially expressed among them.
RNA from the second round of T7-based RNA amplification was reverse-transcribed and labeled with Cy3 or Cy5, and used to hybridize microarrays containing approximately 8,800 maize cDNAs. Microarray analyses were used to compare global patterns of gene expression between epidermal cells and vascular tissues, between epidermal cells and whole coleoptiles, and between vascular tissues and whole coleoptiles.
Each of the three comparisons was based on four hybridizations involving two independently isolated RNA samples and a dye swap. In addition, each cDNA was spotted in duplicate on the microarray.
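The design described above produces 3 × 4 = 12 hybridizations. The Python sketch below enumerates them; the replicate labels, the pairing of replicate 1 of one tissue with replicate 1 of the other, and which dye each tissue receives in the unswapped orientation are illustrative assumptions rather than details from the published study.

```python
from itertools import product

# Hypothetical encoding of the study design: three pairwise comparisons,
# two independent RNA preparations (biological replicates), and a dye swap,
# giving four hybridizations per comparison.
COMPARISONS = [
    ("Epidermal", "Vascular"),
    ("Epidermal", "Coleoptile"),
    ("Coleoptile", "Vascular"),
]
REPLICATES = ["rep1", "rep2"]  # illustrative labels


def build_hybridizations():
    """Return one record per hybridization with its Cy3 and Cy5 samples."""
    hybs = []
    for (a, b), rep, swapped in product(COMPARISONS, REPLICATES, (False, True)):
        cy3, cy5 = (b, a) if swapped else (a, b)  # dye swap flips the labels
        hybs.append({
            "comparison": f"{a} vs. {b}",
            "replicate": rep,
            "dye_swap": swapped,
            "cy3": f"{cy3}_{rep}",
            "cy5": f"{cy5}_{rep}",
        })
    return hybs


hybs = build_hybridizations()
print(len(hybs))  # 12
```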
The objective of this case study is to use GeneDirector LIMS to manage the data generated throughout the experimental workflow of the published maize expression study. This includes management of samples, protocols, arrays, hybridizations, scanned images, quantification of scanned images, and knowledge discovery from these quantification data.
Maize Microarray Data Management
Study project. A project named “Maize” was created in GeneDirector to archive and manage all the microarray data for the published study. The project contained eight subfolders: PROBE, PLATE, ARRAY, SAMPLE, HYBRIDIZATION, SCANNED_ARRAY, QUANTIFIED_ARRAY, and PROCESSED_DATA.
Array design. An array design named “Maize”, containing the layout of the maize array, was created in GeneDirector. Clicking this array design displays the probes printed on the array. The design appears in the “Array Designs” folder of the “My Data” database.
Arrays within the project. Arrays based on this design were created and stored in the GeneDirector database according to the study's experimental design, which comprises three pairwise comparisons: Epidermal vs. Vascular, Epidermal vs. Coleoptile, and Coleoptile vs. Vascular.
Four arrays were created for each pairwise comparison: two biological replicates, each hybridized in both dye orientations (a dye swap). As with the array design, clicking any array displays the probes on that array in the panel beneath it. All the arrays used in the project are displayed in the ARRAY folder in the GeneDirector application.
Samples and study protocols. Sample annotations were entered into GeneDirector through the sample tracker. For each array, GeneDirector tracks three biological sample types: BIOMATERIAL, RNA SAMPLE, and LABELED SAMPLE.
All the biological samples used in the project are displayed in the SAMPLE folder in the GeneDirector application. The experimental protocols, including RNA extraction, amplification, and labeling, were associated with their corresponding samples and stored in the database.
Array hybridizations. Hybridization entities, each linking an array to its labeled samples, were created for the four arrays in each of the three pairwise comparisons.
All the hybridizations used in the project are displayed in the HYBRIDIZATION folder in the GeneDirector application. Each hybridization is associated with its protocols, which are shown in the hybridization editor window.
Scanned images. A scanned array was created for every hybridized array by importing its Cy3 and Cy5 images into the GeneDirector database. All the scanned arrays used in the project are displayed in the SCANNED_ARRAY folder in the GeneDirector application.
Image analysis. The scanned arrays were quantified with ImaGene, GeneDirector's image analysis module. Using proprietary segmentation algorithms, ImaGene converts the pixel values of each spot on a scanned array into a numeric quantification value and flags unusual spots. ImaGene also supports automated batch analysis of scanned arrays.
The quantifications of the scanned arrays were stored in the GeneDirector database and displayed in the QUANTIFIED_ARRAY folder. Selecting a quantified array in the relationship viewer shows how it relates to the other objects in the database, e.g., the array design and the array used.
Statistical analysis. The quantified arrays stored in the GeneDirector database were launched into GeneSight for knowledge discovery. GeneSight, the data analysis module embedded in GeneDirector, offers exploratory data-mining and confirmatory statistical tools for extracting biological insight from complex, high-dimensional microarray data.
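ImaGene's segmentation algorithms are proprietary, so the sketch below is not its method. It only illustrates, under simple assumptions, what turning spot pixels into a number plus a flag means: pixels are assumed to be already assigned to the spot or to its surroundings, the signal is their mean, the local background is the median of the surrounding pixels, and a spot is flagged "empty" when its signal does not exceed the background by a hypothetical margin of k background standard deviations.

```python
import statistics


def quantify_spot(fg_pixels, bg_pixels, k=2.0):
    """Summarize one spot from pre-segmented foreground and background pixels.

    Illustrative only: the mean/median summaries and the k-sigma "empty"
    rule are assumptions, not ImaGene's proprietary algorithm.
    """
    signal = statistics.fmean(fg_pixels)        # mean foreground intensity
    background = statistics.median(bg_pixels)   # local background estimate
    noise = statistics.pstdev(bg_pixels)        # spread of background pixels
    flag = "empty" if signal < background + k * noise else "ok"
    return {"signal": signal, "background": background, "flag": flag}


print(quantify_spot([500, 520, 480], [50, 52, 48, 50]))
# {'signal': 500.0, 'background': 50.0, 'flag': 'ok'}
```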
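Conceptually, the relationship viewer follows links between database objects. A minimal sketch, assuming each object simply records the objects it was derived from (the object names below are hypothetical, not GeneDirector identifiers), recovers every upstream object of a quantified array with a breadth-first search:

```python
from collections import deque

# Hypothetical provenance links: object -> objects it was derived from.
DERIVED_FROM = {
    "QuantifiedArray_EV1": ["ScannedArray_EV1"],
    "ScannedArray_EV1": ["Hybridization_EV1"],
    "Hybridization_EV1": ["Array_EV1", "Labeled_Epidermal_rep1", "Labeled_Vascular_rep1"],
    "Array_EV1": ["ArrayDesign_Maize"],
    "Labeled_Epidermal_rep1": ["RNA_Epidermal_rep1"],
    "RNA_Epidermal_rep1": ["Biomaterial_Epidermal"],
    "Labeled_Vascular_rep1": ["RNA_Vascular_rep1"],
    "RNA_Vascular_rep1": ["Biomaterial_Vascular"],
}


def upstream(obj):
    """Return every object that obj was directly or indirectly derived from."""
    seen, queue = set(), deque([obj])
    while queue:
        for parent in DERIVED_FROM.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(upstream("QuantifiedArray_EV1")))
```

The same walk from a quantified array reaches both the array design and the biomaterials, which is the kind of cross-object view the relationship viewer provides.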
The data were treated as paired (two-channel) data and transformed in the following sequence:
Local background correction for every spot individually
Omit flagged spots (empty, poor, and negative)
Floor low expression values to 20
Log base 2
Lowess normalization
Difference across array channels
Combine within array replicates
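The sequence above can be sketched for a single array in Python. This is a simplified illustration, not GeneSight's implementation: the input field names are hypothetical, the log ratio is taken as Cy5 over Cy3, lowess is a single-pass local linear fit (no robustness iterations) with an assumed span of frac = 0.5, fitted to M = log2(Cy5) - log2(Cy3) against mean log intensity A, the fitted trend is removed half from each channel before differencing, and duplicate spots are combined by their mean.

```python
import math
from collections import defaultdict

FLOOR = 20.0                               # floor for low expression values
BAD_FLAGS = {"empty", "poor", "negative"}  # flagged spots to omit


def lowess_fit(x, y, frac=0.5):
    """Single-pass local linear fit with tricube weights (no robustness steps)."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours used per local fit
    fitted = []
    for xi in x:
        h = sorted(abs(xj - xi) for xj in x)[k - 1]  # local bandwidth
        w = []
        for xj in x:
            d = abs(xj - xi)
            if h == 0:
                w.append(1.0 if d == 0 else 0.0)
            else:
                u = d / h
                w.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        sw = sum(w)  # >= 1 because xi itself always has weight 1
        xbar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        ybar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - xbar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - xbar) * (yj - ybar) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (xi - xbar))
    return fitted


def process_array(spots, frac=0.5):
    """Apply the listed transformations to one two-channel array.

    spots: dicts with hypothetical keys gene, cy5, cy5_bg, cy3, cy3_bg, flag.
    Returns gene -> log2(Cy5/Cy3) after combining within-array replicates.
    """
    kept = []
    for s in spots:
        r = s["cy5"] - s["cy5_bg"]           # 1. local background correction
        g = s["cy3"] - s["cy3_bg"]
        if s["flag"] in BAD_FLAGS:           # 2. omit flagged spots
            continue
        r, g = max(r, FLOOR), max(g, FLOOR)  # 3. floor low values to 20
        kept.append((s["gene"], math.log2(r), math.log2(g)))  # 4. log base 2
    if not kept:
        return {}
    # 5. lowess: fit the intensity-dependent dye bias M = f(A) ...
    a = [(lr + lg) / 2 for _, lr, lg in kept]
    m = [lr - lg for _, lr, lg in kept]
    trend = lowess_fit(a, m, frac)
    per_gene = defaultdict(list)
    for (gene, lr, lg), t in zip(kept, trend):
        lr, lg = lr - t / 2, lg + t / 2      # ... and remove it from both channels
        per_gene[gene].append(lr - lg)       # 6. difference across channels
    # 7. combine within-array replicates (duplicate spots) by their mean
    return {gene: sum(v) / len(v) for gene, v in per_gene.items()}
```

Splitting the fitted trend between the channels keeps normalization and differencing in the listed order; the resulting log ratio equals M minus the fitted trend.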
A separate Student’s t-test was computed for each of the three pairwise comparisons: Vascular vs. Coleoptile, Epidermal vs. Vascular, and Epidermal vs. Coleoptile. Genes significant at p < 0.001 were partitioned (filtered) into separate lists for reporting.
Genes common to the three comparisons were identified with GeneSight's “Intersection” tool.
The processed results were stored in the database and displayed in the PROCESSED_DATA folder in the GeneDirector application, from which they can be relaunched into GeneSight for review or further analysis.
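The source does not state which form of the t-test GeneSight applied. One plausible reading, assumed here, is a one-sample test of whether each gene's mean log ratio across the four hybridizations of a comparison differs from zero, after flipping the sign of the dye-swapped arrays. With four arrays the test has 3 degrees of freedom, for which the t distribution has a closed-form CDF, so the sketch needs only the standard library. The dye-swap pattern and gene values in the usage lines are hypothetical.

```python
import math
import statistics


def t_cdf_df3(t):
    """Closed-form CDF of Student's t distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


def orient(log_ratios, dye_swapped):
    """Flip the sign of dye-swapped arrays so all ratios share one direction."""
    return [-x if swapped else x for x, swapped in zip(log_ratios, dye_swapped)]


def one_sample_t(values):
    """Two-sided test of mean log ratio == 0; assumes exactly four arrays (df = 3)."""
    if len(values) != 4:
        raise ValueError("this closed-form p-value assumes n = 4 (df = 3)")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return (0.0, 1.0) if mean == 0 else (math.copysign(math.inf, mean), 0.0)
    t = mean / (sd / math.sqrt(len(values)))
    return t, 2 * (1 - t_cdf_df3(abs(t)))


def significant_genes(data, dye_swapped, alpha=0.001):
    """data: gene -> four per-array log ratios. Returns gene -> p for p < alpha."""
    hits = {}
    for gene, ratios in data.items():
        _, p = one_sample_t(orient(ratios, dye_swapped))
        if p < alpha:
            hits[gene] = p
    return hits


data = {"geneA": [3.0, -3.1, 2.9, -3.0], "geneB": [1.0, 1.0, -0.5, -0.5]}
print(significant_genes(data, [False, True, False, True]))  # only geneA is reported
```

For genes with missing arrays the degrees of freedom change, and a general t-distribution CDF (for example, the one in SciPy) would be needed instead of the df = 3 formula.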
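The Intersection tool is used through GeneSight's interface; the set operation its name describes can be sketched in Python as follows, with hypothetical gene identifiers standing in for the three significant-gene lists.

```python
def common_genes(*gene_lists):
    """Genes present in every significant-gene list (set intersection)."""
    if not gene_lists:
        return set()
    return set.intersection(*map(set, gene_lists))


vascular_vs_coleoptile = {"geneA", "geneB", "geneC"}
epidermal_vs_vascular = ["geneB", "geneC", "geneD"]
epidermal_vs_coleoptile = ("geneC", "geneB", "geneE")
print(sorted(common_genes(vascular_vs_coleoptile, epidermal_vs_vascular, epidermal_vs_coleoptile)))
# ['geneB', 'geneC']
```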
Conclusion
GeneDirector offers a microarray data-management system for organizing, archiving, retrieving, analyzing, and querying microarray data throughout the entire workflow. Its relational database structure provides scalable, flexible storage and retrieval.
The GeneDirector database also supports sharing data among users, labs, or departments through security and access controls. Beyond serving as a repository for data and associated information, it integrates microarray design, image analysis, and data analysis in a single system.
As microarray data and related information grow in volume and become more distributed, complex, and heterogeneous, GeneDirector offers flexible data import into its repository module, together with quantification, exploratory, and confirmatory statistical tools in its computational modules for turning these data into knowledge.