FDA/NCTR ArrayTrack

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract ArrayTrack is an integrated suite designed for the management, analysis and interpretation of microarray experiment data. There are three (3) integrated components: a) a Minimum Information About A Microarray Experiment (MIAME) supportive database that stores and annotates essential information for an experiment; b) analysis tools with an intuitive user interface providing the ability to search, filter, apply statistical operations and graphically visualize data; and c) a number of libraries that provide gene annotation, protein function and pathway information that are directly hyperlinked within the data analysis process. ArrayTrack is publicly available. Product features/capabilities include:

MIAME supportive Database --

1) Accepts microarray data from various platforms and/or scanners. The upload function has been tested for six (6) platforms (Affymetrix, Agilent, GE Healthcare, Applied Biosystems, Illumina and customized two-color arrays).

2) Accommodates many toxicological parameters, such as dose, chemicals, treatment schedule, sacrifice time, etc.

3) Accepts Affymetrix (Affy) data in the cell intensity (CEL) file format. Converts a CEL file to a probe set [Robust Multichip Averaging (RMA), DChip, Expresso, Plier, Plier + 16] file. You will need to install the R server from BioConductor to use this function.

4) A high throughput data uploading (batch import) function is implemented, which uploads entire datasets of an experiment in a single procedure.

5) A comprehensive reporting system was developed to provide a summary of variables associated with an experiment.

6) The data security function allows data to be shared easily and safely.

7) The data export function allows the user to export an original data file, image file, CEL file and also export multiple datasets in one spreadsheet.

Analysis Tools --

1) Four (4) normalization methods (MAS5, RMA, dChip and Plier) are available for the Affymetrix CEL file. In addition, seven (7) traditional normalization methods (LOWESS, Linear LOWESS, Total Intensity Normalization, Mean/Median Scaling, GenePix Mean Log Ratio Normalization, Quantile and Reference Average Comparison Normalization) are implemented for both one- and two-channel microarray data.

2) “T-Test” calculates p-values for each gene on the chip. This function contains the standard and Welch t-tests as well as the permutation t-test for one- and two-class samples.

3) “ANOVA” (ANalysis Of VAriance), unlike the t-test, allows statistical testing on multiple groups and/or variables. Currently, only one-way ANOVA is available. High-dimension ANOVA will be available soon.

4) After obtaining p-values using t-test/ANOVA, ArrayTrack provides several methods to select a list of significant genes for further analysis or biological interpretation:

5) “Hierarchical Clustering Analysis” (HCA) is an unsupervised clustering approach to group samples based on the similarity of gene expression patterns. The gene name in the HCA is linked to the Gene library. The image of the HCA or a sub cluster can be saved.

6) “Principal Component Analysis” (PCA) is another unsupervised learning method to investigate sample clustering based on gene expression profiles.

7) “Correlation Matrix” computes the correlation coefficients between different arrays and displays the matrix visually. The resulting correlation coefficients (R values) can be exported.

8) Both “ScatterPlot” and “Mixed ScatterPlot” provide pair-wise scatter plot functions. “ScatterPlot” is specifically applied to two-color array data and plots Cy3 intensity vs. Cy5 intensity. “Mixed ScatterPlot” is a general pair-wise plotting function that allows plotting of any one measure (intensity or ratio) against another similar measure in the same experiment.

9) “MA Plot” is another two-color array specific function, where the log intensity ratio M = log2(Cy5/Cy3) is plotted against the mean log intensity A = 0.5 × log2(Cy3 × Cy5). This function might provide better visual inspection of the concordance and quality of the two-color chip expression data than the scatter plot.

10) “Virtual Array Viewer” displays expression data in the format of the original array image. This function reconstructs the original array image based on either the raw or normalized expression data and provides a visual representation of data for further exploration, analysis and interpretation. The function is applied to both one-channel and two-channel data, including Affy data.

11) “Rank Intensity Plot” sorts intensities of genes in descending order along the y-axis, and each gene is given an ordinal number along the x-axis to reflect its relative position on a chip. The shape of the curves characterizes the general properties of the expression data and provides a general assessment of the quality of data. This function is particularly useful for examining the quality of two-color array data. For example, if the green curve represents the Cy3-labeled samples while the red curve represents the Cy5-labeled samples, well-balanced two-channel microarray data should show a superimposed or parallel distribution of the green and red lines, whereas a crossover of the green and red lines indicates a bias between the two channels.

12) “BarChart” allows comparison of the expression level of a gene across the array data within a single experiment or across multiple experiments and/or platforms.

13) “VennDiagram” displays the overlap among two or three gene lists. The user can draw the diagram by common ID (gene ID, Locus ID, Spot ID, etc.), common pathway, or common Gene Ontology (GO).

14) “Quality Control” enables the evaluation of the overall quality of two-color array GenePix data using visual inspection, statistical metrics and experiment annotation.

15) “Quality Filtering” provides a means to examine the quality of each spot in two-color array GenePix data.
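
The M and A values described under “MA Plot” above can be computed per spot from the two channel intensities. The following is a minimal Python sketch of that formula only; the function name `ma_values` and the sample intensities are hypothetical and are not part of ArrayTrack, whose own implementation is not described here.

```python
import math

def ma_values(cy3, cy5):
    """Return (M, A) for one spot, given positive Cy3 and Cy5 intensities.

    M = log2(Cy5/Cy3)        (log intensity ratio)
    A = 0.5 * log2(Cy3*Cy5)  (mean log intensity)
    """
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive to take logarithms")
    m = math.log2(cy5 / cy3)
    a = 0.5 * math.log2(cy3 * cy5)
    return m, a

# Hypothetical (Cy3, Cy5) intensities for three spots, for illustration only.
spots = [(1000.0, 2000.0), (512.0, 512.0), (4000.0, 1000.0)]
for cy3, cy5 in spots:
    m, a = ma_values(cy3, cy5)
    print(f"Cy3={cy3:8.1f}  Cy5={cy5:8.1f}  M={m:+.3f}  A={a:.3f}")
```

A spot with equal intensity in both channels gives M = 0, so well-balanced data should scatter around the horizontal line M = 0 across the range of A.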

Number of Libraries --

1) “IDConverter” allows conversion between about ten (10) different IDs used by various public databases, including GenBank, LocusLink, UniGene, IMAGE, etc.

2) “Gene Library” and “Protein Library” contain functional information about genes and proteins to facilitate microarray data interpretation. All of these data are derived from public databases, including LocusLink, GenBank, UniGene, SWISS-PROT, KEGG, etc. Users can quickly identify the functional information for a set of significant genes derived from analysis by searching these libraries as well as other similar libraries included in this category.

3) “Pathway Library” provides a collection of pathways from KEGG and PathArt. Using this library, users can identify a list of statistically significant pathways (Fisher Exact Test) based on a list of genes, proteins or metabolites. This library is useful for genomics, proteomics and metabonomics/metabolomics research.

4) “PathArt” (see G6G Abstract Number 20060) is commercial software that provides manually curated pathways (mainly regulatory and disease pathways) for the interpretation of microarray results. ArrayTrack is integrated with PathArt. You will need to purchase the PathArt license separately from its manufacturer to use this function in ArrayTrack.

5) “KEGG” mainly contains metabolic pathways. ArrayTrack is integrated with KEGG. Although KEGG is a public pathway package, commercial users need to contact the manufacturer to acquire a KEGG license for accessing this function through ArrayTrack.

6) “IPA” stands for Ingenuity Pathways Analysis (see G6G Abstract Number 20017U). Ingenuity delivers systems biology expertise to biologists and bioinformaticians through pathways analysis software, genome-scale computable network databases and knowledge management services and infrastructure. The user needs to get a license from Ingenuity to log into IPA through ArrayTrack.

7) “GOFFA” stands for Gene Ontology (GO) For Functional Analysis. Comprehensive tools are available in “GOFFA” to analyze microarray results using GO resources. For example, it is straightforward in GOFFA to determine the statistically significant GO terms corresponding to a list of genes derived from a microarray experiment using the Fisher Exact Test. The GOFFA in ArrayTrack provides a GO path plot, a pruned GO tree plot, an all-gene list and term clustering categorized by molecular function, biological process and cellular component.

8) “IPI (International Protein Index) Library” is downloaded from the European Bioinformatics Institute (EBI) website. This is a non-redundant protein database that is particularly useful for proteomics research.

9) “Orthologene Library” contains data from the National Center for Biotechnology Information (NCBI) HomoloGene database, augmented with other functional information. This resource is particularly useful for cross-species research based on gene homology.

10) “Chip Library” contains all the microarray chips that are used to generate the data stored in the ArrayTrack database. The chips are organized according to species, manufacturer and platform. The manufacturer-provided information for each chip is also available, including sequence information, if provided.

11) Both “Toxicant Library” and “EDKB Library” contain chemical structures together with toxicological endpoints. The chemicals can be directly mapped to various metabolic pathways. These libraries are useful for integrating traditional toxicology data with genomics data. Since chemicals with similar structures are likely to exhibit similar biological (or toxicological) activities, NCTR/FDA is also implementing an algorithm for assessing the structural similarity of chemicals and exploring structure-toxicity relationships based on the substructure features and physicochemical properties derived from the structure. The “Toxicant Library” has been initially populated with data from the Carcinogenicity Potency Database, and the “Endocrine Disruptor Knowledge Base (EDKB) Library” contains data associated with endocrine disruptors.

12) All the libraries in ArrayTrack are interlinked.
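
The Fisher Exact Test that the “Pathway Library” and “GOFFA” use to flag significant pathways or GO terms can be illustrated with a short Python sketch. This is an assumption-laden example, not ArrayTrack's code: it uses the one-sided (over-representation) form of the test, and the function name `fisher_enrichment_p` and the counts in the usage lines are hypothetical.

```python
from math import comb

def fisher_enrichment_p(hits, list_size, pathway_size, background_size):
    """One-sided Fisher exact p-value for over-representation.

    hits            -- genes in the significant list that belong to the pathway
    list_size       -- total genes in the significant list
    pathway_size    -- total genes annotated to the pathway
    background_size -- total genes in the background (e.g. on the chip)

    Returns P(X >= hits) under the hypergeometric distribution.
    """
    max_hits = min(list_size, pathway_size)
    total = comb(background_size, list_size)
    # Sum the hypergeometric probabilities of observing 'hits' or more.
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total

# Hypothetical counts: 8 of 50 significant genes fall in a 100-gene pathway,
# against a background of 10,000 genes.
p = fisher_enrichment_p(8, 50, 100, 10_000)
print(f"enrichment p-value = {p:.3e}")
```

A small p-value suggests the pathway or GO term contains more of the listed genes than chance alone would explain. When many pathways are tested at once, a multiple-testing correction is usually applied as well.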

System Requirements

ArrayTrack uses machine-independent technology. Most users access ArrayTrack through a web browser. Information on the system requirements for the locally installed version is available here.

ArrayTrack is only available on CD. Before requesting a CD please ensure you have the appropriate Oracle license for your institute to run the software. You can request a CD by contacting Dr. Weida Tong at (870) 543-7142.

You must have an Oracle license to run ArrayTrack locally.

1. An ORACLE 9i (or above) server is recommended for the current version of ArrayTrack.

2. ArrayTrack is a client-server application: a Java client as the application front-end and an ORACLE database as the data repository. You need to set up the server first and then install the client from our website (detailed installation instructions will be available along with the CDs).

3. System recommendations:

4. We strongly recommend that you work with an ORACLE database administrator and a system administrator during installation of the server. This is also beneficial in the long run to ensure the safety of the data and the proper use of the software. If you encounter any difficulty during installation or while maintaining the software, please contact us.

Manufacturer

Manufacturer Web Site FDA/National Center for Toxicological Research (NCTR)

Price Free

G6G Abstract Number 20089

G6G Manufacturer Number 101034