GEPAS

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract The Gene Expression Profile Analysis Suite (GEPAS) is one of the most complete integrated packages of tools for microarray data analysis available on the web. GEPAS is a sustained effort to offer the scientific community a platform for gene expression data analysis, and it has been running since 2001.

Over its lifetime, GEPAS has evolved to keep pace with new interests and trends in the ever-changing field of microarray data analysis.

GEPAS has been designed to provide an intuitive yet advanced web-based interface. Its analysis options run from the early preprocessing steps (normalization of Affymetrix and two-color microarray experiments, among other preprocessing options) to the final 'functional profiling' of the experiment (using Gene Ontology, pathways, PubMed abstracts, etc.). Along the way it offers options for clustering, gene selection, class prediction and array-comparative genomic hybridization (aCGH) management.

GEPAS provides direct access to several functional profiling and functional annotation facilities implemented through the Babelomics tool (see G6G Abstract Number 20274).

Normalization -- Normalization is the first step in analyzing microarray data. Its goal is to bring data from several microarrays onto a common scale so that comparisons between them are meaningful.

Raw data format - Different scanners produce data in differently formatted files. There is no common standard across manufacturers, so each microarray platform needs its own routine for reading its data.

The GEPAS normalization module can read Affymetrix CEL files. It can also read several file formats from two-color microarray platforms, such as GenePix GPR files and standard Agilent raw data files.
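This abstract does not say which normalization methods GEPAS applies. As an illustration only, here is a minimal sketch of quantile normalization, one widely used way of putting several arrays on a common scale. The function name `quantile_normalize` and the one-list-per-array data layout are assumptions made for this example.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity vectors, one list per array.

    Each value is replaced by the mean, across arrays, of the values that
    have the same rank, so every array ends up with the same distribution.
    Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the r-th smallest value over all arrays.
    sorted_cols = [sorted(a) for a in arrays]
    reference = [sum(col[r] for col in sorted_cols) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        # order[r] is the index of the r-th smallest value in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Afterwards both arrays share the same set of values and differ only in how those values are ordered across probes.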

Preprocessing -- This GEPAS module performs basic transformations of your normalized data. The purpose of this step is to give your data a distribution suitable for the later steps of the analysis.

Examples of the available transformations include logarithmic transformation, merging of replicates and imputation of missing values.
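The three transformations named above can be sketched as follows. The sketch assumes that each gene's values are a list with `None` marking missing entries. It also makes three choices of its own: base-2 logarithms, merging replicates by averaging, and imputing with the gene's mean. GEPAS may offer other options (for example, nearest-neighbor imputation), and all function names here are hypothetical.

```python
import math
from statistics import mean


def log2_transform(values):
    """Log2-transform intensities; missing (None) or non-positive values become None."""
    return [math.log2(v) if v is not None and v > 0 else None for v in values]


def _mean_or_none(column):
    """Mean of the non-missing values in a column, or None if all are missing."""
    present = [v for v in column if v is not None]
    return mean(present) if present else None


def merge_replicates(rows):
    """Average replicate rows that share a gene ID, column by column.

    `rows` is a list of (gene_id, values) pairs; missing values are skipped.
    """
    grouped = {}
    for gene, values in rows:
        grouped.setdefault(gene, []).append(values)
    return {gene: [_mean_or_none(col) for col in zip(*reps)] for gene, reps in grouped.items()}


def impute_row_mean(values):
    """Fill missing values with the mean of the observed values in the same row."""
    fill = _mean_or_none(values)
    return [fill if v is None else v for v in values]


rows = [("g1", [2.0, 8.0]), ("g1", [8.0, None]), ("g2", [None, 4.0])]
logged = [(gene, log2_transform(values)) for gene, values in rows]
merged = merge_replicates(logged)
imputed = {gene: impute_row_mean(values) for gene, values in merged.items()}
print(imputed)  # {'g1': [2.0, 3.0], 'g2': [2.0, 2.0]}
```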

Raw Data Viewer -- This GEPAS module is intended for visualizing raw data. It complements the Normalization module (see above): both tools work on the same raw data and have similar input requirements.

Affymetrix raw data plots -

1) Box-plot: displays a box plot for each array in your raw data. Only perfect match (PM) probes are used.

2) PM and MM distributions: displays histograms of the perfect match (PM) and mismatch (MM) probe intensities for each array in your raw data.

3) RNA digestion: displays one line per array, to help detect possible RNA degradation.

4) MA plot: displays an MA plot of each array against a pseudo-median reference computed across all arrays in the raw data.

5) Raw image: displays the raw data as an image that mimics the scanner image.
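The values behind an MA plot against a common reference can be sketched as follows. The sketch assumes PM intensities are given as one equal-length list per array. It approximates the pseudo-median reference with the probe-wise median of the log2 intensities across arrays; the exact reference GEPAS computes is not described here, and `ma_values` is a hypothetical name.

```python
import math
from statistics import median


def ma_values(arrays):
    """Return an (M, A) pair of lists for each array versus a median reference.

    M = log2(array) - log2(reference)
    A = (log2(array) + log2(reference)) / 2
    The reference (an assumption) is the probe-wise median of log2 values.
    """
    logs = [[math.log2(v) for v in a] for a in arrays]
    reference = [median(col) for col in zip(*logs)]
    result = []
    for row in logs:
        m = [x - r for x, r in zip(row, reference)]
        a = [(x + r) / 2 for x, r in zip(row, reference)]
        result.append((m, a))
    return result


for m, a in ma_values([[2.0, 8.0], [4.0, 4.0], [8.0, 16.0]]):
    print(m, a)
# [-1.0, 0.0] [1.5, 3.0]
# [0.0, -1.0] [2.0, 2.5]
# [1.0, 1.0] [2.5, 3.5]
```

Plotting M against A for each array should give a cloud centered on M = 0 if that array is comparable to the others.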

Two color arrays raw data plots -

1) Box-plot: the log-ratios of the red (Cy5) foreground channel over the green (Cy3) foreground channel (i.e. M-values) are displayed in a box plot for each array.

2) MA plot: displays an MA plot for each array.

3) Density: displays density plots of each of the foreground channels in the microarray.

4) Print-tip: print-tip LOESS (also known as LOWESS) curves are displayed over an MA plot for each print-tip block in the array. This plot cannot be applied to microarray platforms that lack a print-tip structure, such as Agilent.

5) Foreground vs. Background plot: displays foreground versus background intensities for each array.
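For two-color arrays, M is the log-ratio of red (Cy5) over green (Cy3), as in the box-plot item, and A is the average log intensity used on the horizontal axis of MA plots. A minimal sketch of both, assuming foreground intensities are given as lists and that background correction is handled elsewhere (or skipped). The function name is hypothetical.

```python
import math


def two_color_ma(red, green):
    """Per-spot M = log2(R / G) and A = (log2 R + log2 G) / 2."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a


m, a = two_color_ma([1024.0, 256.0], [256.0, 256.0])
print(m, a)  # [2.0, 0.0] [9.0, 8.0]
```

A spot with M = 2 is four times brighter in the red channel; M = 0 means equal signal in both channels.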

Clustering --

Cluster methods -

Hierarchical methods -

SOTA (Self-Organizing Tree Algorithm) is a divisive method developed by the manufacturer (Dopazo and Carazo, 1997; Herrero et al., 2001). It has recently become popular and has been included in several packages, such as TMeV (see G6G Abstract Number 20224).

Single linkage, complete linkage, UPGMA, WPGMA, UPGMC, WPGMC - all of these are sequential, agglomerative, hierarchical, non-overlapping (SAHN) clustering methods.
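Single linkage, complete linkage and UPGMA share one agglomerative loop and differ only in how the distance between two clusters is computed. The naive sketch below shows that loop, assuming a precomputed symmetric distance matrix. WPGMA and the centroid methods (UPGMC, WPGMC) need different update rules and are omitted. The function `agglomerate` is hypothetical, not GEPAS code.

```python
def agglomerate(dist, linkage="average"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    linkage: "single" (minimum pairwise distance), "complete" (maximum)
    or "average" (UPGMA: mean of all pairwise distances between members).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []

    def cluster_distance(c1, c2):
        pair = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":
            return min(pair)
        if linkage == "complete":
            return max(pair)
        if linkage == "average":
            return sum(pair) / len(pair)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = tuple(sorted(clusters[x] + clusters[y]))
        del clusters[y]  # delete the higher index first so x stays valid
        del clusters[x]
        clusters.append(merged)
    return merges


# Three points on a line at positions 0, 1 and 5.
dist = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
print(agglomerate(dist, "single"))  # [((0,), (1,), 1), ((2,), (0, 1), 4)]
```

With the same input, the last merge happens at distance 5 under complete linkage and 4.5 under UPGMA.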

Non-Hierarchical methods -

SOM and K-means - SOM (Self-Organizing Map) is a neural network based on a lattice, or network, of nodes also called neurons (Kohonen 1997). GEPAS 4.0 includes a new version of SOM that automatically chooses the optimal number of clusters.
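K-means is listed alongside SOM but is not described. The sketch below is a minimal version of Lloyd's k-means algorithm. It assumes expression profiles are lists of floats and uses squared Euclidean distance. For determinism it takes the first k profiles as the initial centroids; that is a simplification, since practical tools usually use random or repeated starts. The function name is hypothetical.

```python
def kmeans(profiles, k, max_iter=100):
    """Lloyd's k-means on lists of floats. Returns (labels, centroids)."""
    centroids = [list(p) for p in profiles[:k]]  # simplistic initialization
    labels = [None] * len(profiles)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


profiles = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
print(kmeans(profiles, 2))  # ([0, 1, 0, 1], [[0.0, 0.5], [10.0, 10.5]])
```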

Distance functions -

Pearson correlation coefficient; Euclidean distance (squared); correlation coefficient (linear); correlation coefficient (offset of 0); correlation coefficient (Spearman); and correlation coefficient (jackknifed).

Differential gene expression --

T-REX is GEPAS's new set of tools for analyzing 'differential gene expression'. It implements several modules for studying gene expression under different experimental conditions. The tests it provides fall under four general methodologies:

1) Differential expression between two conditions (two-class option in T-REX);

2) Differential expression among more than two conditions (multiclass option in T-REX);

3) Differential expression related to a continuous variable (correlation option in T-REX);

4) Differential expression related to a survival time (survival option in T-REX).

GEPAS also offers differential expression analysis for time-series and dosage-series experiments (Time/Dosage series option).
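The specific tests behind each T-REX option are not listed here. For the two-class case, a common choice is a per-gene t statistic. The sketch below computes Welch's t statistic and ranks genes by its absolute value. It assumes each gene's values come as a list with one class label ("A" or "B") per sample. It illustrates the idea only and is not necessarily the statistic T-REX uses; the function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(class_a, class_b):
    """Welch's t statistic for one gene (needs at least 2 values per class)."""
    va, vb = variance(class_a), variance(class_b)  # sample variances (n - 1)
    se = math.sqrt(va / len(class_a) + vb / len(class_b))
    return (mean(class_a) - mean(class_b)) / se


def rank_genes(expr, labels):
    """Sort (gene, t) pairs by |t|; expr maps gene ID -> values per sample."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


expr = {"g1": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0], "g2": [1.0, 2.0, 3.0, 1.0, 2.0, 4.0]}
print(rank_genes(expr, ["A", "A", "A", "B", "B", "B"]))
```

In practice the statistic would be converted to a p-value and corrected for the many genes tested; that step is not shown.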

Supervised Classification --

Prophet is GEPAS's web interface for building a "good predictor." A predictor is a mathematical tool that takes a data set composed of different classes of objects (here, microarrays) and "learns" to distinguish between those classes.

Several methods can do this (see below). The most important aspect of this learning is evaluating the classifier's performance, which is usually done by means of a procedure called cross-validation.

The manufacturer has implemented several widely accepted strategies so that Prophet can build simple yet advanced predictors. These come with a carefully designed cross-validation of the whole process, which avoids the widespread problem of "selection bias."
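Selection bias arises when genes are chosen once on the full data set and only the classifier is then cross-validated, which gives over-optimistic error estimates. The fix is to repeat gene selection inside every fold, using only that fold's training samples. The minimal sketch below shows that structure under some assumptions: samples are lists of floats, and folds are assigned round-robin. Genes are ranked by the spread of their class means, and classification uses a nearest-centroid rule; both are hypothetical choices made to keep the example short, not Prophet's methods.

```python
def _class_means(X, y, genes):
    """Mean of each selected gene within each class."""
    means = {}
    for c in set(y):
        rows = [x for x, lab in zip(X, y) if lab == c]
        means[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return means


def select_genes(X, y, n_genes):
    """Hypothetical ranking: genes with the largest spread of class means."""
    genes = range(len(X[0]))
    means = _class_means(X, y, genes)
    spread = [max(m[g] for m in means.values()) - min(m[g] for m in means.values())
              for g in genes]
    return sorted(genes, key=lambda g: spread[g], reverse=True)[:n_genes]


def nearest_centroid(X_train, y_train, x, genes):
    """Predict the class whose centroid (on the selected genes) is closest to x."""
    means = _class_means(X_train, y_train, genes)

    def dist(c):
        return sum((x[g] - mu) ** 2 for g, mu in zip(genes, means[c]))

    return min(means, key=dist)


def cv_error(X, y, n_genes, k):
    """k-fold CV error rate with gene selection redone inside each fold."""
    n, errors = len(X), 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        X_tr = [X[i] for i in range(n) if i % k != fold]
        y_tr = [y[i] for i in range(n) if i % k != fold]
        # Genes are chosen from training samples only: no selection bias.
        genes = select_genes(X_tr, y_tr, n_genes)
        for i in test_idx:
            if nearest_centroid(X_tr, y_tr, X[i], genes) != y[i]:
                errors += 1
    return errors / n


# Gene 0 separates the classes; genes 1 and 2 are noise.
X = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0], [5.0, 1.0, 0.0], [5.1, 0.0, 1.0],
     [0.2, 1.0, 1.0], [0.3, 0.0, 0.0], [5.2, 1.0, 1.0], [5.3, 0.0, 0.0]]
y = ["A", "A", "B", "B", "A", "A", "B", "B"]
print(cv_error(X, y, n_genes=1, k=4))  # 0.0
```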

Classification methods -

1) Diagonal Linear Discriminant Analysis (DLDA);

2) K-Nearest Neighbors (KNN);

3) Support Vector Machines (SVM);

4) PAM or Shrunken centroids;

5) SOM method.
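Of the listed methods, DLDA is compact enough to sketch. It treats genes as independent, with a variance shared by all classes. Each class's score is therefore a variance-weighted squared distance to that class's mean, and the lowest score wins. The sketch below additionally assumes equal class priors and nonzero variances; it is not Prophet's implementation, and the function names are hypothetical.

```python
def dlda_fit(X, y):
    """Estimate per-class gene means and pooled within-class variances."""
    classes = sorted(set(y))
    n_genes = len(X[0])
    means = {}
    ss = [0.0] * n_genes  # pooled within-class sum of squares per gene
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        mu = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
        means[c] = mu
        for r in rows:
            for g in range(n_genes):
                ss[g] += (r[g] - mu[g]) ** 2
    dof = len(X) - len(classes)  # degrees of freedom for the pooled variance
    return means, [s / dof for s in ss]


def dlda_predict(model, x):
    """Assign x to the class with the smallest variance-weighted distance."""
    means, variances = model

    def score(c):
        return sum((xg - mg) ** 2 / vg for xg, mg, vg in zip(x, means[c], variances))

    return min(means, key=score)


model = dlda_fit([[1.0, 10.0], [2.0, 12.0], [5.0, 10.0], [6.0, 12.0]], ["A", "A", "B", "B"])
print(dlda_predict(model, [2.0, 11.0]), dlda_predict(model, [5.0, 13.0]))  # A B
```

Ignoring correlations between genes keeps the method stable when there are far more genes than arrays, which is the usual microarray setting.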

Variable selection: finding the "important genes" -

Some methods require genes to be selected (filtered) before the learning process. Two ways of ranking genes are offered, either of which can be combined with any of the class-prediction algorithms above: the F-ratio and the Wilcoxon test statistic.
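The F-ratio compares a gene's between-class variability with its within-class variability, as in one-way ANOVA, and works for two or more classes. A minimal sketch of the per-gene statistic and a ranking built on it follows; the Wilcoxon alternative is not shown. It assumes each gene's values come as a list with one class label per sample, and the function names are hypothetical.

```python
def f_ratio(values, labels):
    """One-way ANOVA F statistic for one gene's values grouped by class label."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_by_f(expr, labels):
    """Gene IDs sorted from largest to smallest F-ratio."""
    return sorted(expr, key=lambda gene: f_ratio(expr[gene], labels), reverse=True)


labels = ["A", "A", "A", "B", "B", "B"]
print(f_ratio([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], labels))  # 54.0
```

To avoid selection bias, this ranking would be computed inside each cross-validation fold rather than once on all samples.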

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site GEPAS

Price Contact manufacturer.

G6G Abstract Number 20273

G6G Manufacturer Number 102153