G6G Directory of Omics and Intelligent Software - MIT Broad Institute GenePattern

GenePattern Gene Expression Analysis Module

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract GenePattern combines an advanced scientific workflow platform with more than 90 computational and visualization tools for the analysis of genomic data. GenePattern provides support for four (4) broad categories of gene expression analysis:

1) Differential Analysis/Marker Selection;
2) Class Prediction (Supervised Learning);
3) Class Discovery (Unsupervised Learning); and
4) Pathway Analysis.

GenePattern also supports several data conversion tasks (see G6G Abstract Number 20183), such as filtering and normalizing, which are standard prerequisites for genomic data analysis.

1) Differential Analysis/Marker Selection -- Differential analysis, also known as 'marker selection', is the search for genes that are differentially expressed in distinct phenotypes. GenePattern can assess differential expression using either the signal-to-noise ratio or t-test statistic. GenePattern provides the following support for differential analysis:

a) Comparative Marker Selection - ranks the genes based on the value of the statistic being used to assess differential expression and uses permutation testing to compute the significance (nominal p-value) of the rank assigned to each gene.

Due to the number of genes tested against the null hypothesis of no differential expression, many genes are likely to have significant p-values by chance alone. The analysis adjusts for multiple hypothesis testing using several statistical approaches, including false discovery rate (FDR) and family-wise error rate (FWER). You can control the ranking based on the statistic most appropriate for your data.

b) Class Neighbors - helps you identify genes whose expression pattern is strongly correlated with a phenotype. This analysis, developed by scientists at the Broad Institute, “defines an ‘idealized expression pattern’ corresponding to a gene that is uniformly high in one class and uniformly low in the other. [It] tests whether there is an unusually high density of genes ‘nearby’ (that is, similar to) this idealized pattern, as compared to equivalent random patterns.” [Golub T.R., Slonim D.K., et al. “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring,” Science 286, 531-537 (1999).]

c) Heat Map Viewer - shows you differential expression by displaying gene expression values in a heat map format. Each colored cell in the heat map represents the gene expression value for a probe in a sample. The largest gene expression values are displayed in red (hot), the smallest values in blue (cool), and intermediate values in shades of red (pink) or blue.
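The permutation testing and multiple-hypothesis adjustment described for Comparative Marker Selection can be illustrated with a minimal Python sketch. It is not GenePattern's implementation: the function names, the two-sided test, the add-one p-value estimate, and the choice of the Benjamini-Hochberg procedure as the FDR adjustment are all assumptions made for illustration.

```python
import random
import statistics


def signal_to_noise(values, labels):
    """Signal-to-noise ratio for one gene: (mean_0 - mean_1) / (sd_0 + sd_1).

    Assumes labels are 0 or 1, each class has at least two samples,
    and the expression values are not all identical within both classes.
    """
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    return (statistics.mean(a) - statistics.mean(b)) / (
        statistics.stdev(a) + statistics.stdev(b)
    )


def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Nominal two-sided p-value obtained by shuffling the class labels."""
    rng = random.Random(seed)
    observed = abs(signal_to_noise(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(signal_to_noise(values, shuffled)) >= observed:
            hits += 1
    # Add-one estimate so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical expression values for two genes across eight samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
marker_gene = [10.1, 9.8, 10.5, 10.2, 2.1, 1.9, 2.4, 2.2]
flat_gene = [5.0, 5.3, 4.9, 5.1, 5.2, 4.8, 5.05, 5.15]

p_marker = permutation_p_value(marker_gene, labels)
p_flat = permutation_p_value(flat_gene, labels)
print(p_marker, p_flat, benjamini_hochberg([p_marker, p_flat]))
```

With only eight samples there are few distinct label arrangements, so the smallest attainable p-value is limited; larger data sets give finer resolution.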

2) Class Prediction (Supervised Learning) -- Supervised learning, also known as class prediction, is the search for a gene expression signature that predicts class (phenotype) membership. The basic methodology for class prediction is to start with two (2) data sets, a training set and test set; use your training data set to build a classifier (class predictor) based on your chosen classification method; and use your test data set to test the classifier. GenePattern provides the following support for class prediction:

a) GenePattern supports class prediction based on several classification methods, including classification and regression trees (CART), K-nearest neighbors (KNN), probabilistic neural network (PNN), Weighted Voting, and Support Vector Machines (SVM). Most of the class prediction methods supported by GenePattern have been used in research published by scientists at the Broad Institute.

b) For each classification method, GenePattern also supports class prediction based on leave-one-out cross-validation. For small data sets, rather than creating separate training and test data sets, cross-validation divides a data set into n folds; in leave-one-out cross-validation, each fold is a single sample. For each fold, the analysis trains on the other n-1 folds and tests on the held-out fold. After iteratively training and testing all folds, the analysis combines the results to determine the classifier.

c) GenePattern provides a tool for splitting a single data set into non-overlapping training and test data sets.
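As an illustration of leave-one-out cross-validation, the following Python sketch pairs it with a simple K-nearest neighbors classifier. It is a sketch under stated assumptions, not GenePattern's module: the function names, the Euclidean distance, the majority vote, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest samples."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, labels, k=3):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical data: six samples, two genes each, two phenotype classes.
samples = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]]
labels = ["classA", "classA", "classA", "classB", "classB", "classB"]

accuracy = leave_one_out_accuracy(samples, labels, k=3)
print(accuracy)
```

Each sample is predicted by a classifier that never saw it during training, so the combined accuracy estimates how the method would perform on new data.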

3) Class Discovery (Unsupervised Learning) -- Unsupervised learning, also known as class discovery, is the search for a biologically relevant unknown taxonomy identified by a gene expression signature or a biologically relevant set of co-expressed genes.

The basic methodology for class discovery is clustering: you cluster the data based on your chosen clustering method and then validate the clusters through gene annotations, through enrichment analysis (are the clusters enriched for genes from functionally important categories, pathways, or processes?), or by replicating the results in other data sets. GenePattern provides the following support for clustering:

a) GenePattern supports several traditional clustering methods, including consensus clustering, hierarchical clustering, and self-organizing maps (SOM clustering).

b) For validating clusters, GenePattern provides tools for retrieving annotations and for splitting a single data set into non-overlapping training and test data sets.
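To make the clustering step concrete, here is a minimal Python sketch of agglomerative hierarchical clustering. The average-linkage rule, the Euclidean distance, the function names, and the sample points are assumptions chosen for illustration; they are not taken from the GenePattern module, which may use other distance measures and linkage options.

```python
import math


def average_linkage(cluster_a, cluster_b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(
        math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_clusters(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles: five samples, two genes each.
points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
result = agglomerative_clusters(points, n_clusters=2)
print(result)
```

This brute-force version recomputes every pairwise linkage at each merge, which is adequate for a small example but slow for thousands of genes.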

Clustering is the traditional method for class discovery. GenePattern also supports the following less traditional methods:

c) Non-negative matrix factorization (NMF) is an algorithm used in various fields, such as text mining and music analysis, to decompose multivariate data.

d) Principal components analysis (PCA) is a statistical technique used in various fields, such as face recognition and image compression, to determine the key variables in a multi-dimensional data set that can explain the differences in observations.
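The idea behind PCA, finding the direction that explains the most variation among observations, can be sketched in a few lines of Python. This example estimates only the first principal component by power iteration on the sample covariance matrix; the function name, the iteration count, and the data are hypothetical, and the approach is a teaching sketch rather than the method used by GenePattern.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (unit direction, explained variance) of the first principal component.

    Assumes the data are not all identical and that the starting vector of
    ones is not orthogonal to the leading direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    vec = [1.0] * d
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    variance = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)
    )
    return vec, variance


# Hypothetical data in which the second variable is exactly twice the first.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, variance = first_principal_component(rows)
print(direction, variance)
```

Because the two variables here are perfectly correlated, a single component captures all of the variance, and its direction is proportional to (1, 2).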

4) Pathway Analysis -- Pathway analysis is the search for sets of genes differentially expressed in distinct phenotypes. GenePattern provides the following support for pathway analysis:

a) KSscore computes a Kolmogorov-Smirnov non-parametric rank statistic representing the positional distribution of a set of genes within an ordered list of genes. You can use this analysis to examine the enrichment of a set of genes at the top of an ordered list; the KSscore is high when the genes in the gene set appear near the top of the ordered list.

b) Gene Set Enrichment Analysis (GSEA) determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes). The GSEA software packages the method, making it easy to run the analysis and review the results. GSEA will soon be available as a GenePattern module.

c) In addition, GenePattern provides tools for retrieving annotations, which aid in understanding gene sets and gene set enrichment results.
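The Kolmogorov-Smirnov style score described for KSscore can be illustrated with a running-sum sketch in Python. The unweighted step sizes (1/hits and 1/misses), the function name, and the gene identifiers are assumptions for illustration; the exact normalization used by GenePattern or GSEA may differ.

```python
def ks_enrichment_score(ranked_genes, gene_set):
    """Maximum signed deviation of a running sum over a ranked gene list.

    The sum steps up at each gene in the set and down at each gene outside
    it. Assumes ranked_genes contains no duplicates.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list of ten genes, most differentially expressed first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
top_score = ks_enrichment_score(ranked, {"g1", "g2", "g3"})
bottom_score = ks_enrichment_score(ranked, {"g8", "g9", "g10"})
print(top_score, bottom_score)
```

A set concentrated at the top of the list yields a large positive score, while a set concentrated at the bottom yields a large negative one.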

System Requirements

See Release Notes.

Manufacturer

Broad Institute of MIT and Harvard
301 Binney Street
Cambridge, MA 02142
Ph: 617-714-7000
Fax: 617-714-8102
or
7 Cambridge Center
Cambridge, MA 02142
Ph: 617-714-7000
Fax: 617-714-8102
or
320 Charles Street
Cambridge, MA 02142
Ph: 617-714-7000
Fax: 617-714-8102
E-mail: gp-help@broad.mit.edu

Manufacturer Web Site GenePattern Gene Expression Analysis Module

Price Contact manufacturer.

G6G Abstract Number 20181

G6G Manufacturer Number 101795
