Gene Expression Omnibus

Category: Genomics > Gene Expression Databases/Tools

Abstract: The Gene Expression Omnibus (GEO) is a public repository for gene expression and other molecular abundance data. It archives and freely distributes high-throughput gene expression data submitted by the scientific community. At the time of writing, GEO stored approximately a billion individual gene expression measurements, derived from over 100 organisms and addressing a wide range of biological questions. Web-based tools allow these large volumes of data to be explored, queried, and visualized effectively.

GEO Architecture

Submitters supply their gene expression data in four sections: 1) Platform - describes the list of features on the array (e.g., cDNAs or oligonucleotides); 2) Sample - describes the biological material, the experimental conditions under which it was handled, and the abundance measurement of each feature derived from it; 3) Series - defines a set of related Samples that together make up an experiment; 4) Supplementary data - original microarray scan images or raw quantification data.
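
The relationship among the Platform, Sample, and Series records can be sketched as a small Python data model. The class and field names below are hypothetical, chosen only for illustration, and do not reflect GEO's internal schema; the consistency check simply expresses the idea that a Sample's measurements refer to features defined on its Platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Platform:
    """Hypothetical model: the list of features on the array."""
    accession: str
    features: list[str]


@dataclass
class Sample:
    """Hypothetical model: biological material plus one abundance value per feature."""
    accession: str
    platform: Platform
    description: str
    values: dict[str, float]

    def __post_init__(self) -> None:
        # Every measured feature must be defined on the Sample's Platform.
        unknown = set(self.values) - set(self.platform.features)
        if unknown:
            raise ValueError(f"features not on platform: {sorted(unknown)}")


@dataclass
class Series:
    """Hypothetical model: related Samples that together make up one experiment."""
    accession: str
    samples: list[Sample] = field(default_factory=list)
    supplementary_files: list[str] = field(default_factory=list)


# Placeholder accessions and values, invented for illustration.
gpl = Platform("GPL_EXAMPLE", ["probe_1", "probe_2"])
gsm = Sample("GSM_EXAMPLE", gpl, "liver, untreated", {"probe_1": 7.2, "probe_2": 3.1})
gse = Series("GSE_EXAMPLE", [gsm])
print(gse.accession, [s.accession for s in gse.samples])
```

Keeping the Platform as a shared object referenced by each Sample mirrors the fact that many Samples are typically measured on the same array design.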

Sample data are assembled into biologically meaningful and comparable GEO DataSets. DataSet records provide a coherent synopsis of an experiment and form the basis of GEO's data display and analysis tools.

GEO Submissions

GEO provides an infrastructure through which submitters can supply data compliant with MIAME (Minimum Information About a Microarray Experiment). Data may be deposited with GEO in four ways: 1) Web deposit - simple, step-by-step, interactive Web forms; 2) Spreadsheets - Excel spreadsheet templates for batch deposit; 3) SOFT (Simple Omnibus Format in Text) - a plain-text, line-based format designed for rapid batch submission; 4) MINiML (MIAME Notation in Markup Language) - an XML format designed for rapid batch submission.
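
To illustrate what "line-based" means for SOFT, the sketch below reads a simplified subset of the format, assuming the general line conventions as we understand them: lines starting with "^" open an entity, "!" lines carry attributes, "#" lines describe data columns, and tab-delimited rows between table begin/end markers hold the data. It is not a complete or validating SOFT parser, and the example record is invented.

```python
from typing import Optional


def parse_soft(text: str) -> list:
    """Parse a simplified subset of SOFT into one dict per entity (sketch only)."""
    entities = []
    current: Optional[dict] = None
    in_table = False
    for line in text.splitlines():
        marker = line.strip().lower()
        if not marker:
            continue
        if line.startswith("^"):
            # Entity line, e.g. "^SAMPLE = GSM_EXAMPLE".
            kind, _, accession = line[1:].partition("=")
            current = {"type": kind.strip(), "accession": accession.strip(),
                       "attributes": {}, "columns": {}, "table": []}
            entities.append(current)
            in_table = False
        elif current is None:
            continue  # ignore anything before the first entity line
        elif marker.endswith("_table_begin"):
            in_table = True
        elif marker.endswith("_table_end"):
            in_table = False
        elif line.startswith("!"):
            # Attribute line; repeated keys accumulate in a list.
            key, _, value = line[1:].partition("=")
            current["attributes"].setdefault(key.strip(), []).append(value.strip())
        elif line.startswith("#"):
            # Column description, e.g. "#VALUE = normalized signal".
            name, _, desc = line[1:].partition("=")
            current["columns"][name.strip()] = desc.strip()
        elif in_table:
            # Tab-delimited row; the first row holds the column headers.
            current["table"].append(line.split("\t"))
    return entities


# Hypothetical record for illustration only; not a real GEO submission.
EXAMPLE_SOFT = "\n".join([
    "^SAMPLE = GSM_EXAMPLE",
    "!Sample_title = hypothetical liver sample",
    "!Sample_platform_id = GPL_EXAMPLE",
    "#ID_REF = probe identifier",
    "#VALUE = normalized signal",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "probe_1\t7.2",
    "probe_2\t3.1",
    "!sample_table_end",
])

for entity in parse_soft(EXAMPLE_SOFT):
    print(entity["type"], entity["accession"], entity["table"][1:])
```

Because each line's role is determined by its first character, a file can be processed in a single pass, which is what makes such a format convenient for batch submission.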

Data Mining

GEO data can be queried through two NCBI Entrez databases: 1) Entrez GEO-DataSets - provides an experiment-centric view of the data in GEO. Experiments of interest may be located by searching for attributes such as free-text keywords, technology type, author, organism, and experimental variable information. Once a relevant DataSet is identified, the experiment can be explored further for gene expression profiles of interest using the supplementary tools provided on the GEO DataSet record.
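
The same experiment-centric search can also be issued programmatically. The sketch below assumes NCBI's E-utilities ESearch endpoint with `db=gds` (the GEO DataSets database) and only builds the request URL; it performs no network call. The search term and the `gds_search_url` helper are illustrative, and the `[Organism]` field tag follows general Entrez query syntax.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gds_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL against GEO DataSets (hypothetical helper)."""
    params = {"db": "gds", "term": term, "retmax": retmax}
    # urlencode escapes spaces as "+" and brackets as %5B/%5D.
    return f"{EUTILS_BASE}?{urlencode(params)}"


# Example query: free-text keywords combined with an organism restriction.
url = gds_search_url("breast cancer AND Homo sapiens[Organism]")
print(url)
```

Fetching such a URL would return a list of matching record identifiers, which could then be opened as GEO DataSet records for the exploration described above.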

Tools available on the GEO DataSet record: a) Cluster heat maps - a selection of hierarchical and K-means clustering algorithms is provided. Clusters of interest can be selected, enlarged, downloaded, plotted as line charts, or linked directly to Entrez GEO-Profiles; b) Query subset A vs. B - this tool helps identify genes whose expression levels differ markedly between two specified sets of Samples within a DataSet, as calculated using t-tests or fold difference. Genes that meet the user-defined criteria are presented in Entrez GEO-Profiles; c) Subset effects - this feature retrieves all profiles flagged as having significant effects with respect to a specific experimental variable, for example 'age' or 'strain'.
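
The exact statistics and thresholds GEO uses for Query subset A vs. B are not specified here. As an illustration only, the following sketch applies a fold-difference cutoff to each gene and can optionally require a minimum absolute Welch's t statistic. It assumes linear-scale, positive expression values and at least two Samples per subset; the function names and example data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic; each subset needs >= 2 Samples and some variance."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def fold_difference(a, b):
    """Mean of subset A divided by mean of subset B (assumes linear, positive values)."""
    return mean(a) / mean(b)


def query_a_vs_b(profiles, subset_a, subset_b, min_fold=2.0, min_abs_t=None):
    """Genes whose A vs. B fold difference is >= min_fold in either direction
    and, if min_abs_t is given, whose |t| is >= min_abs_t."""
    hits = []
    for gene, values in profiles.items():
        a = [values[s] for s in subset_a]
        b = [values[s] for s in subset_b]
        fold = fold_difference(a, b)
        passes = fold >= min_fold or fold <= 1 / min_fold
        if min_abs_t is not None:
            passes = passes and abs(welch_t(a, b)) >= min_abs_t
        if passes:
            hits.append(gene)
    return hits


# Hypothetical DataSet: two genes measured in four Samples.
profiles = {
    "gene_x": {"s1": 10.0, "s2": 12.0, "s3": 2.0, "s4": 3.0},
    "gene_y": {"s1": 5.0, "s2": 5.5, "s3": 5.2, "s4": 4.9},
}
print(query_a_vs_b(profiles, ["s1", "s2"], ["s3", "s4"]))  # → ['gene_x']
```

Checking the fold difference in both directions (above `min_fold` or below its reciprocal) catches genes that are either higher or lower in subset A.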

2) Entrez GEO-Profiles - provides a gene-centric view of the data in GEO. Gene expression profiles of interest may be located by searching for attributes such as gene name, GenBank accession number, SAGE tag, GEO accession number, or description, or by retrieving profiles flagged as having significant effects with regard to specific experimental variables.

Tools available on the Entrez GEO-Profiles results page: a) Profile neighbors - returns a list of genes that show a similar expression pattern within a given DataSet; b) Sequence neighbors - retrieves profiles related by nucleotide sequence similarity, as determined by BLAST; c) Homolog neighbors - retrieves profiles of genes belonging to the same HomoloGene group; d) Links - connects to other NCBI Entrez databases, including GenBank, PubMed, Gene, UniGene, OMIM, HomoloGene, Taxonomy, SAGEMap, and MapViewer.
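
The similarity measure behind Profile neighbors is not stated here. Assuming Pearson correlation across the Samples of a single DataSet, a minimal sketch of ranking genes by how closely their profiles track a query gene might look like this; the threshold `min_r` and the example profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0 here
    return cov / (sx * sy)


def profile_neighbors(profiles, query, min_r=0.8):
    """Genes in one DataSet whose profile correlates with the query's at r >= min_r,
    sorted from most to least similar."""
    target = profiles[query]
    scored = [(gene, pearson(target, values))
              for gene, values in profiles.items() if gene != query]
    return sorted([item for item in scored if item[1] >= min_r],
                  key=lambda item: item[1], reverse=True)


# Hypothetical DataSet: each list holds one gene's values across the same four Samples.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
    "gene_d": [2.0, 1.0, 4.0, 3.0],
}
print([gene for gene, r in profile_neighbors(profiles, "gene_a")])  # → ['gene_b']
```

Correlation rewards a shared shape rather than shared magnitude: gene_b has twice gene_a's values but the same pattern, so it scores r = 1, while the mirror-image gene_c scores r = -1.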

Note: GEO data can also be queried outside the Entrez databases with GEO BLAST, an interface for finding GEO-Profiles of interest based on nucleotide sequence similarity. In addition, standard BLAST results display 'E' icons that link directly to GEO-Profiles expression data.

