WebGestalt
Category Cross-Omics>Knowledge Bases/Databases/Tools
Abstract WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) is an integrated data mining system for the management, information retrieval, organization, visualization and statistical analysis of large sets of genes.
WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense out of large sets of genes.
It enables biologists to manipulate integrated information and find patterns that are not detectable otherwise.
WebGestalt is designed for functional genomic, proteomic and large-scale genetic studies from which high-throughput data are continuously produced. It currently works for human and mouse studies.
WebGestalt features/capabilities include:
Database - GeneKeyDB -- WebGestalt is based on an ORACLE relational database, GeneKeyDB.
This database offers a strong gene and protein centric viewpoint. Gene and gene product information is primarily taken from NCBI LocusLink, Ensembl, Swiss-Prot, HomoloGene, Unigene, CGAP, UCSC, GO Consortium, KEGG, BioCarta and Affymetrix.
WebGestalt modules -- WebGestalt is composed of four (4) modules: gene set management, information retrieval, organization/visualization and statistics.
1) The gene set management module receives gene sets submitted by the users. Received gene sets can be saved, retrieved and deleted. Boolean operations are also provided by this module to generate the unions, intersections or differences between gene sets.
2) The information retrieval module currently retrieves information for up to 20 attributes of the received gene sets from the manufacturer’s local database, GeneKeyDB.
3) The organization/visualization module helps users efficiently explore the retrieved information in various biological contexts, using eight (8) sub-modules:
- a) Gene Ontology (GO) Tree,
- b) KEGG Table and Maps,
- c) BioCarta Table and Maps,
- d) Protein Domain Table,
- e) Tissue Expression Bar Chart,
- f) Chromosome Distribution Chart,
- g) PubMed Table and
- h) GRIF Table.
Subsets of genes based on the organization can be generated and saved as new gene sets.
4) The statistics module (see below).
1) Gene set management module – This module accepts gene sets submitted by files, by GO categories or by chromosome location ranges. The input file should be a plain text file, including the appropriate IDs (required) and corresponding microarray ratios or other values (optional), separated by tabs in the format of one ID per row.
Gene identifiers that can be recognized are Entrez Gene IDs, Swiss-Prot IDs, Ensembl IDs, Unigene IDs, gene symbols and Affymetrix probe set IDs.
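The input format described above can be illustrated with a minimal Python sketch. It is not WebGestalt code: the function name parse_gene_set is hypothetical, and the ratios in the sample are made-up values attached to three Entrez Gene IDs purely for illustration.

```python
import io

def parse_gene_set(handle):
    """Read a tab-delimited gene set: one ID per row, optional value in column 2."""
    genes = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank rows
        fields = line.split("\t")
        gene_id = fields[0].strip()
        value = None
        # The second column (e.g. a microarray ratio) is optional.
        if len(fields) > 1 and fields[1].strip():
            value = float(fields[1])
        genes[gene_id] = value
    return genes

# Illustrative input: Entrez Gene IDs with made-up ratios; the last row has no value.
example = io.StringIO("7157\t2.31\n672\t0.45\n1956\n")
gene_set = parse_gene_set(example)
print(gene_set)
```

Rows without a second column are kept with a value of None, which mirrors the "required ID, optional value" layout of the input file.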
Sub-sets of genes can be generated from an existing gene set through the organization/visualization module and saved as new gene sets through the management module.
The management module also performs Boolean operations to generate the union, intersection and difference between two (2) existing gene sets.
Recursively applying these Boolean operations makes it possible to combine information from more than two sets of genes.
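As a rough illustration of these Boolean operations (a sketch using Python's built-in set type, not the WebGestalt implementation), the gene sets below are hypothetical examples built from gene symbols:

```python
# Three hypothetical gene sets, identified by gene symbol.
set_a = {"TP53", "BRCA1", "EGFR"}
set_b = {"BRCA1", "MYC"}
set_c = {"EGFR", "MYC", "KRAS"}

union_ab = set_a | set_b          # genes in either set
intersection_ab = set_a & set_b   # genes in both sets
difference_ab = set_a - set_b     # genes in A but not in B

# Applying the operations recursively combines more than two sets:
# here, genes found in (A or B) and also in C.
combined = (set_a | set_b) & set_c

print(sorted(union_ab))
print(sorted(intersection_ab))
print(sorted(difference_ab))
print(sorted(combined))
```

Each operation takes two sets and returns a new set, so the result of one operation can be fed into the next, which is how information from three or more gene sets is combined.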
Orthologs can be retrieved for a gene set using the management module. The orthologs are defined by HomoloGene from NCBI. Inclusion of orthologous information could assist in comparative genomics studies.
2) Information retrieval module - This module provides rapid access to the existing information for all genes in a gene set. The attributes that can be retrieved include nomenclature, identifiers to different databases, map and functional information.
Retrieved information for all genes in a gene set can be downloaded as a tab-delimited file or opened directly in the web browser using Microsoft Excel.
3) Organization/visualization module - This module in WebGestalt is intended to assist biologists in exploring large gene sets by organizing and visualizing the genes in various biological contexts.
- a) GO Tree - The GO Tree is based on the manufacturer's published tool GO Tree Machine. The GO Tree organizes a gene set based on the GO DAG (Directed Acyclic Graph), and has implemented several visualizations, including an expandable tree, a bar chart at selected annotation level and an enriched DAG.
- The enriched DAG is used for visualizing GO categories with enriched gene numbers as identified by the statistics module.
- b) KEGG and BioCarta Tables and Maps - The KEGG Table shows KEGG pathways associated with the gene set, the number of genes in each pathway and the Entrez Gene IDs for the genes.
- The KEGG table also provides P-values, indicating the significance of enrichment for each KEGG pathway. Each pathway name in the KEGG Table is hyperlinked to the KEGG Map, in which genes in the gene set are highlighted in red.
- WebGestalt can also organize genes based on another popular pathway database, BioCarta (see G6G Abstract Number 20264) into a BioCarta Table.
- The BioCarta Table has the same structure as the KEGG Table. Each pathway name in the BioCarta Table is hyperlinked to the BioCarta Map.
- c) Protein Domain Table - The Protein Domain Table organizes the genes based on the PFAM protein domains. The table shows the name of the PFAM domains associated with the gene set, the number of genes having each domain and the Entrez Gene IDs for the genes.
- The table also provides P-values, indicating the significance of enrichment for each domain.
- Each domain name is hyperlinked to the Conserved Domain Database of the NCBI, where information on domain function, structure and sequence is available.
- Each Entrez Gene ID is hyperlinked to the Conserved Domain Summary of the NCBI, where a graphical view of domains on the protein is available.
- d) Tissue Expression Bar Chart – This chart is designed to organize a gene set based on large-scale, publicly available gene expression data derived from a wide variety of tissue and organ types. WebGestalt uses the gene expression data from the CGAP-expressed sequence tag (EST) project.
- In the Tissue Expression Bar Chart, each tissue is represented by a bar. The height of each bar represents the number of genes in the active gene set that are also expressed in that tissue, based on the CGAP data.
- For individual genes, WebGestalt evaluates the over/under-representation of the gene in individual tissue types using the statistics module.
- e) Chromosome Distribution Chart - Chromosome distribution of the genes in a gene set is visualized using the Chromosome Distribution Chart. The chromosome location information comes from the UCSC genome annotation databases (see G6G Abstract Number 20197).
- In this chart, each chromosome is represented by a vertical bar. Each gene is represented by a ‘red cross’ symbol and located on the chromosome based on its location. Clustered genes from a gene set can be easily visualized in the chart.
- f) PubMed Table and GRIF Table - WebGestalt can organize genes according to their co-occurrence in publications, based on the gene-publication association information retrieved from the LocusLink database.
- LocusLink provides two types of gene-publication indices. One is computed from PubMed; the other is GRIF (Gene References Into Function). WebGestalt organizes genes based on both indices and generates a PubMed Table or a GRIF Table.
- The PubMed Table shows PubMed IDs for the publications associated with the gene set, the number of genes in each publication and the Entrez Gene IDs for the genes.
- Each PubMed ID is hyperlinked to the corresponding PubMed record, where the abstract for the paper is available. The GRIF Table is similar to the PubMed Table, except for one additional column showing the GRIF comments.
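The structure of the PubMed Table (publication, number of genes, gene IDs) can be sketched by inverting a gene-to-publication mapping. This is only an illustration under stated assumptions: the function name publication_table is hypothetical, and the publication identifiers PMID_A, PMID_B and PMID_C are placeholders, not real PubMed records or real gene-publication associations.

```python
from collections import defaultdict

# Hypothetical gene -> publication associations (placeholder publication IDs).
gene_to_pubs = {
    "7157": ["PMID_A", "PMID_B"],
    "672": ["PMID_A"],
    "1956": ["PMID_B", "PMID_C"],
}

def publication_table(gene_to_pubs):
    """Group genes by publication: (publication ID, gene count, sorted gene IDs)."""
    pub_to_genes = defaultdict(set)
    for gene, pubs in gene_to_pubs.items():
        for pub in pubs:
            pub_to_genes[pub].add(gene)
    rows = [(pub, len(genes), sorted(genes)) for pub, genes in pub_to_genes.items()]
    # Publications shared by the most genes first; ties broken by publication ID.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

for pub, count, genes in publication_table(gene_to_pubs):
    print(pub, count, ",".join(genes))
```

Publications that mention several genes from the set rise to the top, which is the co-occurrence pattern the table is meant to expose.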
4) Statistics module - The statistics module currently provides two statistical tests (the hypergeometric test and Fisher's exact test) to identify interesting patterns in the gene sets.
The users can select different significance levels for the statistical analysis. The users can also specify the minimum number of genes in a significant category.
The hypergeometric test can be used for the evaluation of the over/under-representation of individual genes in a selected tissue type.
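To make the hypergeometric test concrete, here is a minimal Python sketch of a one-sided over-representation P-value. It is not WebGestalt's own code, and the exact form of the test WebGestalt applies (for example, its reference gene set or sidedness) is not stated here; the function names and the parameter names N, K, n and k are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k category genes among n drawn from N genes, K of them in the category."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_p_value(k, N, K, n):
    """One-sided over-representation P-value, P(X >= k)."""
    upper = min(K, n)  # the overlap cannot exceed either set size
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, upper + 1))

# Toy numbers: 10 reference genes, 4 in the category,
# a gene set of 3 genes, 2 of which fall in the category.
p = enrichment_p_value(k=2, N=10, K=4, n=3)
print(round(p, 4))  # (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
```

A small P-value indicates that the category contains more genes from the set than would be expected by chance; the user-selected significance level and minimum gene count would then be applied as filters on top of this value.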
Note: WebGestalt has been implemented in WebQTL (see G6G Abstract Number 20326), which is a unique service that allows biologists to rapidly identify and map genes and Quantitative Trait Loci (QTL).
The WebGestalt modules are used to analyze sets of genes that are highly correlated with various phenotypes in WebQTL.
System Requirements
Contact manufacturer.
Manufacturer
- WebGestalt is developed and maintained by members of the Department of Biomedical Informatics and the Department of Biostatistics of Vanderbilt University Medical Center.
- This tool is brought to you by the Bioinformatics Resource Center at Vanderbilt University.
Manufacturer Web Site WebGestalt
Price Contact manufacturer.
G6G Abstract Number 20327
G6G Manufacturer Number 102881