Abstract -- geWorkbench (genomics Workbench) is a Java-based open-source platform for 'integrated genomics'.

Using a component architecture, it allows individually developed plug-ins to be configured into complex bioinformatic applications. At present there are more than 50 available plug-ins supporting the visualization and analysis of gene expression and sequence data.

Example use cases include:

1) Loading data from local or remote data sources.

2) Visualizing gene expression, molecular interaction networks, protein sequence and protein structure data in a variety of ways.

3) Providing access to client- and server-side computational analysis tools such as t-test analysis, hierarchical clustering, self-organizing maps, regulatory network reconstruction, BLAST searches, pattern/motif discovery, etc.

4) Validating computational hypotheses through the integration of gene and pathway annotation information from curated sources as well as through Gene Ontology (GO) enrichment analysis.

Plug-ins -- The geWorkbench platform employs a component repository infrastructure to manage a large collection of pluggable components that can be used to customize the application's graphical user interface. This (ever-growing) list of plug-in components covers a wide range of functionality for a number of different genomic data modalities.

Microarray Visualization (Plug-ins) --

1) Color Mosaic - Heat maps for microarray expression data, organized by phenotypic or gene groupings.

2) Dendrogram - Tree-structured diagrams reflecting the results of hierarchical clustering analysis.

3) Expression Profiles - Line graph of gene expression profiles across several arrays/hybridizations.

4) Expression Value Distribution - Distribution plot of marker expression values across one or more microarrays.

5) Microarray Viewer - Color-gradient representation of gene expression values.

6) Scatter Plot - Pairwise (array vs. array and marker vs. marker) comparison and plotting of expression values.

7) SOM Clusters Viewer - Visualization of gene clusters produced by the self-organizing maps (SOM) analysis.

8) Tabular Microarray Viewer - Spreadsheet view of all expression measurements in an experiment, one row per individual marker/probe and one column per microarray.

Data Management (Plug-ins) --

1) Marker Component - Definition of data views consisting of marker subgroups. The views control the amount of data displayed.

2) Phenotype/Array Component - Definition of data views consisting of microarray subgroups. The views control the amount of data displayed.

Normalizers (Plug-ins) --

1) Array-Based Centering - Subtraction of the mean or median measurement of a microarray from every measurement in that microarray.

2) Marker-Based Centering - Subtraction of the mean or median measurement of a marker profile from every measurement in the profile.

3) Mean-Variance Normalizer - Transformation of expression measurements to standard units: for every marker, the mean measurement of the marker profile (across all microarrays in an experiment) is subtracted from each measurement in the profile and the resulting value is divided by the standard deviation of the profile.

4) Missing Value Calculation - Replacement of missing values with consensus values.

5) Threshold Normalizer - Adjustment of values that fall outside a user-specified threshold.

6) Quantile - Expression measurements in each microarray are adjusted so that the distribution of values is the same across all microarrays in an experiment.

7) Housekeeping - Normalization of all measurements in a microarray through division by the average expression value of a (user defined) set of housekeeping genes.
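
The normalizer descriptions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not geWorkbench's own (Java) code. The function names, the list-of-lists layout (rows are markers, columns are microarrays), the use of the mean rather than the median, the population standard deviation, and the handling of constant profiles and ties are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Assumed layout: matrix[i][j] is the measurement of marker i on microarray j.

def array_center(matrix):
    """Array-based centering: subtract each microarray's (column's) mean."""
    col_means = [mean(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, col_means)] for row in matrix]

def marker_center(matrix):
    """Marker-based centering: subtract each marker profile's (row's) mean."""
    out = []
    for row in matrix:
        mu = mean(row)
        out.append([v - mu for v in row])
    return out

def mean_variance(matrix):
    """Mean-variance: convert each marker profile to standard units."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        # Assumption: a constant profile (sd == 0) is mapped to all zeros.
        out.append([(v - mu) / sd if sd else 0.0 for v in row])
    return out

def quantile_normalize(matrix):
    """Quantile: give every microarray the same distribution of values."""
    cols = [list(c) for c in zip(*matrix)]
    # Reference distribution: mean of the k-th smallest value across arrays.
    ref = [mean(vals) for vals in zip(*(sorted(c) for c in cols))]
    new_cols = []
    for col in cols:
        # Assumption: ties are broken by original row order (stable sort).
        order = sorted(range(len(col)), key=lambda i: col[i])
        new_col = [0.0] * len(col)
        for rank, i in enumerate(order):
            new_col[i] = ref[rank]
        new_cols.append(new_col)
    return [list(r) for r in zip(*new_cols)]

def housekeeping_normalize(matrix, housekeeping_rows):
    """Housekeeping: divide each microarray by the average expression of
    the user-defined housekeeping markers (given here as row indices)."""
    scale = [mean(col[i] for i in housekeeping_rows) for col in zip(*matrix)]
    return [[v / s for v, s in zip(row, scale)] for row in matrix]
```

Each function returns a new matrix and leaves its input unchanged, so the steps can be chained or compared side by side.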

Filters (Plug-ins) --

1) Affy Detection Call - (Affymetrix data only) Filtering of measurements based on the value of their "detection call" attribute.

2) Deviation - Filtering of markers with a low dynamic range.

3) Expression Threshold - Elimination of measurements that fall outside a range of expression values.

4) 2-channel Threshold - (Genepix data only) Same as "Expression Threshold" filter but different threshold ranges can be specified for each channel.

5) Genepix Flag Filter - (Genepix data only) Filtering of measurements based on the value of their "Flags" attribute.
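
As a rough illustration of the Deviation and Expression Threshold filters listed above, the sketch below works on a plain list-of-lists matrix (rows are markers, columns are microarrays). It is not geWorkbench code. The function names, the use of the standard deviation as the measure of dynamic range, and the choice to mark eliminated measurements as `None` are assumptions made for the example.

```python
from statistics import pstdev

def deviation_filter(matrix, min_sd):
    """Keep only marker rows whose standard deviation across arrays is at
    least min_sd (assumed stand-in for 'low dynamic range')."""
    return [row for row in matrix if pstdev(row) >= min_sd]

def expression_threshold_filter(matrix, low, high):
    """Mark measurements outside the range [low, high] as missing (None)."""
    return [[v if low <= v <= high else None for v in row] for row in matrix]
```

Marking values as missing rather than deleting them keeps the matrix rectangular, which is one reason a missing-value step is often run after filtering.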

Annotation (Plug-ins) --

1) Dataset History - Log of data transformations induced by data-modifying operations.

2) Dataset Annotation - Free text format box used to annotate data, images and results. Such annotations persist across application invocations and can be used as an online lab notebook.

3) Experiment Information - Microarray machine parameters used in an experiment run. If available, high-level experiment information (e.g., purpose of experiment) is also displayed.

4) Marker Annotations - Retrieval of gene and pathway information for markers on a microarray.

5) caBIO Pathway Listing - Visualization of BioCarta pathway diagrams.

6) Gene Ontology - Enrichment analysis of selected groups of genes against Gene Ontology (GO) annotations.

Network Generation (Plug-ins) --

1) ARACNE Reverse Engineering - Analysis of large amounts of microarray data (typically 100-500 microarrays) to reverse engineer underlying gene regulatory networks.

2) Cytoscape - Visualization of gene regulatory networks created by ARACNE Reverse Engineering, using Cytoscape (see G6G Abstract Number 20092).

Analysis (Plug-ins) --

1) Hierarchical Clustering - Clustering of markers and microarrays into hierarchical binary trees. The resulting structures can be visualized in the Dendrogram plug-in.

2) Self Organizing Map (SOM) - Clustering of markers using self-organizing maps. The resulting clusters can be visualized in the SOM Clusters Viewer plug-in.

3) T Test - Identification of markers with statistically significant differential expression between sets of microarrays. T-testing is used for the determination of significance.
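
The idea behind the T Test plug-in can be sketched as follows. The text above does not say which t-test variant geWorkbench uses, so this example assumes Welch's unequal-variance form and computes only the statistic and its degrees of freedom (no p-value). The function names and the list-of-lists layout (rows are markers, columns are microarrays) are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return (t, df) for Welch's t-test between two sets of measurements.

    Assumes each group has at least two values and that the two groups
    are not both constant (otherwise the denominator is zero).
    """
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

def rank_markers(matrix, cols_a, cols_b):
    """Return marker row indices sorted by decreasing |t| between the
    microarrays in cols_a and those in cols_b."""
    scores = []
    for i, row in enumerate(matrix):
        t, _ = welch_t([row[j] for j in cols_a], [row[j] for j in cols_b])
        scores.append((abs(t), i))
    return [i for _, i in sorted(scores, reverse=True)]
```

Turning the statistic into a significance call additionally requires the t distribution and, with many markers, some form of multiple-testing correction; both are left out of this sketch.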

Sequence Analysis & Visualization (Plug-ins) --

1) Sequence Alignment - Server-based versions of BLAST and Smith-Waterman alignment.

2) Synteny - Comparison of sequence similarity between two (2) genomic regions. The comparison results are represented as a 'dot matrix' augmented with detailed annotation for both regions.

3) Promoter Analysis - Identification of putative transcription factor binding sites in DNA sequences. The analysis uses the profiles in the ‘JASPAR Transcription Factor Binding Profile Database’.

4) Pattern Discovery - Discovery of sequence motifs in sets of DNA and protein sequences.

5) Position Histogram - Visualization of results from the Pattern Discovery plug-in. Motif/pattern support is plotted against the relative sequence position of the motif match.

6) Sequence Panel - Visualization of results from the Pattern Discovery plug-in, displaying the motif match location over each sequence from the input data set.
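
The binning behind a display like the Position Histogram might look like the sketch below. It is an illustration only: the input shape (pairs of match start and sequence length), the function name, and the use of raw match counts per bin as the plotted quantity are assumptions, not the plug-in's actual data model.

```python
def position_histogram(matches, n_bins=10):
    """Count motif matches per relative-position bin.

    matches: iterable of (match_start, sequence_length) pairs, where
    match_start is a 0-based offset smaller than sequence_length.
    """
    counts = [0] * n_bins
    for start, length in matches:
        rel = start / length  # relative position in [0, 1)
        # Clamp to the last bin to guard against rounding at the upper edge.
        counts[min(int(rel * n_bins), n_bins - 1)] += 1
    return counts
```

Using relative rather than absolute positions lets matches from sequences of different lengths share one axis.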

caBIG™ Molecular Analysis Tools Knowledge Center

geWorkbench is supported through the caBIG™ Molecular Analysis Tools Knowledge Center, an NCI-supported center operated by the Columbia University Herbert Irving Comprehensive Cancer Center and The Broad Institute of MIT and Harvard.

