Gaggle
Category Cross-Omics>Knowledge Bases/Databases/Tools
Abstract The Gaggle is a framework for exchanging data between independently developed software tools and databases to enable interactive exploration of 'systems biology' data.
Guided by the classic software engineering strategy of separation of concerns and a policy of semantic flexibility, it combines existing popular programs and web resources into a user-friendly, rich, and easily extended environment in which to do 'systems biology'.
Note: The practice of systems biology depends upon many software tools, operating on many kinds of data from many different sources. Each of these tools typically excels at one (or a few) types of analysis with one (or a few) types of data.
A crucial challenge, therefore, is to combine the capabilities of these and other, forthcoming tools to create a data exploration and analysis environment which can do justice to the variety and complexity of systems biology. Gaggle is designed to address this problem.
Gaggle currently supports a number of geese -- the manufacturer's name for any open source program that has been adapted to run in the Gaggle. This adaptation generally requires only a small amount of programming work.
Once gaggled, a program can broadcast and receive any of a small number of data types, which together constitute an adequate basis for exploratory analysis in systems biology. These data types include:
1) Name list (e.g., these genes are interesting).
2) Name list combined with a condition list (e.g., these genes are interesting in these conditions).
3) HashMap: a collection of name/value pairs.
4) Matrix: rows and columns, each named, containing numerical data.
5) Network: a collection of nodes and edges, with arbitrary hashmaps associated with each.
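The five data types listed above can be pictured as a handful of small containers. The following is only an illustrative sketch: the class and field names (`NameList`, `Cluster`, `Matrix`, `Network`) are hypothetical and are not taken from the Gaggle source code, which is written in Java. The HashMap type corresponds to an ordinary Python `dict`, so it gets no class of its own here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NameList:
    """1) A plain list of identifiers, e.g. interesting genes."""
    names: List[str]


@dataclass
class Cluster:
    """2) A name list combined with a condition list."""
    names: List[str]
    conditions: List[str]


# 3) HashMap: name/value pairs; a plain dict is enough for this sketch.
HashMap = Dict[str, object]


@dataclass
class Matrix:
    """4) Named rows and columns holding numerical data."""
    row_names: List[str]
    column_names: List[str]
    values: List[List[float]]

    def __post_init__(self) -> None:
        # Reject matrices whose shape does not match the names.
        if len(self.values) != len(self.row_names):
            raise ValueError("one row of values is needed per row name")
        for row in self.values:
            if len(row) != len(self.column_names):
                raise ValueError("each row needs one value per column name")

    def column(self, name: str) -> List[float]:
        """Return the values of one named column, in row order."""
        j = self.column_names.index(name)
        return [row[j] for row in self.values]


@dataclass
class Network:
    """5) Nodes and edges, each with an arbitrary attribute map."""
    nodes: Dict[str, HashMap] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], HashMap] = field(default_factory=dict)

    def add_edge(self, source: str, target: str, **attributes: object) -> None:
        # Make sure both endpoints exist before storing the edge.
        self.nodes.setdefault(source, {})
        self.nodes.setdefault(target, {})
        self.edges[(source, target)] = dict(attributes)
```

For instance, `Network().add_edge("g1", "g2", interaction="protein-DNA")` would create two nodes and one annotated edge; the identifiers `g1` and `g2` are placeholders.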
Gaggle Boss -- The Gaggle Boss is an indispensable part of the Gaggle. Most geese will automatically launch the Boss (in a minimized state) if it is not already running.
Once started, the Boss often retreats into the background, providing the channel over which the geese communicate, but is rarely used directly by the user.
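The Boss's role as a communication channel can be sketched as a simple hub that relays each broadcast to the other registered geese. This is a toy, in-process illustration only: the real Gaggle communicates over Java RMI (see System Requirements), and the `Boss` class, `register`, and `broadcast` names below are assumptions made for this example, not the actual Gaggle API.

```python
from typing import Callable, Dict, List, Optional

# A goose is represented here by a callback taking (source name, payload).
Handler = Callable[[str, object], None]


class Boss:
    """Toy hub: relays a broadcast from one goose to the others."""

    def __init__(self) -> None:
        self._geese: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        """Add a goose under a unique name."""
        self._geese[name] = handler

    def broadcast(self, source: str, payload: object,
                  target: Optional[str] = None) -> List[str]:
        """Deliver payload to one target goose, or to every goose
        except the sender. Returns the names that received it."""
        delivered = []
        for name, handler in self._geese.items():
            if name == source:
                continue  # never echo a broadcast back to its sender
            if target is not None and name != target:
                continue  # a specific target was requested
            handler(source, payload)
            delivered.append(name)
        return delivered
```

The point of the design is that geese never refer to one another directly; each knows only the hub, which is what lets independently developed tools be combined.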
The following is a (partial) list of existing 'geese' with some of their features/capabilities:
1) The Annotation Goose -- displays short bits of descriptive text indexed by an identifier, such as Open Reading Frame (ORF), gene, or protein name. Features include keyword search and broadcasting lists of identifiers to and from the Gaggle.
2) Cytoscape Goose -- Some features of Cytoscape:
- a) Nodes and edges may have any kind, and any number, of data attributes attached.
- b) These data attributes are used to control the visual attributes -- so that, for instance, a gene with high expression ratio and low p-value may appear dark red and proportionately large.
- c) Edge confidence scores and node association types (e.g., protein-DNA, protein-protein) are often used to display different widths, styles, and colors of edges.
Networks and name lists are the data most commonly broadcast between Cytoscape (see G6G Abstract Number 20092) and other geese in a Gaggle.
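The idea of data attributes driving visual attributes (item b above) can be illustrated with a small standalone function. This is a sketch under stated assumptions, not Cytoscape's own mapping mechanism: the function name `node_style`, the use of a log ratio, the scaling limits, and the red/green color scheme are all choices made for this example.

```python
import math


def node_style(log_ratio: float, p_value: float,
               max_abs_ratio: float = 2.0,
               min_size: float = 20.0,
               max_size: float = 60.0) -> dict:
    """Map an expression log ratio and a p-value to a color and size.

    Assumed conventions (hypothetical, for illustration only):
    - color saturation grows with |log_ratio|, red for up-regulated
      and green for down-regulated genes;
    - node size grows with significance, measured as -log10(p),
      clamped to the range 0..4.
    """
    # Fraction of full saturation, clamped to [0, 1].
    intensity = min(abs(log_ratio) / max_abs_ratio, 1.0)
    fade = int(round(255 * (1.0 - intensity)))
    rgb = (255, fade, fade) if log_ratio >= 0 else (fade, 255, fade)

    # Clamp p so that -log10(p) stays between 0 and 4.
    p = min(max(p_value, 1e-4), 1.0)
    significance = -math.log10(p) / 4.0
    size = min_size + (max_size - min_size) * significance

    return {"color": "#%02x%02x%02x" % rgb, "size": size}
```

With these settings, a gene with a large positive ratio and a very small p-value gets a fully saturated red and the largest size, while an unchanged, non-significant gene stays white and small.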
3) The DMV: Data Matrix Viewer -- This is an Institute for Systems Biology (ISB) goose, with a few useful features:
- a) Read and explore data matrices stored in a simple tab-delimited file format or in a data repository.
- b) When microarray data are accompanied by a simple ISB-standard metadata file, the DMV displays the experimental conditions for all available data, allowing the biologist to select some or all data to explore.
- c) Once loaded, you get a spreadsheet-like view of the data.
- d) Other selections, or other files of data, will be loaded into new tabs.
- e) X-Y plots of any selected rows may be drawn, and will appear in yet another new tab.
- f) Movies may be run, in which gene expression values are broadcast to a selected goose, one column at a time. Most commonly, the target goose is a Cytoscape network view of the genes in the microarray experiment; the Cytoscape goose is configured to interpret those expression values, and accompanying statistical measures (when available), to color and size the nodes in the network.
- g) Numerical matrices and gene lists, in addition to the movie data, can be broadcast to or from the DMV.
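The movie feature (item f above) amounts to walking through a matrix one column at a time and sending each column as a set of name/value pairs. The sketch below illustrates that idea only. The exact DMV file layout is not described here, so the parser assumes a header row whose first cell is a label and a first column of gene names; the function names `read_matrix` and `movie_frames` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List, Tuple


def read_matrix(text: str) -> Tuple[List[str], List[str], List[List[float]]]:
    """Parse tab-delimited text into row names, column names, values.

    Assumed layout: first row is a header (label cell, then condition
    names); every later row is a gene name followed by numbers.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    column_names = header[1:]
    row_names = [r[0] for r in body]
    values = [[float(x) for x in r[1:]] for r in body]
    return row_names, column_names, values


def movie_frames(row_names: List[str], column_names: List[str],
                 values: List[List[float]]
                 ) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Yield one frame per column: (condition, {gene: value})."""
    for j, condition in enumerate(column_names):
        frame = {name: row[j] for name, row in zip(row_names, values)}
        yield condition, frame
```

Each yielded frame is what a receiving goose, such as a network view, would use to recolor and resize its nodes for that condition.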
4) Firegoose -- A Firefox toolbar for the Gaggle. The Firegoose toolbar connects the Gaggle to the web. By downloading and installing this extension into your Firefox browser, you can broadcast data between the Gaggle and web resources.
Supported web sites include KEGG pathways, EMBL STRING (a database of functional associations), DAVID (which enables clustering by functional annotations), and Entrez Gene and Protein. With a little scripting, the Firegoose can potentially exchange data with practically any bioinformatics website.
5) Genome Browser -- The genome browser is a way of visualizing data plotted against coordinates on the genome. Tiling arrays and ChIP-chip data are two example use cases. It is still a work in progress.
6) MeV Goose -- MultiExperiment Viewer (MeV) is a versatile microarray data analysis tool, incorporating sophisticated algorithms for clustering, visualization, classification, statistical analysis and biological theme discovery.
The MeV goose is most commonly used in Gaggle as follows:
- a) Broadcast a microarray data set from the DMV to R.
- b) Normalize the matrix in R.
- c) Broadcast the matrix from R to MeV.
- d) Apply any of the (many!) analyses MeV offers.
- e) Broadcast selected gene names back to the Gaggle. For instance, broadcast a cluster of genes to a Cytoscape network, or use it to select rows in the DMV.
7) The R Goose -- The R Goose allows you to use R -- a language and environment for statistical computing and graphics -- for data exploration in the Gaggle. R is especially useful with microarray, massively parallel signature sequencing (MPSS) and proteomics data.
8) Translator -- In biology, there are many naming systems for ORFs, genes, and their products. The Translator attempts to manage some of that complexity by allowing relatively painless conversion between one naming system and another.
Different naming systems are often mutually inconsistent, so mapping between them is destined to be a 'lossy' process. That is lossy as in lossy data compression: a lossy compression method is one where compressing data and then decompressing it retrieves data that may well differ from the original, but is close enough to be useful in some way.
The Translator software supports a loose definition of translation, encompassing scenarios like mapping peptides to genes or mapping across species via Cluster of Orthologous Group (COG) membership.
These go beyond simply exchanging one naming system for another. Maintaining the desired degree of rigor is up to the user's judgement.
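Why translation between naming systems is lossy can be seen in a small sketch. The code below is not the Translator's implementation; the function names and the identifiers used in the usage note (`orfA`, `orfB`, `orfC`, `geneX`) are invented placeholders. It assumes a mapping from each source name to zero or more target names.

```python
from typing import Dict, List, Tuple


def translate(names: List[str],
              mapping: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Translate names through a one-to-many mapping.

    Returns (translated, unmapped). Names with no target are not
    silently dropped; they are reported so the user can judge the loss.
    """
    translated: List[str] = []
    unmapped: List[str] = []
    for name in names:
        targets = mapping.get(name)
        if not targets:
            unmapped.append(name)
            continue
        for target in targets:
            if target not in translated:  # keep order, skip duplicates
                translated.append(target)
    return translated, unmapped


def invert(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the reverse mapping (target name -> source names)."""
    inverse: Dict[str, List[str]] = {}
    for source, targets in mapping.items():
        for target in targets:
            inverse.setdefault(target, []).append(source)
    return inverse
```

Suppose two ORFs, `orfA` and `orfB`, both map to `geneX`, and `orfC` has no counterpart. Translating `["orfA", "orfC"]` forward yields `geneX` plus one unmapped name, and translating `geneX` back through the inverted mapping yields both `orfA` and `orfB`. The round trip does not return the original list, which is the sense in which the process is lossy.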
System Requirements
The Gaggle requires Java version 1.5 or later in order to run. The Gaggle depends upon Remote Method Invocation (RMI) for communication among the geese. In earlier versions of Java, the Gaggle Boss had to be compiled together with every goose; if we did not compile the Boss specifically with your goose, then your goose would not run.
With version 1.5, this restriction is eliminated. We find this very useful, and feel that it justifies the extra burden placed on users -- Mac OS X users in particular -- who must specially install Java 1.5 (and who may, additionally, need to upgrade their operating system to 10.4). We provide explicit step-by-step Mac OS X instructions to lighten that burden; these may be found, along with instructions for Windows and Linux, by following the links at Gaggle Prerequisites.
Manufacturer
The Gaggle was originally conceived and implemented at the Baliga Laboratory at the Institute for Systems Biology.
Development continues in collaboration with the Bonneau Laboratory at New York University.
Manufacturer Web Site Gaggle and video tutorial
Price Contact manufacturer.
G6G Abstract Number 20222
G6G Manufacturer Number 101457