DataRail
Category Cross-Omics>Agent-Based Modeling/Simulation/Tools and Cross-Omics>Pathway Analysis/Gene Regulatory Networks/Tools
Abstract DataRail is an open source MATLAB toolbox for managing, transforming, visualizing, and modeling data, in particular the high-throughput data encountered in Systems Biology.
DataRail stores experimental data in flexible multi-dimensional arrays, transforms arrays so as to maximize information content, and then constructs models using internal or external tools.
Data integrity is maintained via a containment hierarchy for arrays, imposition of a metadata standard based on the Minimum Information for Data Analysis in Systems Biology (MIDAS) format, assignment of semantically typed universal identifiers, and the implementation of a procedure for storing the history of all transformations with the array.
DataRail is intended to bridge the gap between data acquisition and modeling.
DataRail is model- rather than data-centric in that the task of creating and transmitting knowledge is invested in mathematical models constructed using the software, rather than the data storage system itself, but it is designed to support existing modeling tools rather than serve itself as an integrated modeling environment.
Design goals and implementation --
To facilitate the collection, annotation and transformation of experimental data, DataRail software is designed to meet the following specific requirements:
1) Serve as a stable repository for experimental results of different types while recording key properties of the biological setting and complete information about all data processing steps;
2) Promote model development and analysis via internal visualization and modeling capabilities;
3) Interact efficiently and transparently with external modeling and mining tools;
4) Meet new requirements in data collection, annotation and transformation as they arise; and
5) Facilitate data sharing and publication through compatibility with existing bioinformatics standards.
DataRail data is stored in a succession of regular multi-dimensional arrays, known in information technology as ‘data cubes’, each representing a transformation of an original set of primary data.
The integrity of data is maintained by tagging the primary data with metadata referenced to a controlled ontology, storing all arrays arising from the same primary data in one file structure, documenting the relationships of arrays to each other, storing the algorithms used for data transformation with the data arrays, and assigning each data structure a unique identifier (UID) based on controlled semantics.
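The bookkeeping described above can be illustrated with a minimal Python sketch. It is not DataRail's actual data structure (DataRail is a MATLAB toolbox); the class name `DataArray`, its fields, the helper `transform`, and the UID scheme are all hypothetical, chosen only to show how a derived array can carry its metadata, its parent's identifier, and the history of transformations applied to it.

```python
from dataclasses import dataclass, field
from typing import Optional
import uuid


@dataclass
class DataArray:
    """Hypothetical stand-in for one array in a compendium (not DataRail's API)."""
    name: str
    values: list                 # nested lists standing in for an n-dimensional array
    dim_labels: tuple            # one label per dimension
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)   # names of transformations applied so far
    parent_uid: Optional[str] = None
    # Assumed UID scheme: a type prefix plus a random hex string.
    uid: str = field(default_factory=lambda: "array:" + uuid.uuid4().hex)


def map_leaves(values, fn):
    """Apply fn to every scalar in a nested list, preserving the shape."""
    if isinstance(values, list):
        return [map_leaves(v, fn) for v in values]
    return fn(values)


def transform(source, name, fn):
    """Return a new array derived from source, recording where it came from."""
    return DataArray(
        name=name,
        values=map_leaves(source.values, fn),
        dim_labels=source.dim_labels,
        metadata=dict(source.metadata),          # metadata travels with the data
        history=source.history + [fn.__name__],  # transformation is stored with the array
        parent_uid=source.uid,                   # relationship to the parent array
    )


def scale_by_ten(x):
    return x * 10


# Illustrative values only; the labels and metadata are placeholders.
primary = DataArray(
    name="primary",
    values=[[[1.0, 2.0], [3.0, 4.0]]],
    dim_labels=("treatment", "time", "signal"),
    metadata={"cell_type": "example-cell-line"},
)
scaled = transform(primary, "scaled", scale_by_ten)
```

Here `scaled` keeps a pointer to `primary` and a record of the function that produced it, while `primary` itself is left unchanged.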
DataRail is implemented as a MATLAB toolbox with scripting and GUI-based interaction, and it incorporates a variety of data processing algorithms.
DataRail works best as a component of a loosely coupled set of software tools including commercial data mining packages such as Spotfire or public toolboxes for modeling.
In addition, DataRail is designed to communicate with a semantic wiki (SbWiki, described below) that is better suited to storing textual information, such as experimental protocols, and that documents DataRail's use of UIDs.
SbWiki -- SbWiki is a wiki-based system which utilizes Semantic Web technologies and a lightweight data entry and cataloging framework to support collaborative management of unstructured and semi-structured Systems Biology data.
It aims to capture high-level knowledge of models, specifically the set of assumptions necessary for model interpretation that heretofore have not been encoded in any concrete manner (particularly pre-publication), along with lower-level systems biology markup language (SBML)-compliant resource description framework (RDF) annotations.
SbWiki also tracks the experimental data used to train or evaluate models, including details of the biological setting (species, cell type, and growth or culture conditions) and the experimental protocol.
SbWiki integrates well with DataRail on this point, but both tools also support independent use.
Note: SbWiki is under development, but will be available soon.
Constructing and evaluating models --
DataRail supports three approaches to modeling:
First, a statistical toolbox to perform Principal Component Analysis (PCA) and Partial Least Squares Regression (PLSR) is available, as well as one to perform multiple linear regression analysis.
Second, efficient links have been created to other MATLAB toolboxes such as CellNetAnalyzer (see G6G Abstract Number 20331), which performs Boolean modeling;
CellNetOptimizer, a MATLAB toolbox for creating logic-based models of signal transduction networks and training them against high-throughput biochemical data (a recently added modeling feature); and
the differential-equation-based modeling package PottersWheel (see G6G Abstract Number 20361).
Third, export of primary or transformed data from DataRail as vectors, matrices or n-dimensional arrays has been implemented to facilitate links to other modeling tools.
In this case, users need to ensure continuing compliance with the MIDAS data standard so as to preserve the integrity of metadata.
Thus far, the manufacturers have implemented export to a MIDAS file, which can be read by Spotfire, and to formats compatible with PottersWheel, CellNetAnalyzer, and CellNetOptimizer.
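To give a rough idea of what a flat, metadata-preserving export might look like, the following Python sketch writes records to a CSV table whose column headers carry a short type prefix. This is a simplified assumption, not the DataRail exporter: the `TR:` (treatment), `DA:` (data acquisition time) and `DV:` (data value) prefixes are modeled on the MIDAS convention as commonly described and should be checked against the MIDAS specification, and the function name `export_midas_like`, the record layout, and the treatment and signal names are hypothetical.

```python
import csv
import io


def export_midas_like(records, treatments, signals):
    """Write records as CSV text with prefixed column headers (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    header = (
        ["TR:" + t for t in treatments]    # one column per treatment
        + ["DA:" + s for s in signals]     # acquisition time for each signal
        + ["DV:" + s for s in signals]     # measured value for each signal
    )
    writer.writerow(header)
    for rec in records:
        # Treatments absent from a record are written as 0 (not applied).
        row = [rec["treatments"].get(t, 0) for t in treatments]
        row += [rec["time"] for _ in signals]
        row += [rec["values"][s] for s in signals]
        writer.writerow(row)
    return buffer.getvalue()


# Illustrative records only; names and numbers are placeholders.
records = [
    {"treatments": {"EGF": 1}, "time": 0, "values": {"AKT": 0.1, "ERK": 0.2}},
    {"treatments": {"EGF": 1, "Inhibitor": 1}, "time": 30, "values": {"AKT": 0.5, "ERK": 0.4}},
]
csv_text = export_midas_like(records, ["EGF", "Inhibitor"], ["AKT", "ERK"])
```

Keeping the type of each column in its header is what allows a downstream tool to tell experimental conditions from measurements without a separate metadata file.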
Several additional exports are planned, e.g. to BioConductor to perform different sorts of statistical analysis.
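As an illustration of the first, statistical approach listed above, multiple linear regression can be sketched in a few lines of Python. This is a generic textbook formulation (ordinary least squares via the normal equations, solved by Gaussian elimination), not DataRail's implementation; the function names and the small synthetic data set are assumptions made for the example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_linear_regression(rows, y):
    """Return [intercept, b1, b2, ...] minimizing squared error."""
    design = [[1.0] + list(r) for r in rows]   # prepend a column of ones
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data generated from y = 1 + 2*x1 + 3*x2.
predictors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
response = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in predictors]
coefficients = fit_multiple_linear_regression(predictors, response)
```

Because the synthetic data are noise-free, the fitted coefficients should recover the generating values up to floating-point error; with real measurements a numerically more robust solver (for example, one based on QR decomposition) would usually be preferred.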
Recent new feature of DataRail --
Bayesian inference via Bayesian Network Structure Learning (BNSL): structure learning for static or dynamic Bayesian networks with observational and/or interventional data, developed by D. Eaton and K. Murphy.
System Requirements
Contact manufacturer.
Manufacturer
- Sorger Lab
- Department of Systems Biology
- Harvard Medical School
- Boston, MA 02115 USA
- And
- Lauffenburger Lab
- Department of Biological Engineering
- Massachusetts Institute of Technology
- Cambridge, MA 02139 USA
Manufacturer Web Site DataRail
Price Contact manufacturer.
G6G Abstract Number 20702
G6G Manufacturer Number 104274