Abstract --

MetabolomeExpress is a File Transfer Protocol (FTP) server and web-tool for the online storage, processing, visualization and statistical re-analysis of publicly submitted gas chromatography/mass spectrometry (GC/MS) metabolomics datasets.

Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (e.g. metabolite, species, organ/biofluid, etc.).

Users may perform meta-analysis comparisons of multiple independent experiments or re-analyze public primary datasets via user-friendly tools for t-test, principal components analysis (PCA), hierarchical cluster analysis (HCA) and correlation analysis.

Users may also interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP) their own data to the server for online processing via a novel raw data processing pipeline.

MetabolomeExpress provides an opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications.

Transparent sharing of these data will allow researchers to assess data quality and draw their own insights from published metabolomics datasets.

MetabolomeExpress Structural overview --

Structurally, MetabolomeExpress comprises four interacting layers.

The first layer is an FTP-accessible repository system on the server file system that stores raw and processed GC/MS datasets.

The second layer is a simple MySQL database containing three core tables, together with a variety of tables storing different ontologies and controlled vocabulary terms.

The third layer of MetabolomeExpress consists of a set of novel server-side data processing modules implemented in PHP and R.

Note: PHP is a widely used general-purpose scripting language that is especially suited to web development and can be embedded in HTML.

The fourth structural layer is a JavaScript-based web interface that provides integrated access to all the data-processing, visualization and analysis tools.

Note: Usage instructions and finer structural details are presented in the MetabolomeExpress Users Manual.

The MetabolomeExpress database of metabolite response statistics --

One of the central purposes of a metabolomics database is to store treatment:control metabolite signal intensity ratio statistics in one central location and to provide tools to:

1) Search for metabolic phenotypes of interest;

2) Compare the results of different experiments; and

3) Manually verify original interpretations of raw data.

For this purpose, MetabolomeExpress uses a simple database table containing columns for statistical information and a wide variety of administrative, biological and technical metadata.
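The idea of a single table holding ratio statistics alongside metadata can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name, column names and rows are hypothetical and made up for illustration; they are not the actual MetabolomeExpress schema or data.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE response_stats (
    experiment_id TEXT,
    metabolite    TEXT,
    species       TEXT,
    organ         TEXT,
    ratio         REAL,  -- treatment:control signal intensity ratio
    p_value       REAL
)
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("EXP1", "proline", "Arabidopsis thaliana", "leaf", 4.2, 0.001),
        ("EXP1", "citrate", "Arabidopsis thaliana", "leaf", 0.9, 0.600),
        ("EXP2", "proline", "Oryza sativa", "root", 2.5, 0.020),
    ]
    conn.executemany("INSERT INTO response_stats VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def find_responses(conn, metabolite, min_ratio=2.0, max_p=0.05):
    """Return (experiment_id, species, organ, ratio) for significant increases."""
    cur = conn.execute(
        "SELECT experiment_id, species, organ, ratio FROM response_stats "
        "WHERE metabolite = ? AND ratio >= ? AND p_value <= ? "
        "ORDER BY ratio DESC",
        (metabolite, min_ratio, max_p),
    )
    return cur.fetchall()


conn = build_demo_db()
hits = find_responses(conn, "proline")
```

Keeping statistics and metadata in one flat table makes a search by metabolite, species or organ a single filtered query, at the cost of repeating metadata across rows.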

The database currently contains more than 9,500 metabolite ratio statistics derived from 11 experiments published in 8 articles in high-ranking plant science journals.

Quality control of datasets submitted to the MetabolomeExpress database of metabolite response statistics --

Any user with a complete dataset stored in a MetabolomeExpress FTP repository may submit this dataset to be imported into the main statistical database.

The quality control model used by MetabolomeExpress follows essentially the same principles as the major microarray data repositories.

Quality control is fully objective (it is carried out automatically by a computer script) and serves only to ensure that the submitted dataset is complete, i.e. that it includes a correctly completed metadata file, all raw data files, peak lists, a library match report, a normalized data matrix and a statistical results file, all correctly formatted.

The validation script uses human-readable ‘validation template’ files defining reporting requirements and controlled vocabularies for major metabolomics research areas (e.g. plant, animal, bacterial, fungal and environmental) and model systems with highly-developed bioinformatics resources (e.g. Arabidopsis thaliana, rice, human, mouse, Escherichia coli and Saccharomyces cerevisiae).
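A completeness check of this kind might look roughly like the sketch below. The file names, directory names and vocabulary terms are assumptions chosen for illustration; the real validation templates and repository layout are not described here and may differ.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: these names are assumptions, not the real template.
REQUIRED_FILES = [
    "metadata.txt",
    "library_match_report.txt",
    "data_matrix.txt",
    "statistics.txt",
]
REQUIRED_DIRS = ["raw", "peaklists"]


def missing_items(dataset_dir):
    """List required files and non-empty directories absent from a dataset."""
    root = Path(dataset_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    for name in REQUIRED_DIRS:
        sub = root / name
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(name + "/")
    return missing


def invalid_terms(metadata, vocabulary):
    """Return fields that are missing or use a term outside the vocabulary."""
    return sorted(
        field for field, allowed in vocabulary.items()
        if metadata.get(field) not in allowed
    )


# Demonstration on a deliberately incomplete, made-up dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "metadata.txt").write_text("species\tArabidopsis thaliana\n")
    (root / "raw").mkdir()
    (root / "raw" / "sample1.cdf").write_text("")
    demo_missing = missing_items(root)

demo_invalid = invalid_terms(
    {"species": "Arabidopsis thaliana", "organ": "wing"},
    {"species": {"Arabidopsis thaliana", "Oryza sativa"}, "organ": {"leaf", "root"}},
)
```

Because the check only asks whether items exist and whether terms come from an allowed list, it makes no judgment about the scientific quality of the data, in keeping with the objective model described above.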

Overview of the MetabolomeExpress web interface --

The MetabolomeExpress web interface provides two main tools: Experiment Explorer and Database Explorer.

Experiment Explorer is used to process and analyze raw and processed experimental datasets located in user FTP repositories, while Database Explorer is used to interact with the contents of the metabolite response statistics database and visualize the contents of shared GC/MS reference libraries.

A navigation tree allows contents of the FTP repositories to be browsed, downloaded or loaded into the Experiment Explorer.

The raw data viewer --

According to its developers, one of the distinguishing features of MetabolomeExpress compared to other available GC/MS metabolomics web-tools is its Raw Data Viewer tool (a component of the Experiment Explorer).

The current version of the viewer has two windows: one for displaying chromatograms and one for displaying arbitrary mass-spectral scans. One or more chromatograms may be overlaid simultaneously in the viewer, and two ‘color-channels’ are available so that two sets of chromatograms may be compared. Peak detection and/or library matching results may also be overlaid on chromatograms.

Chromatographic peak detection --

The aim of chromatographic peak detection is to capture useful information about analytically important instrument signals (i.e. chromatographic peaks) while discarding signals devoid of useful analytical information (such as baseline noise).

For this purpose, MetabolomeExpress uses a simple yet highly effective slope-based peak detection algorithm (PeakFinder) to detect chromatographic peaks in all extracted ion chromatograms and generate, for each raw data file, a corresponding tab-delimited peak list report file.
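The general idea of slope-based peak detection can be sketched as follows. This is a minimal illustration under stated assumptions, not the PeakFinder algorithm itself: the single slope threshold, the rule for ending a peak and the report columns are all hypothetical choices.

```python
def find_peaks(intensities, slope_threshold):
    """Return (start, apex, end) index triples for one ion chromatogram.

    A peak opens when the point-to-point slope exceeds the threshold and
    closes once the signal is past its apex and the descent has flattened.
    """
    peaks = []
    start = apex = None
    for i in range(1, len(intensities)):
        slope = intensities[i] - intensities[i - 1]
        if start is None:
            if slope > slope_threshold:          # rising edge begins
                start, apex = i - 1, i
        elif intensities[i] > intensities[apex]:
            apex = i                             # still climbing
        elif slope > -slope_threshold:           # descent has flattened
            peaks.append((start, apex, i))
            start = apex = None
    if start is not None:                        # peak still open at end of trace
        peaks.append((start, apex, len(intensities) - 1))
    return peaks


def peak_report(times, intensities, peaks):
    """Format detected peaks as a tab-delimited report (columns are assumed)."""
    lines = ["start_rt\tapex_rt\tend_rt\tapex_height"]
    for start, apex, end in peaks:
        lines.append(
            f"{times[start]}\t{times[apex]}\t{times[end]}\t{intensities[apex]}"
        )
    return "\n".join(lines)


# Made-up trace containing two peaks on a flat baseline.
trace = [1, 1, 2, 10, 30, 12, 3, 1, 1, 1, 5, 20, 6, 1, 1]
scan_times = list(range(len(trace)))
peaks = find_peaks(trace, slope_threshold=3)
report = peak_report(scan_times, trace, peaks)
```

Baseline noise produces only small slopes, so it never opens a peak, which is how a slope criterion discards uninformative signal. A production detector would also need smoothing and handling for shoulders and plateaus, which this sketch omits.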

Mass-spectral and retention-index (MSRI) library matching and quantification --

Any detailed biological interpretation of a GC/MS metabolomics data set requires that signals corresponding to analytes of biological origin are correctly identified and distinguished from those corresponding to artifacts or internal standard compounds.

Compliance with Metabolomics Standards Initiative (MSI) standards for metabolite identification requires that at least two orthogonal analytical parameters are used to match analytical signals to particular metabolites.

For this purpose, MetabolomeExpress uses two widely accepted identification parameters: retention index (RI) and mass spectrum.

The MetabolomeExpress MSRI library matching algorithm takes three inputs: extracted ion chromatogram (EIC) peak lists (as generated by the PeakFinder algorithm); a mass-spectral and retention index (MSRI) GC/MS reference library (a table of retention indices, mass spectra and quantifier ion information for authentic standards and unknown analytes observed in biological samples); and a simple RI calibration file (a table of retention times and retention indices for an array of RI standards such as n-alkanes). It generates a match report file as output.

A number of matching criteria may also be set.
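How two orthogonal parameters can be combined in a match is sketched below. The sketch assumes linear interpolation between RI standards and cosine similarity between spectra; both are common choices swapped in for illustration, and the actual MetabolomeExpress scoring may differ. The parameter names `ri_window` and `min_similarity`, the library entries and the calibration values are hypothetical.

```python
import math
from bisect import bisect_right


def retention_index(rt, calibration):
    """Linearly interpolate an RI from (retention_time, RI) standards.

    `calibration` must be sorted by retention time and hold >= 2 standards.
    """
    times = [t for t, _ in calibration]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated range")
    hi = min(bisect_right(times, rt), len(times) - 1)
    (t0, ri0), (t1, ri1) = calibration[hi - 1], calibration[hi]
    return ri0 + (ri1 - ri0) * (rt - t0) / (t1 - t0)


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity of two spectra given as {m/z: intensity} dicts."""
    dot = sum(spec_a[mz] * spec_b[mz] for mz in spec_a.keys() & spec_b.keys())
    norm = math.sqrt(sum(v * v for v in spec_a.values())) * math.sqrt(
        sum(v * v for v in spec_b.values())
    )
    return dot / norm if norm else 0.0


def match_peak(rt, spectrum, library, calibration,
               ri_window=10.0, min_similarity=0.8):
    """Return (name, score) of the best entry passing both criteria, or None."""
    ri = retention_index(rt, calibration)
    best = None
    for entry in library:
        if abs(entry["ri"] - ri) > ri_window:    # criterion 1: retention index
            continue
        score = cosine_similarity(spectrum, entry["spectrum"])
        if score >= min_similarity and (best is None or score > best[1]):
            best = (entry["name"], score)        # criterion 2: mass spectrum
    return best


# Made-up calibration (n-alkane style) and library entries.
calibration = [(5.0, 1000), (7.0, 1200), (9.5, 1400)]
library = [
    {"name": "analyte A", "ri": 1105, "spectrum": {73: 100, 147: 40, 217: 20}},
    {"name": "analyte B", "ri": 1350, "spectrum": {73: 100, 147: 40, 217: 20}},
]
best = match_peak(6.0, {73: 90, 147: 45, 217: 15}, library, calibration)
```

In this example the two library entries share an identical spectrum, so only the RI window separates them, which shows why a second orthogonal parameter is needed for confident identification.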

Statistics and exploratory data analysis --

The Statistics and Data Exploration panel of the Experiment Explorer module currently provides tools to carry out data matrix construction, data matrix renormalization, data matrix heatmap visualization, Welch’s t-tests, principal components analysis (PCA) and hierarchical clustering analysis (HCA).
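For readers unfamiliar with Welch's t-test, the calculation behind each metabolite comparison can be sketched with the standard library alone. The function names and the replicate values are made up for illustration, and this is not the MetabolomeExpress implementation (which, per the structural overview, is written in PHP and R).

```python
import math
from statistics import mean, variance


def welch_t(treatment, control):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Unlike Student's t-test, the two groups are not assumed to share
    a common variance.
    """
    n1, n2 = len(treatment), len(control)
    se1 = variance(treatment) / n1   # squared standard error, group 1
    se2 = variance(control) / n2     # squared standard error, group 2
    t = (mean(treatment) - mean(control)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


def signal_ratio(treatment, control):
    """Treatment:control ratio of mean signal intensities."""
    return mean(treatment) / mean(control)


# Made-up normalized signal intensities for one metabolite.
treatment = [12.0, 14.0, 13.0, 15.0]
control = [6.0, 7.0, 8.0, 7.0]
t_stat, dof = welch_t(treatment, control)
ratio = signal_ratio(treatment, control)
```

The p-value follows from the Student t distribution with the (generally non-integer) degrees of freedom computed above. In Python, `scipy.stats.ttest_ind(treatment, control, equal_var=False)` performs the whole test in one call.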

Statistical results are displayed in the web interface but may also be downloaded for offline analysis.

Color is used to aid data interpretation wherever possible. T-test results are presented as a red/blue heatmap table that allows results to be sorted by metabolite name, chemical class, retention time, retention index, signal intensity ratio or p-value.

Wherever possible, displayed results are linked to their underlying raw GC/MS signals by point-and-click access, thus aiding manual verification of processed results.

PCA plots are provided in 2D and 3D formats. PCA plots and HCA heatmap clustergrams are provided in vector formats for the creation of publication-quality figures.

The Database Explorer --

The Database Explorer module of MetabolomeExpress provides a number of tools with which to explore the contents of the MetabolomeExpress database of metabolite signal intensity ratio statistics.

The first of these, Database Statistics, provides an overview of the current database contents and buttons to load experiments into the Experiment Explorer module for more detailed analysis.

The second tool, ResponseFinder, is a simple query tool that allows users to search for metabolite responses of interest. Results are returned with links to experimental metadata and underlying raw GC/MS signals in the raw data viewer.

The third tool is MetaAnalyser. This tool allows the results of multiple experiments to be aligned, clustered, and compared in heatmap form.
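The alignment step that precedes any heatmap comparison can be illustrated with a short sketch. The input layout (a dict of experiment to metabolite ratios), the use of log2 ratios and the example values are assumptions for illustration, not a description of MetaAnalyser's internals; clustering is omitted.

```python
import math


def align_experiments(results):
    """Align per-experiment ratios into a metabolite x experiment matrix.

    `results` maps experiment id -> {metabolite: treatment:control ratio}.
    Cells hold log2 ratios, or None where a metabolite was not reported.
    """
    experiments = sorted(results)
    metabolites = sorted({m for stats in results.values() for m in stats})
    matrix = [
        [math.log2(results[e][m]) if m in results[e] else None
         for e in experiments]
        for m in metabolites
    ]
    return metabolites, experiments, matrix


# Made-up ratios from two hypothetical experiments.
results = {
    "EXP1": {"proline": 4.0, "citrate": 0.5},
    "EXP2": {"proline": 2.0, "sucrose": 1.0},
}
metabolites, experiments, matrix = align_experiments(results)
```

Log2 ratios place increases and decreases on a symmetric scale around zero, which suits a two-color heatmap. Explicit None cells keep metabolites that were not measured distinct from metabolites that did not change.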

