MetaboAnalyst
Category Metabolomics/Metabonomics>Metabolic Profiling/Analysis Systems/Tools
Abstract MetaboAnalyst is a web server/tool for metabolomic data analysis and interpretation. This web-based metabolomic data processing tool is similar to many of today’s web-based microarray analysis packages.
The purpose of MetaboAnalyst is to provide a user-friendly and easily accessible tool for analyzing data arising from high-throughput metabolomics experiments.
It is designed to address two (2) common types of problems:
1) To identify features that are significantly different between two conditions (biomarker discovery); and
2) To use the metabolomic data to predict the conditions under study (classification).
It accepts a variety of input data (NMR peak lists, binned spectra, MS peak lists, compound/concentration data) in a wide variety of formats. It also offers a number of options for metabolomic data processing, data normalization, multivariate statistical analysis, graphing, metabolite identification and pathway mapping.
In particular, MetaboAnalyst supports such techniques as: fold change analysis, t-tests, PCA, PLS-DA, hierarchical clustering and a number of more sophisticated statistical or machine learning methods.
It also employs a large library of reference spectra to facilitate compound identification from most kinds of input spectra.
MetaboAnalyst guides users through a step-by-step analysis pipeline (workflow) using a variety of menus, information hyperlinks and check boxes. Upon completion, the server generates a detailed report describing each method used, embedded with graphical and tabular outputs.
MetaboAnalyst is capable of handling most kinds of metabolomic data and was designed to perform most of the common kinds of metabolomic data analyses.
MetaboAnalyst on-line analysis pipeline --
MetaboAnalyst is an on-line analysis pipeline similar in concept to several existing on-line microarray analysis tools such as GEPAS and CARMAweb.
It is primarily designed to allow users to conduct two-group discriminant analysis (i.e. control vs. non-control -- the most common type of metabolomic analysis) for classification and ‘significant feature’ identification. MetaboAnalyst also supports both paired and unpaired data analyses.
A typical MetaboAnalyst run consists of six (6) steps:
Step 1) data upload;
Step 2) processing;
Step 3) normalization;
Step 4) statistical analysis;
Step 5) annotation; and
Step 6) summary report download.
Users are guided through these steps by MetaboAnalyst’s intuitive interface and the navigation bar on the left panel of each page.
Detailed descriptions, help files and helpful hints are either shown on the corresponding web pages or provided as mouse-over pop-up balloons. This support is further enhanced by several step-by-step tutorials, sample data sets (NMR, GC/LC-MS, binned data, etc.), sample summary files and frequently asked questions (FAQs), all available on MetaboAnalyst's web-site.
Step 1: data upload -- Users can begin a MetaboAnalyst analysis by pressing the ‘Click Here to Start’ link on MetaboAnalyst’s Home page. This takes users to the data upload page.
Because there is no widely-accepted standard format for reporting metabolomics experiments, MetaboAnalyst has been designed to accept diverse data types including compound concentration tables (from quantitative metabolomic studies), binned spectral data, NMR or MS peak lists, as well as raw GC-MS and raw LC-MS spectra.
Detailed instructions on how to specify paired information (for paired data analysis) as well as examples for each data type are available through MetaboAnalyst’s ‘Data Formats’ link on the manufacturer’s home page.
Step 2: data processing and data integrity checking -- Depending on the type of uploaded data, different processing strategies can be employed to convert the raw numbers into a data matrix suitable for downstream analysis. For compound concentration lists, the data can be used immediately after MetaboAnalyst’s data integrity check. For binned spectral data, a linear filter is first applied in order to remove baseline noise.
Typical quantitative metabolomics datasets often contain large numbers of missing values (10%-40% in the manufacturer’s experience). To allow selected analyses to proceed (without divide-by-zero problems), these missing values are replaced by default with half of the minimum value found in the dataset.
The manufacturer has also implemented a variety of methods that enable users to manually or automatically perform missing value exclusion, missing value replacement, and missing value imputation by Probabilistic PCA (PPCA), Bayesian PCA (BPCA) and Singular Value Decomposition Imputation (SVDImpute).
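The default half-minimum replacement described above can be illustrated with a minimal Python sketch. This is not MetaboAnalyst's own code: the function name, the use of None to mark a missing value, and the sample table are all assumptions made for illustration, and the sketch assumes all observed values are positive.

```python
from typing import List, Optional


def replace_missing_with_half_min(
    data: List[List[Optional[float]]],
) -> List[List[float]]:
    """Replace missing entries (None) with half of the smallest observed value."""
    observed = [v for row in data for v in row if v is not None]
    if not observed:
        raise ValueError("data contains no observed values")
    fill = min(observed) / 2.0
    return [[fill if v is None else v for v in row] for row in data]


# Rows are samples, columns are features (hypothetical concentrations).
table = [
    [4.0, None, 10.0],
    [2.0, 6.0, None],
]
filled = replace_missing_with_half_min(table)
print(filled)  # the smallest observed value is 2.0, so gaps become 1.0
```

Because the fill value is taken from the whole dataset rather than from each feature, a single very small measurement determines the replacement for every gap.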
In addition, as part of the data integrity check, MetaboAnalyst also verifies class labels and pair specification (if applicable) to make sure all the required information is present and consistent before proceeding to the next step.
Step 3: data normalization -- At this stage, the uploaded data is compiled into a table in which each sample is represented by a row and each feature by a column. With the data structured in this format, two (2) types of data normalization protocols - row-wise normalization and column-wise normalization - may be used.
Row-wise normalization aims to normalize each sample (row) so that it is comparable to the others. Four (4) commonly used metabolomic normalization methods have been implemented in MetaboAnalyst: normalization to a constant sum, normalization to a reference sample (probabilistic quotient normalization), normalization to a reference feature (creatinine or an internal standard) and sample-specific normalization (dry weight or tissue volume).
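Two of the row-wise methods named above can be sketched in Python as follows. This is an illustrative sketch only, not MetaboAnalyst's implementation: the function names, the target sum of 1.0 and the choice of the first sample as the reference are assumptions.

```python
from statistics import median
from typing import List


def normalize_to_constant_sum(
    rows: List[List[float]], target: float = 1.0
) -> List[List[float]]:
    """Scale each sample (row) so that its values add up to `target`."""
    out = []
    for row in rows:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot normalize a row that sums to zero")
        out.append([target * v / total for v in row])
    return out


def probabilistic_quotient_normalize(
    rows: List[List[float]], reference: List[float]
) -> List[List[float]]:
    """Divide each sample by the median of its feature-wise quotients
    against a reference sample."""
    out = []
    for row in rows:
        quotients = [v / r for v, r in zip(row, reference) if r != 0]
        if not quotients:
            raise ValueError("reference sample has no non-zero features")
        factor = median(quotients)
        out.append([v / factor for v in row])
    return out


# Hypothetical data: the second sample is a two-fold dilution-free copy
# of the first, i.e. every feature is exactly twice as large.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
by_sum = normalize_to_constant_sum(samples)
by_pqn = probabilistic_quotient_normalize(samples, reference=samples[0])
print(by_sum)
print(by_pqn)
```

In this toy case both methods make the two samples identical, because they differ only by an overall scale factor. The median used by the quotient method makes it less sensitive than the constant sum to a few features that change strongly.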
In contrast to row-wise normalization, column-wise normalization aims to make each feature (column) more comparable in magnitude to the others. Four widely-used methods are offered in MetaboAnalyst - log transformation, auto-scaling, Pareto scaling, and range scaling.
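The four column-wise methods can be sketched using their usual textbook definitions: auto-scaling divides the mean-centered values by the standard deviation, Pareto scaling by the square root of the standard deviation, and range scaling by the difference between the maximum and the minimum. The Python below is a sketch under those assumptions. The function names, the base-2 logarithm and the use of the sample (n - 1) standard deviation are illustrative choices, not details confirmed for MetaboAnalyst.

```python
import math
from statistics import mean, stdev
from typing import List


def log_transform(rows: List[List[float]]) -> List[List[float]]:
    """Take the base-2 logarithm of every value (values must be positive)."""
    return [[math.log2(v) for v in row] for row in rows]


def scale_columns(rows: List[List[float]], method: str) -> List[List[float]]:
    """Mean-center each feature (column) and divide by a method-specific factor."""
    scaled_columns = []
    for col in zip(*rows):
        mu = mean(col)
        sd = stdev(col)  # sample standard deviation (n - 1 denominator)
        if method == "auto":
            divisor = sd
        elif method == "pareto":
            divisor = math.sqrt(sd)
        elif method == "range":
            divisor = max(col) - min(col)
        else:
            raise ValueError(f"unknown scaling method: {method}")
        if divisor == 0:
            raise ValueError("cannot scale a constant column")
        scaled_columns.append([(v - mu) / divisor for v in col])
    # Transpose back so that rows are samples again.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical table: the second feature is ten times larger than the first.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
auto = scale_columns(rows, "auto")
pareto = scale_columns(rows, "pareto")
ranged = scale_columns(rows, "range")
print(auto)
print(pareto)
print(ranged)
```

Auto-scaling and range scaling put both features on the same footing, whereas Pareto scaling only partly reduces the difference in magnitude, so the larger feature keeps more weight.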
Step 4: data analysis -- MetaboAnalyst’s data analysis module is a collection of well-established statistical and machine learning algorithms that have been shown to be particularly robust for high-dimensional data analysis. These algorithms are organized into five (5) analysis ‘paths’ for users to explore.
- a) Univariate analysis path - Because of their simplicity and interpretability, univariate analyses are often first used to obtain an overview or rough ranking of potentially important features before applying more sophisticated analyses.
- Univariate analysis examines each variable separately and does not consider the effect of multiple comparisons. MetaboAnalyst’s univariate analysis path supports three (3) commonly used methods - fold-change analysis, t-tests and volcano plots.
- b) Chemometric analysis path - This analysis path offers the two most commonly used chemometric methods - principal component analysis (PCA) and partial-least squares discriminant analysis (PLS-DA).
- PCA is an unsupervised method aiming to find the directions of maximum variance in a data set (X) without referring to the class labels (Y). PLS-DA is a supervised method that uses a multiple linear regression technique to find the direction of maximum covariance between a data set (X) and the class membership (Y).
- c) Feature selection path - This analysis path provides two (2) well-established methods widely used for identification of differentially expressed genes in microarray experiments: Significance Analysis of Microarrays (and Metabolites) (SAM) and Empirical Bayesian Analysis of Microarrays (and Metabolites) (EBAM).
- However, these methods are general-purpose techniques for identifying significant features in high-dimensional data and are not restricted to the analysis of microarray data.
- d) Cluster analysis path - MetaboAnalyst’s cluster analysis allows a closer interrogation of samples with similar abundance profiles. This path includes two (2) major approaches to clustering analysis - hierarchical clustering and partitional clustering.
- Hierarchical (agglomerative) clustering begins with each sample considered as a separate cluster and then proceeds to combine them until all samples belong to one cluster. A variety of dissimilarity measures (Euclidean distance, Pearson’s correlation, and Spearman’s rank correlation) and clustering methods (average linkage, complete linkage, single linkage and Ward’s linkage) have been implemented in MetaboAnalyst.
- The result of hierarchical clustering is usually presented as a dendrogram or heatmap, both of which are available in MetaboAnalyst.
- Partitional clustering attempts to directly decompose the data set into a user-specified number of disjoint clusters. Two (2) widely used methods, k-means clustering and self-organizing map (SOM), have been implemented in MetaboAnalyst. The clusters from both k-means and SOM are presented as aggregated expression profiles in which samples in each cluster are plotted as line graphs on top of each other using their feature values.
- e) Supervised classification path - Class prediction using metabolomics data is increasingly important in studies aiming for early diagnosis, prognosis or treatment outcomes. MetaboAnalyst offers three (3) advanced supervised classification methods - PLS-DA, random forest, and support vector machine (SVM).
PLS-DA based feature selection and classification was previously discussed in the chemometric analysis path (see above).
Random forest uses an ensemble of classification trees, each of which is grown by random feature selection from a bootstrap sample at each branch. Class prediction is based on the majority vote of the ensemble.
The SVM classification algorithm aims to find a nonlinear decision function in the input space by mapping the data into a higher dimensional feature space and separating it by means of a maximum margin hyper-plane. MetaboAnalyst's SVM analysis is done through recursive feature selection and sample classification using a linear kernel.
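Returning to the univariate path (a) above, the two quantities that a volcano plot combines for each feature can be sketched in Python. This is an illustrative sketch, not MetaboAnalyst's code: the function names and data are hypothetical, and the unequal-variance (Welch) form of the unpaired t statistic is an assumption, since the text does not say which variant is used. The p-value is left out because it requires a t distribution, which Python's standard library does not provide.

```python
import math
from statistics import mean, variance
from typing import List


def log2_fold_change(group_a: List[float], group_b: List[float]) -> float:
    """Log2 ratio of the group means (group_b relative to group_a)."""
    return math.log2(mean(group_b) / mean(group_a))


def welch_t_statistic(group_a: List[float], group_b: List[float]) -> float:
    """Unpaired t statistic that does not assume equal group variances."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_b) - mean(group_a)) / standard_error


# Hypothetical concentrations of one metabolite in two conditions.
control = [1.0, 2.0, 3.0]
treated = [4.0, 8.0, 12.0]
lfc = log2_fold_change(control, treated)
t_stat = welch_t_statistic(control, treated)
print(f"log2 fold change = {lfc:.3f}, t = {t_stat:.3f}")
```

Repeating this for every feature gives one (fold change, t statistic) pair per feature. A volcano plot then places the fold change on one axis and the significance derived from the t-test on the other.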
Step 5: Data annotation (peak search and pathway mapping) -- A key step in placing statistically significant findings from chemometric analyses (as opposed to quantitative metabolomic analyses) into a ‘biological context’ is to identify significantly altered compounds represented by certain spectral bins or certain clusters of spectral peaks.
Once a user has identified lists of MS or NMR peaks that exhibit statistically significant changes, they may use one of several spectral comparison routines and spectral libraries to attempt to identify the compound(s) based on either lists of MS peaks, GC-MS peaks or NMR peaks.
These compound identification routines and spectral reference libraries were originally developed for the Human Metabolome DataBase (HMDB) and for MetaboMiner. While not as comprehensive as some commercial libraries or commercial software, these freely available tools have been shown to be quite effective at identifying many common compounds.
Once compound information becomes available (via quantitative routes or via MetaboAnalyst's metabolite ID software), more insight can be obtained by determining which metabolic pathways are involved. Pathway mapping has been implemented in MetaboAnalyst using more than 70 pathway diagrams and metabolite libraries derived from the HMDB.
Step 6: summary report download -- When users finish their analyses and click the download link, a comprehensive report will be generated containing a detailed description of each step performed embedded with graphical and tabular outputs. In addition, the processed numeric data, high-resolution images (PNG format), R scripts, as well as the R command history are also available for downloading.
Users familiar with R can easily reproduce the results on their local machine after installation of R and the required packages. Users have the option of providing an email address (to which the summary report is sent) or simply downloading the compressed file that contains all the data (graphs, tables, etc.) produced during the analysis.
System Requirements
Web-based.
Manufacturer
- Department of Biological Sciences
- Department of Computing Science
- University of Alberta
- And
- National Research Council
- National Institute for Nanotechnology (NINT)
- Edmonton, AB, Canada T6G 2E8
Manufacturer Web Site MetaboAnalyst
Price Contact manufacturer.
G6G Abstract Number 20654
G6G Manufacturer Number 104301