MZmine 2

Category Proteomics>Mass Spectrometry Analysis/Tools and Metabolomics/Metabonomics>Metabolic Profiling/Analysis Systems/Tools

Abstract MZmine 2 is a modular framework for processing, visualizing, and analyzing mass spectrometry (MS)-based molecular profile data. The current version of MZmine 2 is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.

A key concept of the MZmine 2 software design is the strict separation of core functionality and data processing modules, with emphasis on easy usability and support for high-resolution spectra processing. Data processing modules take advantage of embedded visualization tools, allowing for immediate previews of parameter settings.

MZmine 2 Data processing workflow -- A typical workflow for processing mass spectrometry data using MZmine 2 consists of the following steps (note that some of these steps are optional and may be skipped):

1) Raw data import;

2) Detection of chromatograms using the Chromatogram builder;

3) Deconvolution of chromatograms into individual peaks;

4) Deisotoping;

5) Identification of fragments, adducts, and peak complexes;

6) Normalization of retention time using the Retention time normalizer;

7) Alignment using the Join aligner or RANSAC aligner;

8) Gap filling using the Peak finder or Same range gap filler;

9) Normalization using the Linear normalizer or Standard compound normalizer;

10) Identification, using a custom database or online databases; and

11) Data analysis, export, and visualization.

MZmine 2 Supported file formats -- MZmine 2 can read and process both unit mass resolution and exact mass resolution (e.g. FTMS) data in both continuous and centroided modes, including fragmentation scans. Supported data formats are:

1) mzML (version 1.0 and 1.1).

2) mzXML (versions 2.0, 2.1 and 3.0) - mzXML provides a standard container for MS and MS/MS proteomics data and is the foundation of its developers' proteomic pipelines.

3) mzData (versions 1.04 and 1.05) - mzData is a data format capturing peak list information. Its aim is to unite the large number of current formats (PKL, DTA, MGF, etc.) into one.

4) NetCDF - NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

5) Thermo RAW (only on Windows with Thermo Xcalibur installed).

MZmine 2 Dataset filters -- This module comprises various filters that can be applied to the entire raw data file.

a) Crop filter - A new raw data file is created containing only copies of the scans inside the retention time range defined by the user.

Scan filters - This module comprises various filters that can be applied to the raw data scan by scan.

a) Mean filter - For each data point, the filter assigns the average intensity of all the data points inside the user-defined window, which is centered on the M/Z value of that data point.
b) Savitzky-Golay filter - The Savitzky-Golay method essentially performs a local polynomial regression (of degree k) on a series of values (of at least k+1 points, which are treated as being equally spaced in the series) to determine the smoothed value for each point.
c) Crop filter - This filter deletes the scans outside the retention time range defined by the user. It also deletes the data points of the remaining scans that fall outside the M/Z window defined by the user.
d) M/Z resample filter - Each scan is divided into M/Z bins whose width is defined by the user in the parameters.
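
The Mean filter described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MZmine 2 code: the function name mean_filter, its parameters, and the choice to treat the window as a total width centered on each point are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def mean_filter(mz, intensity, window):
    """Replace each intensity with the mean intensity of all data points whose
    m/z lies within window/2 on either side of that point's m/z.

    mz must be sorted in ascending order. 'window' is the full window width
    (an assumption of this sketch).
    """
    half = window / 2.0
    smoothed = []
    for m in mz:
        lo = bisect_left(mz, m - half)    # first index inside the window
        hi = bisect_right(mz, m + half)   # one past the last index inside
        smoothed.append(sum(intensity[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical scan: a single narrow signal around m/z 100.2
scan_mz = [100.0, 100.1, 100.2, 100.3, 100.4]
scan_intensity = [0.0, 10.0, 40.0, 10.0, 0.0]
print(mean_filter(scan_mz, scan_intensity, window=0.25))
```

The window always contains the point itself, so the division is safe even for isolated data points.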

Peak detection -- Chromatogram builder - The Chromatogram builder is the main module for peak detection in MZmine 2. Its purpose is to create a list of unique masses which form continuous chromatograms in raw data. Later, these chromatograms may be deconvoluted into individual peaks by the Deconvolution module.

The Chromatogram builder module works in three steps: Mass detection, Filtering, and Chromatogram construction.

1) Mass detection - In the Mass detection step, individual ions are detected in each mass spectrum.

a) Centroid mass detector - This mass detector is suitable for already centroided data. It simply assumes that each signal above the given noise level is a detected ion.
b) Exact mass detector - This mass detector is suitable for high resolution MS data, such as provided by Fourier Transform Mass Spectrometer (FTMS) instruments.
c) Local maxima mass detector - This mass detector represents a very simple method, which detects all local maxima within the spectrum, except those signals below the given noise level.
d) Recursive threshold mass detector - This mass detector is suitable for continuous data that has too much noise for the Exact mass detector to be used, but that shows a consistent width of M/Z peaks.
e) Wavelet transform mass detector - The Wavelet transform mass detector is particularly suitable for low resolution and noisy data.
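
The two simplest detectors in the list above (Centroid and Local maxima) can be illustrated as follows. This is a minimal sketch based only on the descriptions given here; the function names and the representation of a scan as a list of (M/Z, intensity) pairs are assumptions, and the real modules may handle ties and edge points differently.

```python
def centroid_mass_detector(scan, noise_level):
    """Treat every centroided data point above the noise level as an ion."""
    return [(mz, inten) for mz, inten in scan if inten > noise_level]


def local_maxima_mass_detector(scan, noise_level):
    """Report every local intensity maximum above the noise level.

    The first and last points of the scan are never reported because they
    have only one neighbor (a simplification of this sketch).
    """
    ions = []
    for k in range(1, len(scan) - 1):
        mz, inten = scan[k]
        if inten > noise_level and inten > scan[k - 1][1] and inten >= scan[k + 1][1]:
            ions.append((mz, inten))
    return ions


# Hypothetical continuous-mode scan with one real signal and one small bump
profile_scan = [(100.0, 1), (100.1, 5), (100.2, 20), (100.3, 6),
                (100.4, 2), (100.5, 3), (100.6, 1)]
print(local_maxima_mass_detector(profile_scan, noise_level=4))
```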

2) Filtering - Filtering is an optional operation which may reduce the number of false chromatograms by removing those M/Z signals which can be recognized as noise. The filtering is done on the detected M/Z peak lists; raw data is not considered in this step.

3) Chromatogram construction - When mass detection and filtering are finished, M/Z data points from each scan must be connected together to form chromatograms.
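
To make the chromatogram construction step concrete, the sketch below connects detected ions from consecutive scans when their M/Z values agree within a tolerance. It is a simplified illustration, not the Chromatogram builder's actual algorithm: the names build_chromatograms, mz_tolerance, and min_points are hypothetical, and a chromatogram is closed as soon as one scan fails to extend it.

```python
def build_chromatograms(scans, mz_tolerance, min_points):
    """Connect ions across consecutive scans into chromatograms.

    scans: list of (retention_time, [(mz, intensity), ...]) in RT order.
    Returns a list of chromatograms, each a list of (rt, mz, intensity).
    """
    open_chroms = []   # chromatograms that were extended by the previous scan
    finished = []
    for rt, ions in scans:
        extended = []
        used = set()   # indices of open chromatograms already extended
        # Let the most intense ions choose their chromatogram first
        for mz, inten in sorted(ions, key=lambda p: -p[1]):
            best = None
            for idx, chrom in enumerate(open_chroms):
                if idx in used:
                    continue
                diff = abs(chrom[-1][1] - mz)
                if diff <= mz_tolerance and (best is None or diff < best[0]):
                    best = (diff, idx)
            if best is None:
                extended.append([(rt, mz, inten)])          # start a new one
            else:
                used.add(best[1])
                extended.append(open_chroms[best[1]] + [(rt, mz, inten)])
        # Chromatograms not extended by this scan are closed
        for idx, chrom in enumerate(open_chroms):
            if idx not in used and len(chrom) >= min_points:
                finished.append(chrom)
        open_chroms = extended
    finished.extend(c for c in open_chroms if len(c) >= min_points)
    return finished


example_scans = [
    (1.0, [(200.00, 10), (300.00, 5)]),
    (2.0, [(200.01, 20)]),
    (3.0, [(200.00, 15), (400.00, 3)]),
]
print(build_chromatograms(example_scans, mz_tolerance=0.05, min_points=3))
```

The min_points threshold discards short traces, which is one simple way to suppress chromatograms that consist of isolated noise signals.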

Chromatogram deconvolution -- Following the detection of chromatograms by the Chromatogram builder, chromatograms have to be deconvoluted into individual peaks. The Deconvolution module provides several algorithms for this purpose.

Isotopic peaks grouper -- This module attempts to find those peaks in a peak list that form an isotope pattern. When an isotope pattern is found, the information about the charge and isotope ratios is saved, and additional isotopic peaks are removed from the peak list. Only the highest isotope is kept.

Note that Deisotoping is performed after the Chromatogram builder and Deconvolution. Therefore, MZmine does not search for isotopic peaks in individual scans, but instead tries to identify those peak list entries that together form an isotope pattern.
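
The following sketch shows one way such peak-list-level deisotoping could work. It is an assumption-laden illustration rather than the module's implementation: the isotope spacing of about 1.00335 Da (the 13C/12C mass difference) divided by the charge, the dictionary keys, the function name group_isotopes, and the interpretation of "highest isotope" as the most intense peak of the pattern are all choices made for this example.

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def group_isotopes(peaks, mz_tol, rt_tol, max_charge=2):
    """Group peak list entries into isotope patterns and keep one per pattern.

    peaks: list of dicts with keys 'mz', 'rt', 'height'.
    Returns the deisotoped list; each kept entry also records the charge
    (0 if no pattern was found) and the number of isotopes in its pattern.
    """
    remaining = sorted(peaks, key=lambda p: p["mz"])
    result = []
    while remaining:
        mono = remaining.pop(0)               # lowest remaining m/z
        best_pattern, best_charge = [mono], 0
        for z in range(1, max_charge + 1):
            pattern = [mono]
            k = 1
            while True:
                target = mono["mz"] + k * ISOTOPE_SPACING / z
                match = next(
                    (p for p in remaining
                     if abs(p["mz"] - target) <= mz_tol
                     and abs(p["rt"] - mono["rt"]) <= rt_tol),
                    None,
                )
                if match is None:
                    break
                pattern.append(match)
                k += 1
            if len(pattern) > len(best_pattern):
                best_pattern, best_charge = pattern, z
        for p in best_pattern[1:]:
            remaining.remove(p)               # drop the extra isotopic peaks
        # Assumption: keep the most intense peak as the representative
        representative = max(best_pattern, key=lambda p: p["height"])
        result.append({**representative, "charge": best_charge,
                       "isotopes": len(best_pattern)})
    return result


peak_list = [
    {"mz": 300.0000, "rt": 5.00, "height": 1000},
    {"mz": 301.0034, "rt": 5.00, "height": 150},
    {"mz": 302.0067, "rt": 5.01, "height": 20},
    {"mz": 450.0000, "rt": 7.00, "height": 500},
]
for entry in group_isotopes(peak_list, mz_tol=0.005, rt_tol=0.1):
    print(entry)
```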

Identification of fragments, adducts, and peak complexes --

a) Fragment search - This method identifies fragment peaks using MS/MS scan data. Fragment ions are usually produced from another, bigger ion in the ionization source. Such additional signals represent undesired noise.
b) Adduct search - An adduct ion is an ion formed by the interaction of two species, usually an ion and a molecule, often within the ion source, to form an ion containing all the constituent atoms of one species as well as an additional atom or atoms. This method identifies common adducts (selected by the user) in a single peak list.
c) Peak complex search - This method attempts to identify ion complexes that are pairs of ions which appear together at the same retention time, and form an ion complex which contains both smaller ions as components.
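
An adduct search of the kind described in item b) can be sketched as a search for pairs of peaks that co-elute and differ in M/Z by a known adduct mass shift. The code below is illustrative only: the function name find_adducts, the peak representation, and the table of shifts (monoisotopic mass differences relative to the protonated ion, computed for this example) are assumptions, not values taken from MZmine 2.

```python
# Assumed mass shifts (Da) relative to the [M+H]+ peak; illustrative values
ADDUCT_SHIFTS = {
    "[M+Na-H]": 21.981944,
    "[M+K-H]": 37.955882,
    "[M+NH3]": 17.026549,
}


def find_adducts(peaks, adducts, mz_tol, rt_tol):
    """Return (parent_index, adduct_index, adduct_name) for every pair of
    peaks with matching retention time and an m/z difference equal to one
    of the given adduct shifts, within the tolerances.
    """
    hits = []
    for i, parent in enumerate(peaks):
        for j, candidate in enumerate(peaks):
            if i == j or abs(parent["rt"] - candidate["rt"]) > rt_tol:
                continue
            for name, shift in adducts.items():
                if abs(candidate["mz"] - parent["mz"] - shift) <= mz_tol:
                    hits.append((i, j, name))
    return hits


# Hypothetical peak list: protonated and sodiated glucose plus an unrelated peak
peak_list = [
    {"mz": 181.0707, "rt": 3.20},
    {"mz": 203.0526, "rt": 3.21},
    {"mz": 250.0000, "rt": 3.20},
]
print(find_adducts(peak_list, ADDUCT_SHIFTS, mz_tol=0.002, rt_tol=0.1))
```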

Normalization of retention time using the Retention time normalizer -- The retention time normalizer attempts to reduce the deviation of retention times between peak lists by searching for common peaks in these peak lists and using them as normalization standards.

Alignment using the Join aligner or RANSAC aligner --

a) Join aligner - This method aligns detected peaks in different samples through a match score. This score is calculated based on the mass and retention time of each peak and ranges of tolerance stipulated in the parameters setup dialog.
b) RANSAC aligner - This method is an extension of the Join aligner method. Each sample is aligned against a master peak list, which is taken from the first sample in the first round and from the average of all aligned peak lists in every subsequent round. The method corrects any linear or non-linear deviation in the retention time of the chromatograms by creating a model of this deviation.
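
The match score used by the Join aligner can be illustrated with a small sketch. The scoring formula here (a weighted sum of how close the M/Z and retention time differences are to zero, relative to their tolerances) and the greedy best-score-first pairing are assumptions chosen to fit the description above; the names match_score, join_align, mz_weight, and rt_weight are hypothetical.

```python
def match_score(row, peak, mz_tol, rt_tol, mz_weight=10.0, rt_weight=10.0):
    """Score a candidate match; None if either tolerance is exceeded."""
    mz_diff = abs(row["mz"] - peak["mz"])
    rt_diff = abs(row["rt"] - peak["rt"])
    if mz_diff > mz_tol or rt_diff > rt_tol:
        return None
    return ((1.0 - mz_diff / mz_tol) * mz_weight
            + (1.0 - rt_diff / rt_tol) * rt_weight)


def join_align(master, sample, mz_tol, rt_tol):
    """Pair master rows with sample peaks, best scores first.

    Returns (pairs, unmatched): pairs maps master index -> sample index,
    unmatched lists sample indices that would become new master rows.
    """
    candidates = []
    for i, row in enumerate(master):
        for j, peak in enumerate(sample):
            score = match_score(row, peak, mz_tol, rt_tol)
            if score is not None:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)
    pairs, used = {}, set()
    for score, i, j in candidates:
        if i in pairs or j in used:
            continue
        pairs[i] = j
        used.add(j)
    unmatched = [j for j in range(len(sample)) if j not in used]
    return pairs, unmatched


master_list = [{"mz": 200.000, "rt": 5.0}, {"mz": 300.000, "rt": 8.0}]
sample_list = [{"mz": 300.002, "rt": 8.1}, {"mz": 200.001, "rt": 5.05},
               {"mz": 500.000, "rt": 2.0}]
print(join_align(master_list, sample_list, mz_tol=0.01, rt_tol=0.5))
```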

Gap filling using the Peak finder or Same range gap filler --

a) Peak finder - Following alignment, the resulting peak list may contain missing peaks as a product of deficient peak detection or a mistake in the alignment of different peak lists. The fact that a peak is missing after the alignment does not imply that the peak does not exist. In most cases it is present but was not detected by the previous algorithms. This algorithm fills the gaps in the peak list when possible, according to the parameters defined by the user.
b) Same M/Z and RT range gap filler - This method fills in gaps in each peak list row by using the same M/Z and retention time (RT) range as other peaks in the row.
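
The Same M/Z and RT range approach in item b) can be sketched as follows: take the union of the M/Z and retention time ranges of the peaks already present in the row, then look in the raw data of the file with the gap for the most intense data point inside that range. The function name, the dictionary keys, and the decision to report only the apex data point are assumptions made for this illustration.

```python
def fill_gap_same_range(row_peaks, raw_scans):
    """Fill a gap using the M/Z and RT range of the other peaks in the row.

    row_peaks: detected peaks of the row, each a dict with keys
               'mz_min', 'mz_max', 'rt_min', 'rt_max'.
    raw_scans: raw data of the file that lacks a peak, as a list of
               (rt, [(mz, intensity), ...]).
    Returns a dict describing the apex data point, or None if the range
    contains no data points.
    """
    mz_lo = min(p["mz_min"] for p in row_peaks)
    mz_hi = max(p["mz_max"] for p in row_peaks)
    rt_lo = min(p["rt_min"] for p in row_peaks)
    rt_hi = max(p["rt_max"] for p in row_peaks)

    best = None
    for rt, points in raw_scans:
        if not rt_lo <= rt <= rt_hi:
            continue
        for mz, inten in points:
            if mz_lo <= mz <= mz_hi and (best is None or inten > best["height"]):
                best = {"rt": rt, "mz": mz, "height": inten}
    return best


row = [
    {"mz_min": 200.00, "mz_max": 200.02, "rt_min": 4.9, "rt_max": 5.2},
    {"mz_min": 199.99, "mz_max": 200.01, "rt_min": 5.0, "rt_max": 5.3},
]
raw = [
    (4.8, [(200.000, 999.0)]),
    (5.0, [(200.005, 40.0), (250.0, 500.0)]),
    (5.1, [(200.010, 70.0)]),
    (5.4, [(200.000, 888.0)]),
]
print(fill_gap_same_range(row, raw))
```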

Normalization using the Linear normalizer or Standard compound normalizer --

a) Linear normalizer - Linear normalizer divides the height (or area) of each peak in the peak list by a normalization factor, determined according to the Normalization type parameter.
b) Standard compound normalizer - The purpose of this module is to reduce the deviation between samples caused by different detection efficiency. Internal standard peaks must be present in the detected samples, and the peak list must be aligned prior to normalization. The user can select one or multiple internal standard peaks, which must be present in all raw data files. The height (or area) of each peak is then normalized by either the nearest standard or a weighted contribution of all standards.
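
As a small worked example of the Linear normalizer in item a), the sketch below divides every peak height of one sample by a single factor. The two normalization types shown ("average" and "maximum") and the function name linear_normalize are assumptions for illustration; the module's Normalization type parameter may offer other or differently named options.

```python
def linear_normalize(heights, normalization_type="average"):
    """Divide each peak height (or area) by one normalization factor.

    normalization_type (hypothetical names):
      'average' - factor is the mean height of the sample's peaks
      'maximum' - factor is the height of the sample's largest peak
    """
    if normalization_type == "average":
        factor = sum(heights) / len(heights)
    elif normalization_type == "maximum":
        factor = max(heights)
    else:
        raise ValueError("unknown normalization type: " + normalization_type)
    return [h / factor for h in heights]


print(linear_normalize([100.0, 200.0, 300.0], "average"))
print(linear_normalize([100.0, 200.0, 400.0], "maximum"))
```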

Identification using a custom database or online databases --

a) Custom database search - This method assigns identity to peaks according to their M/Z and retention time values. The user provides a database of M/Z values and retention times in CSV format.
b) Online database search - This module allows identification of peaks or whole peak lists using an on-line compound database. Databases are queried for the calculated neutral mass of the peak and matching compounds are returned.
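
Both search modes can be illustrated with a short sketch. The first function matches peaks against a user-supplied CSV database by M/Z and retention time; the second computes a neutral mass from an M/Z value of the kind an online database would be queried with. The CSV column names (name, mz, rt), the function names, and the assumption of simple protonation ([M+zH] ions, proton mass about 1.007276 Da) are all hypothetical choices for this example.

```python
import csv
import io

PROTON_MASS = 1.007276  # Da


def load_database(csv_text):
    """Parse a CSV database with the assumed columns name, mz, rt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"name": r["name"], "mz": float(r["mz"]), "rt": float(r["rt"])}
            for r in reader]


def identify(peaks, database, mz_tol, rt_tol):
    """Return, for each peak, the name of the closest database entry within
    both tolerances, or None if no entry matches."""
    names = []
    for peak in peaks:
        matches = [e for e in database
                   if abs(e["mz"] - peak["mz"]) <= mz_tol
                   and abs(e["rt"] - peak["rt"]) <= rt_tol]
        if matches:
            closest = min(matches, key=lambda e: abs(e["mz"] - peak["mz"]))
            names.append(closest["name"])
        else:
            names.append(None)
    return names


def neutral_mass(mz, charge=1):
    """Neutral mass of an ion, assuming it carries 'charge' extra protons."""
    return (mz - PROTON_MASS) * charge


database = load_database("name,mz,rt\nGlucose,181.0707,3.2\nCaffeine,195.0877,6.5\n")
peak_list = [{"mz": 195.0880, "rt": 6.45}, {"mz": 400.0, "rt": 1.0}]
print(identify(peak_list, database, mz_tol=0.001, rt_tol=0.2))
print(neutral_mass(181.0707))
```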

MZmine 2 Supported databases --

PubChem - PubChem database contains millions of chemical compound structures.

KEGG - KEGG database contains metabolites and other biomolecules present in natural metabolic pathways.

HMDB - The Human Metabolome Database (HMDB) contains over 7,000 known metabolites found in the human body.

METLIN - The METLIN database contains over 20,000 metabolites.

MZmine 2 Data analysis, export, and visualization --

Data analysis -

a) Principal component analysis (PCA) - involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components.
b) Sammon’s projection - The main use for the projection is visualization. Sammon’s projection is useful for preliminary analysis in all statistical pattern recognition, because a rough visualization of the class distribution can be obtained, especially the overlap of the classes.

Peak list export and import -

a) CSV export - This module exports the contents of the peak list into a Comma-Separated Value (CSV) format file, which can later be processed by MS Excel or other tools.
b) XML export and import - These modules allow exporting the contents of the peak list into an XML file and importing them again. Only the contents of the peak list are saved; raw data are not included in the export.

Visualization -

a) TIC/XIC visualizer - This tool displays a plot of two dimensions, where X axis corresponds to retention time and Y axis is the intensity level of the signal. This visualization of the raw data corresponds to the chromatographic appearance of the data.
b) Spectra plot - This tool displays all the ions from a selected scan. It shows a plot of two dimensions, where X axis corresponds to M/Z value and Y axis is the intensity of the ion signal.
c) 2D visualizer - This tool displays a plot of two dimensions, where X axis corresponds to retention time and Y axis is the M/Z value. The spots in the plot correspond to the intensity of the data in that region.
d) 3D visualizer - This tool presents a three dimensional plot where X axis represents the retention time, Y axis the M/Z value and Z axis the intensity of the signal. This plot is the collection of all the information from the raw data in a graphical representation.
e) Peak list table - This shows a list of identified peaks, after applying a series of methods to a raw data file. This feature is one of the most important of the MZmine project, because it collects the results of many other modules.
f) Scatter plot - This tool shows a scatter plot with data from identified peaks in two or more data files.
g) Histogram plot - This plot displays a graphic representation of frequencies. Each rectangle represents an interval of frequency.
h) Intensity plot - This plot uses the third-party library JFreeChart for its basic functionality.
i) Neutral loss visualizer - This plot shows the neutral loss calculated from all MS/MS to MS fragmentations in the selected raw data file.
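
The data behind the TIC/XIC visualizer in item a) is straightforward to compute, as the sketch below shows: a total ion chromatogram (TIC) sums all intensities in each scan, while an extracted ion chromatogram (XIC) sums only those inside a chosen M/Z range. The function names and the scan representation are assumptions for this example; plotting is left out.

```python
def total_ion_chromatogram(scans):
    """Return (rt, summed intensity of all data points) for each scan."""
    return [(rt, sum(inten for _, inten in points)) for rt, points in scans]


def extracted_ion_chromatogram(scans, mz_min, mz_max):
    """Return (rt, summed intensity of the data points whose m/z lies in
    [mz_min, mz_max]) for each scan."""
    return [
        (rt, sum(inten for mz, inten in points if mz_min <= mz <= mz_max))
        for rt, points in scans
    ]


# Hypothetical raw data: two scans with two data points each
raw_scans = [
    (1.0, [(100.0, 10.0), (200.0, 5.0)]),
    (2.0, [(100.0, 20.0), (200.0, 1.0)]),
]
print(total_ion_chromatogram(raw_scans))
print(extracted_ion_chromatogram(raw_scans, 99.5, 100.5))
```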

System Requirements

Contact manufacturer.

Manufacturer

G0 Cell Unit
Okinawa Institute of Science and Technology (OIST)
Onna, Okinawa, Japan
And
Quantitative Biology and Bioinformatics
VTT Technical Research Centre of Finland
Espoo, Finland

Manufacturer Web Site MZmine 2

Price Contact manufacturer.

G6G Abstract Number 20661

G6G Manufacturer Number 104307

The G6G Directory of Omics and Intelligent Software
