GenePattern Proteomics Module

Category Proteomics>Mass Spectrometry Analysis/Tools

Abstract GenePattern combines an advanced scientific workflow platform with more than 90 computational and visualization tools for the analysis of genomic/proteomic data.

GenePattern Proteomics offers peak detection, noise subtraction, peak matching, and more for the advanced analysis of Matrix-Assisted Laser Desorption/Ionization (MALDI), Surface-Enhanced Laser Desorption/Ionization (SELDI), and Liquid Chromatography-Mass Spectrometry (LC-MS) data.

GenePattern provides the following support for the analysis of proteomic data:

1) PEPPeR (LC-MS) -- For the analysis of LC-MS data, GenePattern provides support for the algorithms defined by PEPPeR, a Platform for Experimental Proteomic Pattern Recognition:

a) Landmark matching propagates identified peptides over time onto accurate-mass LC-MS features, so as to maximize the total number of identified peptides from disparate data acquisition methods. Using a combination of accurate mass and local retention time information, the likely identity of an unknown peak can be determined from its location relative to known peaks.

b) Peak matching attempts to group similar features (or peaks) across multiple LC-MS sample runs by incorporating m/z (mass-to-charge ratio) and retention time (RT) variation. Although peak matching can be performed on virtually any type of LC-MS data, it is typically performed after landmark matching.

Note: The PEPPeR modules are based on work published by Jaffe, Mani, et al. in "PEPPeR, a Platform for Experimental Proteomic Pattern Recognition" (Molecular & Cellular Proteomics 5:1927-1941, 2006).
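The peak matching idea in item b) above can be illustrated with a minimal sketch. It is not the PEPPeR algorithm: the greedy grouping strategy, the Feature type, the match_peaks function, and the tolerance values (10 ppm in m/z, 0.5 minutes in RT) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    run: str    # identifier of the LC-MS sample run
    mz: float   # mass-to-charge ratio
    rt: float   # retention time, in minutes


def match_peaks(features, mz_ppm=10.0, rt_tol=0.5):
    """Greedily group features from different runs that agree in m/z and RT.

    Hypothetical illustration only: the tolerances and the greedy strategy
    are assumptions, not the method used by the PEPPeR modules.
    """
    groups = []
    for f in sorted(features, key=lambda x: (x.mz, x.rt)):
        for group in groups:
            ref = group[0]  # first member acts as the group's reference
            mz_ok = abs(f.mz - ref.mz) <= ref.mz * mz_ppm * 1e-6
            rt_ok = abs(f.rt - ref.rt) <= rt_tol
            new_run = all(member.run != f.run for member in group)
            if mz_ok and rt_ok and new_run:
                group.append(f)
                break
        else:
            # No existing group accepted this feature, so start a new one.
            groups.append([f])
    return groups


features = [
    Feature("run1", 500.2501, 12.30),
    Feature("run2", 500.2503, 12.45),
    Feature("run1", 742.4010, 25.10),
    Feature("run2", 742.4100, 25.15),  # about 12 ppm away, outside tolerance
]
groups = match_peaks(features)
for group in groups:
    print([(f.run, f.mz, f.rt) for f in group])
```

Here the two features near m/z 500.25 fall into one group, while the two features near m/z 742.4 stay separate because their m/z difference exceeds the assumed 10 ppm tolerance.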

2) ProteoArray (LC-MS) -- The GenePattern ProteoArray module provides the following support for the analysis of LC-MS data:

a) For a series of LC-MS experiments in mzXML format (a common XML-based file format for proteomics mass spectrometry data), GenePattern can detect and align features across runs.

Note: This module is provided by Brian Piening of the Fred Hutchinson Cancer Research Center.

3) SELDI/MALDI -- GenePattern provides the following support for the analysis of SELDI/MALDI data:

a) Quality assessment of the input spectrum as a function of the area under the spectrum and the area under the spectrum after removing the noise component of the signal.

b) Peak detection using digital convolution (moving window) filters, which apply smoothing, background correction, and peak enhancement to the spectrum before identifying final peak locations.

c) Spectra comparison, which filters the noise from two spectra and then compares them using a cross-correlation function.

d) A proteomics pipeline provides automated processing of SELDI/MALDI data. In addition to quality assessment and peak detection, the pipeline incorporates a range of normalization methods and sophisticated peak alignment algorithms for matching peaks across multiple samples.

Starting with spectra from a set of samples, the pipeline outputs matched peaks as features, and normalized intensities of these peaks for each sample. Several aspects of the pipeline are fully customizable.

e) Integration with other GenePattern analysis modules. By representing peaks as features, the peak detection and proteomics pipeline modules create output files similar to those used as input for the modules that support gene expression analysis (see G6G Abstract Number 20181).

Analyses such as clustering, classification, and differential marker selection are based on pattern recognition and applicable to the analysis of both proteomic data and gene expression data.

Note: The modules for the analysis of SELDI/MALDI data are based on work published by Mani and Gillette in "Proteomic Data Analysis: Pattern Recognition for Medical Diagnosis and Biomarker Discovery", in Mehmed Kantardzic and Jozef Zurada (Eds.), Next Generation of Data Mining Applications, Wiley-IEEE Press.
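To make the moving-window peak detection and cross-correlation comparison in items b) and c) more concrete, here is a minimal sketch on a synthetic spectrum. The function names, window sizes, threshold, and the choice of a box filter with a moving-minimum baseline are assumptions for illustration; the GenePattern modules may use different filters and parameters.

```python
import math


def moving_average(signal, window=5):
    """Smooth with a centered moving-average (box) filter."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def subtract_background(signal, window=21):
    """Estimate the baseline as a moving minimum and subtract it."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(signal[i] - min(signal[lo:hi]))
    return out


def detect_peaks(signal, smooth_window=5, background_window=21, threshold=1.0):
    """Return indices of local maxima above `threshold` after filtering."""
    s = subtract_background(moving_average(signal, smooth_window),
                            background_window)
    return [i for i in range(1, len(s) - 1)
            if s[i] > threshold and s[i] > s[i - 1] and s[i] >= s[i + 1]]


def normalized_cross_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length spectra."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0


def gaussian(i, center, height, width):
    return height * math.exp(-((i - center) ** 2) / (2 * width ** 2))


# Synthetic spectrum (assumed data): sloping baseline plus two peaks.
spectrum = [2.0 + 0.01 * i + gaussian(i, 30, 10, 2) + gaussian(i, 70, 6, 2)
            for i in range(100)]
shifted = spectrum[5:] + spectrum[:5]  # same spectrum, rotated by 5 points

peaks = detect_peaks(spectrum)
print(peaks)
print(normalized_cross_correlation(moving_average(spectrum),
                                   moving_average(shifted)))
```

The detector reports the two synthetic peak positions (indices 30 and 70), and the correlation between the smoothed spectrum and its shifted copy falls below the value of 1 obtained when a spectrum is compared with itself.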

4) Data Formats -- Proteomics analysis modules are designed for easy access:

a) All proteomics modules read and write data using mzXML or comma-separated value (CSV) files. mzXML files tend to be used for LC-MS data and CSV files for SELDI/MALDI data.

b) GenePattern provides support for data conversion (see G6G Abstract Number 20183), including support for converting to and from mzXML files.
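As a rough illustration of working with a spectrum stored as CSV, the sketch below reads m/z and intensity pairs with the Python standard library. The two-column layout with an optional header row is an assumption; the exact column layout expected by the GenePattern modules is not described here, and read_spectrum_csv is a hypothetical helper.

```python
import csv
import io


def read_spectrum_csv(handle):
    """Parse rows of `m/z,intensity` into two parallel lists of floats.

    Assumes a two-column layout (an illustration, not a documented
    GenePattern format); header or malformed rows are skipped.
    """
    mz, intensity = [], []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue
        try:
            m, v = float(row[0]), float(row[1])
        except ValueError:
            continue  # skip a header line or a non-numeric row
        mz.append(m)
        intensity.append(v)
    return mz, intensity


# In-memory stand-in for a CSV file on disk (assumed sample data).
sample = io.StringIO("mz,intensity\n1000.5,12.0\n1001.0,48.5\n1001.5,15.25\n")
mz, intensity = read_spectrum_csv(sample)
print(list(zip(mz, intensity)))
```

For a file on disk, the same function can be called with a handle from open(path, newline="").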

System Requirements

See Release Notes.


Manufacturer Web Site GenePattern Proteomics Module

Price Contact manufacturer.

G6G Abstract Number 20182

G6G Manufacturer Number 101795