GenePattern SNP Analysis & Data Format Conversion Modules

Category Genomics>Genetic Data Analysis/Tools

Abstract GenePattern combines an advanced scientific workflow platform with more than 90 computational and visualization tools for the analysis of genomic data. GenePattern SNP Analysis lets you analyze single nucleotide polymorphism (SNP) microarrays using normalization, copy number estimation, smoothing, Loss of Heterozygosity (LOH) determination, and visualization.

High-density SNP arrays support the analysis of SNPs, the detection of copy number alterations (amplifications and deletions), and the detection of LOH.

GenePattern provides the following support for the analysis of SNP microarray data:

1) Scaling of the data to normalize intensity levels across microarray chips.

2) Probe-level modeling to determine an intensity value for each SNP based on the intensity levels of the probes in each probe set.

3) Copy number (CN) calculation to determine the copy number of a target SNP. The calculation, which divides the intensity value of the target SNP by the intensity value of the normal SNP, is also called CN normalization or normalization with respect to normals.

4) Smoothing based on the R package GLAD (Gain and Loss Analysis of DNA), which detects the altered regions in the genomic pattern and assigns a status (normal, gained or lost) to each chromosomal region.

5) Additional analyses to support detection and visualization of LOH and CN alterations.
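
The copy number calculation in step 3 can be illustrated with a minimal Python sketch. It is not GenePattern code: the function name, the `ploidy` parameter, and the sample intensities are hypothetical, and multiplying the ratio by 2 (to express it as a number of copies for a diploid reference) is an assumption added for readability, not something stated above.

```python
def copy_number_ratios(target, normal, ploidy=2.0):
    """Divide each target SNP intensity by the matching normal intensity.

    The result is multiplied by `ploidy` (assumed 2 for a diploid
    reference) so that an unchanged SNP reads as roughly 2 copies.
    SNPs whose normal intensity is not positive yield None.
    """
    if len(target) != len(normal):
        raise ValueError("target and normal must have the same length")
    estimates = []
    for t, n in zip(target, normal):
        if n <= 0:
            estimates.append(None)  # ratio is undefined for this SNP
        else:
            estimates.append(ploidy * t / n)
    return estimates


# Hypothetical per-SNP intensities after probe-level modeling.
tumor = [1200.0, 600.0, 2400.0]
normal = [1200.0, 1200.0, 1200.0]
estimates = copy_number_ratios(tumor, normal)
print(estimates)
```

In this toy input the second SNP has half the normal intensity and the third has twice the normal intensity, which is the kind of deviation a later smoothing step would group into lost or gained regions.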

SNP Analysis sub-modules are as follows:

Data Format Conversion Module -- Import and export data, normalize and filter data, convert gene identifiers, and more.

Analyzing genomic data requires working with vast amounts of inherently noisy data in a variety of data formats, where gene identifiers can vary across platforms. In addition to supporting genomic analysis, GenePattern provides support for simply working with your data files. GenePattern provides the following support for essential data processing tasks:

1) Importing, exporting, and file conversion: GenePattern imports data from a broad array of platforms and formats, including Microarray Gene Expression Markup Language (MAGE-ML), mzXML [an XML (eXtensible Markup Language) based common file format for proteomics mass spectrometric data], and the Gene Expression Omnibus (GEO) (see G6G Abstract Number 20013).

It also converts Affymetrix cell intensity (CEL) files to GenePattern files, converts GenePattern files to MAGE-ML format, and converts line endings to the format required by the host operating system.

2) Normalizing, filtering, and imputing values: The 'preprocessDataset' module provides several pre-processing options, including normalization, floor and ceiling thresholding, and variation filtering.

If your expression data set is missing values, GenePattern provides support for imputing those values; this can be particularly useful when converting cDNA expression data, which allows missing values, to formats that do not.

3) Converting gene identifiers and retrieving annotations: GenePattern provides support for converting the gene identifiers used by one microarray chip to those used by another. It provides access to gene annotations through 'GeneCruiser', which uses Affymetrix probe (gene) identifiers.

4) Working with data sets: GenePattern provides support for working with data sets by allowing you to extract row and column (gene and sample) names, extract rows and columns of data, transpose rows and columns, reorder samples based on phenotypes, or split a single data set into two non-overlapping subsets.
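
The pre-processing and data set operations listed above (floor and ceiling thresholding, variation filtering, transposing) can be sketched in plain Python as follows. This is an illustrative sketch only, not the 'preprocessDataset' module: the function names, the parameter names (`floor`, `ceiling`, `min_fold`), the gene names, and the fold-change criterion used for variation filtering are all assumptions.

```python
def threshold(rows, floor, ceiling):
    """Clip every value into the range [floor, ceiling]."""
    return [[min(max(v, floor), ceiling) for v in row] for row in rows]


def variation_filter(names, rows, min_fold):
    """Keep rows whose max/min ratio is at least min_fold.

    Assumes values are positive, e.g. after thresholding with a
    positive floor.
    """
    kept_names, kept_rows = [], []
    for name, row in zip(names, rows):
        if max(row) / min(row) >= min_fold:
            kept_names.append(name)
            kept_rows.append(row)
    return kept_names, kept_rows


def transpose(rows):
    """Swap rows and columns (genes and samples)."""
    return [list(col) for col in zip(*rows)]


# Hypothetical data set: two genes (rows) measured in three samples (columns).
names = ["geneA", "geneB"]
rows = [[5.0, 50.0, 20000.0], [100.0, 110.0, 105.0]]

clipped = threshold(rows, floor=20.0, ceiling=16000.0)
kept_names, kept_rows = variation_filter(names, clipped, min_fold=3.0)
by_sample = transpose(clipped)
print(kept_names, kept_rows)
```

Thresholding before filtering keeps the fold-change ratio well defined, because no value can fall below the positive floor. Here geneB is dropped because its values vary by only about 10 percent across samples.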

System Requirements

Supported operating systems: GenePattern installers are available for Windows, Mac OS X, and Linux. GenePattern should work with any operating system that has a Java 1.5 virtual machine installed. We have tested it on the following OS platforms:

Users are also running GenePattern on the Red Hat, Debian, Gentoo, Mandrake and Fedora distributions of Linux.

Supported browsers: The GenePattern Web Client has been tested on the following browsers:

Current technology versions: Following are the technology versions used in GenePattern 3.1.

Hardware requirements: GenePattern's hardware requirements are found on almost all currently available machines:

As of December 2007, installing all GenePattern modules from the Broad repository requires approximately 1 GB of hard drive space. The SNPFileCreator module may require additional RAM depending on the chip type and number of CEL files being processed.

Manufacturer

Manufacturer Web Site SNP Analysis Data Format Conversion

Price Contact manufacturer.

G6G Abstract Number 20183

G6G Manufacturer Number 101795