GenePattern SNP Analysis & Data Format Conversion Modules
Category Genomics>Genetic Data Analysis/Tools
Abstract GenePattern combines an advanced scientific workflow platform with more than 90 computational and visualization tools for the analysis of genomic data. GenePattern SNP Analysis lets you analyze single nucleotide polymorphism (SNP) microarrays using normalization, copy number estimation, smoothing, Loss of Heterozygosity (LOH) determination, and visualization.
High-density SNP arrays allow for the analysis of SNPs, copy number alterations (amplifications and deletions), and LOH detection.
GenePattern provides the following support for the analysis of SNP microarray data:
1) Scaling of the data to normalize intensity levels across microarray chips.
2) Probe-level modeling to determine an intensity value for each SNP based on the intensity levels of the probes in each probe set.
3) Copy number (CN) calculation to determine the copy number of a target SNP. The calculation, which divides the intensity value of the target SNP by the intensity value of the same SNP in normal (reference) samples, is also called CN normalization or normalization with respect to normals.
4) Smoothing based on the R package GLAD (Gain and Loss Analysis of DNA), which detects the altered regions in the genomic pattern and assigns a status (normal, gained or lost) to each chromosomal region.
5) Additional analyses to support detection and visualization of LOH and CN alterations.
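The scaling, copy number, and smoothing steps above can be sketched in a few lines of Python. This is a minimal illustrative sketch, not GenePattern's implementation. Median-based chip scaling, the target value of 500, and averaging several normals are assumptions chosen for clarity. The running-median smoother is a simple stand-in that shows what "smoothing" means and is not the GLAD algorithm.

```python
from statistics import mean, median


def scale_to_target(intensities, target=500.0):
    """Rescale one chip so that its median intensity equals `target`.

    Median scaling and the default target of 500 are illustrative assumptions.
    """
    factor = target / median(intensities)
    return [x * factor for x in intensities]


def copy_number_ratio(target_chip, normal_chips):
    """Divide each SNP's target intensity by its mean intensity across the normals.

    All chips must list the same SNPs in the same order.
    """
    normal_means = [mean(values) for values in zip(*normal_chips)]
    return [t / n for t, n in zip(target_chip, normal_means)]


def running_median(values, window=3):
    """Smooth ordered values with a centered running median (a stand-in, not GLAD)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


tumor = scale_to_target([100.0, 200.0, 400.0])
normals = [scale_to_target([100.0, 200.0, 300.0]),
           scale_to_target([50.0, 100.0, 150.0])]
ratios = copy_number_ratio(tumor, normals)
print(ratios)  # [1.0, 1.0, 1.3333333333333333]
print(running_median([1.0, 1.0, 5.0, 1.0, 1.0]))  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

A ratio near 1.0 means the target matches the normals at that SNP. Values clearly above or below 1.0 suggest a gain or a loss, which a smoothing step then consolidates into regions.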
SNP Analysis sub-modules are as follows:
- a) CopyNumberDivideByNormals -- Determines the copy number of a target SNP.
- b) GLAD -- Gain and Loss Analysis of DNA.
- c) LOHPaired -- Computes LOH for paired samples.
- d) SNPFileCreator -- Processes Affymetrix SNP probe-level data into an expression value.
- e) SNPFileSorter -- Sorts a .snp file by chromosome and location.
- f) SNPMultipleSampleAnalysis -- Determines regions of concordant copy number aberrations.
- g) XChromosomeCorrect -- Corrects X chromosome SNPs for male samples.
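As a rough illustration of the ideas behind SNPFileSorter and LOHPaired, the sketch below sorts SNP records by chromosome and position and flags LOH in a tumor/normal pair. The record layout (ID, chromosome, position), the genotype strings "AA", "AB", "BB" and "NoCall", and the chromosome ordering are hypothetical and do not describe the actual .snp file format or either module's code.

```python
def chromosome_key(chrom):
    """Order numbered chromosomes numerically, then X, then Y, then any others by name."""
    if chrom.isdigit():
        return (0, int(chrom), "")
    rank = {"X": 1, "Y": 2}.get(chrom, 3)
    return (rank, 0, chrom)


def sort_snps(records):
    """Sort hypothetical (snp_id, chromosome, position) records by chromosome, then position."""
    return sorted(records, key=lambda rec: (chromosome_key(rec[1]), rec[2]))


def paired_loh(normal_call, tumor_call):
    """Call LOH for one SNP in a tumor/normal pair.

    Returns True (LOH), False (heterozygosity retained), or None when the SNP
    is not informative (normal not heterozygous, or tumor not called).
    """
    if normal_call != "AB" or tumor_call == "NoCall":
        return None
    return tumor_call in ("AA", "BB")


records = [("rs3", "X", 500), ("rs1", "10", 200), ("rs2", "2", 900), ("rs4", "2", 100)]
print([rec[0] for rec in sort_snps(records)])  # ['rs4', 'rs2', 'rs1', 'rs3']
print(paired_loh("AB", "AA"), paired_loh("AB", "AB"), paired_loh("AA", "AA"))  # True False None
```

Sorting numerically rather than as text keeps chromosome 2 ahead of chromosome 10. Only SNPs that are heterozygous in the normal sample can reveal a loss of heterozygosity in the tumor.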
Data Format Conversion Modules -- Import and export data, normalize and filter data, convert gene identifiers, and more.
Analyzing genomic data requires working with vast amounts of inherently noisy data in a variety of formats, and gene identifiers can vary across platforms. In addition to supporting genomic analysis, GenePattern supports routine work with your data files, including the following essential data processing tasks:
1) Importing, exporting, and file conversion: GenePattern imports data from a broad array of platforms and formats, including Microarray Gene Expression Markup Language (MAGE-ML), mzXML [an XML (eXtensible Markup Language)-based common file format for proteomics mass spectrometric data], and the Gene Expression Omnibus (GEO) (see G6G Abstract Number 20013).
It also converts Affymetrix cell intensity (CEL) files to GenePattern files and GenePattern files to MAGE-ML format, and converts line endings to the format required by the host operating system.
2) Normalizing, filtering, and imputing values: The 'preprocessDataset' module provides several pre-processing options, including normalization, floor and ceiling thresholding, and variation filtering.
If your expression data set is missing values, GenePattern provides support for imputing those values; this can be particularly useful when converting cDNA expression data, which allows missing values, to a format that does not.
3) Converting gene identifiers and retrieving annotations: GenePattern provides support for converting the gene identifiers used by one microarray chip to those used by another. It provides access to gene annotations through 'GeneCruiser', which uses Affymetrix probe (gene) identifiers.
4) Working with data sets: GenePattern provides support for working with data sets by allowing you to extract row and column (gene and sample) names, extract rows and columns of data, transpose rows and columns, reorder samples based on phenotypes, or split a single data set into two (2) non-overlapping subsets.
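Several of the data processing tasks above can be sketched as plain Python functions over rows of numbers. These are illustrative assumptions only: the floor, ceiling, fold-change and delta defaults, the rule that a row must pass both the fold-change and the delta test, and row-mean imputation are stand-ins chosen for clarity. They are not the parameters or behavior of the 'preprocessDataset' module or of GenePattern's imputation support.

```python
from statistics import mean


def normalize_line_endings(text, newline="\n"):
    """Convert Windows (CRLF) and classic Mac (CR) line endings to `newline`."""
    unix = text.replace("\r\n", "\n").replace("\r", "\n")
    return unix.replace("\n", newline)


def threshold(row, floor=20.0, ceiling=20000.0):
    """Clamp every value in a row to the range [floor, ceiling]."""
    return [min(max(x, floor), ceiling) for x in row]


def passes_variation_filter(row, min_fold=3.0, min_delta=100.0):
    """Keep a row only if max/min >= min_fold and max - min >= min_delta.

    Assumes the row has already been thresholded so that min(row) > 0.
    """
    lo, hi = min(row), max(row)
    return hi / lo >= min_fold and hi - lo >= min_delta


def impute_row_mean(row):
    """Replace missing values (None) with the mean of the row's present values."""
    present = [x for x in row if x is not None]
    if not present:
        return list(row)  # nothing to impute from; leave the row unchanged
    fill = mean(present)
    return [fill if x is None else x for x in row]


def transpose(matrix):
    """Swap rows and columns (genes x samples becomes samples x genes)."""
    return [list(column) for column in zip(*matrix)]


def split_columns(names, matrix, first_subset):
    """Split samples (columns) into two non-overlapping data sets.

    Samples named in `first_subset` go to the first data set; all others go to the second.
    """
    def pick(indices):
        return [names[i] for i in indices], [[row[i] for i in indices] for row in matrix]

    first = [i for i, name in enumerate(names) if name in first_subset]
    second = [i for i, name in enumerate(names) if name not in first_subset]
    return pick(first), pick(second)


row = threshold([5.0, 100.0, 50000.0])
print(row, passes_variation_filter(row))  # [20.0, 100.0, 20000.0] True
print(impute_row_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
print(split_columns(["s1", "s2", "s3"], [[1, 2, 3], [4, 5, 6]], {"s2"}))
# ((['s2'], [[2], [5]]), (['s1', 's3'], [[1, 3], [4, 6]]))
```

Thresholding before filtering matters here: the floor guarantees a positive minimum, so the fold-change ratio never divides by zero.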
System Requirements
Supported operating systems: GenePattern installers are available for Windows, Mac OS X, and Linux. GenePattern should work with any operating system that has a Java 1.5 virtual machine installed. We have tested it on the following OS platforms:
- Windows XP, Vista
- Mac OS X 10.4 (Tiger), OS X 10.5 (Leopard)
- Linux Ubuntu 7.10, SuSE
Users are also running GenePattern on the Red Hat, Debian, Gentoo, Mandrake and Fedora distributions of Linux.
Supported browsers: The GenePattern Web Client has been tested on the following browsers:
- Windows: Firefox 2.0, MS Internet Explorer 6.0 and 7.0
- Mac: Firefox 2.0, Safari 2.0
- Linux: Firefox 2.0
- Safari: By default, Safari enables the Open "safe" files after downloading preference. This setting prevents GenePattern from correctly exporting and importing zip files. To clear this preference, open Safari, select Safari > Preferences, select the General preferences, and clear the Open "safe" files after downloading check box.
Current technology versions: The following technology versions are used in GenePattern 3.1:
- Java 1.5
- R 2.5.0
- Perl 5.8.8
- Tomcat 5.5.* series
- HSQL 1.8.0
Hardware requirements: GenePattern's minimum hardware requirements are met by almost all currently available machines:
- 256 MB RAM
- 500 MHz Pentium 3 or equivalent
- Hard drive space: 252 MB (server), 84 MB (client)
As of December 2007, installing all GenePattern modules from the Broad repository requires approximately 1 GB of hard drive space. The SNPFileCreator module may require additional RAM depending on the chip type and number of CEL files being processed.
Manufacturer
- Broad Institute of MIT and Harvard
- 7 Cambridge Center
- Cambridge, MA 02142
- Ph: 617.452.3000
- Fax: 617.452.4588
- or
- 320 Charles Street
- Cambridge, MA 02141-2023
- Ph: 617.258.0900
- Fax: 617.258.0901
- E-mail: gp-help@broad.mit.edu
Manufacturer Web Site: SNP Analysis; Data Format Conversion
Price Contact manufacturer.
G6G Abstract Number 20183
G6G Manufacturer Number 101795