SNPExpress

Category Genomics>Genetic Data Analysis/Tools

Abstract SNPExpress is a software tool that can be used to accurately analyze Affymetrix and Illumina Single Nucleotide Polymorphism (SNP) genotype calls, copy numbers, polymorphic copy number variations (CNVs) and Affymetrix gene expression in a combinatorial and efficient way.

In addition, SNPExpress allows concurrent interpretation of these items with Hidden-Markov Model (HMM) inferred Loss-of-Heterozygosity (LOH) and copy number regions.

SNPExpress Implementation -- SNPExpress, written in JAVA (version 1.5), uses tab-delimited files as input and is currently available for use with Affymetrix DNA mapping arrays (10K 2.0, 100K set and 500K set), Illumina HumanHap550 Genotyping BeadChip and Affymetrix GeneChips (HG-U95Av2, HG-U133A and B, HG-U133 plus 2.0).

A genotype matrix file is mandatory: each column holds the genotypes of one array, and each row starts with an Illumina or Affymetrix SNP ID.

Each genotype should be formatted as homozygous 'AA' or 'BB', heterozygous 'AB', or 'noCall' (Affymetrix)/'NC' (Illumina). Similar matrix files containing copy numbers or gene expression values are optional.
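The genotype matrix described above can be checked before loading with a short script. The following is a minimal sketch in Python, not part of SNPExpress: it assumes a header row naming the arrays, and the function name and example SNP IDs are invented for illustration.

```python
import csv
import io

# Calls named in the text: homozygous AA/BB, heterozygous AB, and the
# no-call tokens used by Affymetrix ('noCall') and Illumina ('NC').
VALID_CALLS = {"AA", "BB", "AB", "noCall", "NC"}


def read_genotype_matrix(handle):
    """Parse a tab-delimited genotype matrix into (samples, {snp_id: calls}).

    Assumption: the first row is a header whose first cell labels the SNP ID
    column and whose remaining cells name the arrays.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    genotypes = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        snp_id, calls = row[0], row[1:]
        if len(calls) != len(samples):
            raise ValueError(
                f"line {line_no}: expected {len(samples)} calls, got {len(calls)}"
            )
        bad = [c for c in calls if c not in VALID_CALLS]
        if bad:
            raise ValueError(f"line {line_no}: unrecognized genotype call(s) {bad}")
        genotypes[snp_id] = calls
    return samples, genotypes


# Made-up two-array example; the SNP IDs are placeholders.
example = (
    "SNP_ID\tarray1\tarray2\n"
    "SNP_A-0000001\tAA\tAB\n"
    "SNP_A-0000002\tnoCall\tBB\n"
)
samples, genotypes = read_genotype_matrix(io.StringIO(example))
```

A row with the wrong number of columns or an unexpected call raises a ValueError that names the offending line.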

Copy numbers should be centered around 2, the normal copy number of the autosomes; for the male X chromosome the normal copy number is 1.

The maximum displayed copy number is 4; a copy number above 4 is indicated by a grey-blue background.
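The display ceiling can be illustrated with a small helper. This is a sketch of the rule as stated in the text, not SNPExpress code; the function name and the returned flag are assumptions.

```python
MAX_DISPLAYED_COPY_NUMBER = 4.0  # display ceiling stated in the text


def displayed_copy_number(copy_number):
    """Return (value_to_draw, overflow).

    overflow is True when the real copy number lies above the ceiling,
    which is the case the text says is marked by a grey-blue background.
    """
    overflow = copy_number > MAX_DISPLAYED_COPY_NUMBER
    return min(copy_number, MAX_DISPLAYED_COPY_NUMBER), overflow
```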

Copy number, genotype and gene expression files required for SNPExpress can be generated through tools such as Affymetrix Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM), Affymetrix GCOS/Affymetrix CNAT 4.0, or dChipSNP (significance curve and clustering of SNP-array-based loss-of-heterozygosity data) with additional formatting in Microsoft Excel.

In the case of Illumina data, SNPExpress includes the non-synonymous SNPs and the major histocompatibility complex (MHC) region; however, mitochondrial SNPs and Y-chromosome SNPs are not visualized.

All files can be uploaded either as tab- or comma-delimited .txt files or as binary files. The binary files can be created from .txt files through the menu item 'convert data source'.

SNPExpress maps both the SNP IDs (Illumina and Affymetrix) and the expression probe set IDs (Affymetrix) to the genome through internal alignment tables, using annotation provided by the manufacturers (Illumina and Affymetrix).

Regions showing LOH are calculated through a Hidden Markov Model. The probability values for heterozygous calls required for the HMM have been generated from sets of genotypes of normal samples.
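The general idea of HMM-based LOH inference can be sketched with a generic two-state model decoded by the Viterbi algorithm. This is an illustration only, not the model used by SNPExpress or dChipSNP: the state names, the transition probability p_stay, the genotyping error rate het_error and the probability floor are all assumed values. Only the use of normal reference genotypes to estimate per-SNP heterozygous-call probabilities is taken from the text.

```python
import math

STATES = ("NORMAL", "LOH")
CALLED = ("AA", "AB", "BB")


def het_rates(reference_calls):
    """Per-SNP fraction of heterozygous calls among called reference genotypes."""
    rates = []
    for calls in reference_calls:
        called = [c for c in calls if c in CALLED]
        rates.append(sum(c == "AB" for c in called) / len(called) if called else 0.0)
    return rates


def viterbi_loh(sample_calls, rates, p_stay=0.99, het_error=0.01, floor=0.01):
    """Most likely NORMAL/LOH state per SNP for one sample (assumed parameters)."""

    def log_emit(state, call, rate):
        if call not in CALLED:
            return 0.0  # no-call: treated as uninformative
        # In LOH a heterozygous call is only expected as a genotyping error.
        p_het = het_error if state == "LOH" else min(max(rate, floor), 1 - floor)
        return math.log(p_het if call == "AB" else 1 - p_het)

    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)
    score = {s: math.log(0.5) + log_emit(s, sample_calls[0], rates[0]) for s in STATES}
    backpointers = []
    for call, rate in zip(sample_calls[1:], rates[1:]):
        new_score, pointer = {}, {}
        for s in STATES:
            # Best predecessor of state s, including the transition cost.
            prev = max(STATES, key=lambda p: score[p] + (log_stay if p == s else log_switch))
            pointer[s] = prev
            new_score[s] = (
                score[prev]
                + (log_stay if prev == s else log_switch)
                + log_emit(s, call, rate)
            )
        backpointers.append(pointer)
        score = new_score

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]
```

With these assumed parameters, a long run of homozygous calls at SNPs that are frequently heterozygous in the reference set is decoded as LOH, while isolated homozygous calls between heterozygous ones stay NORMAL.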

For the 100K and 500K arrays, 90 samples and 270 samples, respectively, of different ethnic backgrounds from the HapMap project are available through the NCBI GEO website (and provided by Affymetrix).

SNPExpress includes the option to visualize the results of a novel analytical method, implemented in dChipSNP, that infers the copy number of each SNP based on an HMM.

In addition, all CNVs currently cataloged in the Database of Genomic Variants can be visualized.

Example expression, copy number, genotype and HMM copy number files can be downloaded from the SNPExpress web site (see below).

SNPExpress Results -- Genotypes and copy numbers are displayed as sequential blocks: the color of a block indicates genotype, its 'horizontal coordinate' indicates position on the chromosome and its 'vertical coordinate' indicates copy number.

The colored genotype blocks are drawn sequentially in the chromosome-wide view and in proportion to chromosomal location when zooming into a region of interest.

Gene expression levels are visualized as a vertical bar at the chromosomal position of the gene-specific probe set. The height of the bar is proportional to the gene expression value.

The default maximum value is 500, and any expression value higher than 500 is capped at 500; both values are user-definable.

When multiple probe sets span the same region in the chromosome-wide view, the vertical gene expression bar is red and proportional to the highest expression value.
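The capping and collapsing rules for expression bars can be written out as follows. This is a sketch under stated assumptions, not SNPExpress code: the function names and the pixel height max_height_px are invented, and only the cap of 500 and the use of the highest value come from the text.

```python
def expression_bar_height(value, cap=500.0, max_height_px=100):
    """Bar height in pixels, linear in the expression value up to `cap`."""
    clipped = min(max(value, 0.0), cap)  # values above the cap are drawn at the cap
    return max_height_px * clipped / cap


def collapsed_bar_height(values, cap=500.0, max_height_px=100):
    """Chromosome-wide view: one bar for overlapping probe sets, sized by the highest value."""
    return expression_bar_height(max(values), cap, max_height_px)
```

For example, expression_bar_height(1200) gives the full height of 100 pixels because 1200 is capped at 500, whereas raising the cap to 2000 gives 60 pixels for the same value.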

Zooming into the location of interest discloses the individual probe sets. Links from SNP IDs to public databases can be opened by holding the Ctrl key and clicking on a SNP ID.

Distinct background colors are used to accentuate genomic changes.

Individual copy numbers are indicated as gain (pink background) or loss (green background) when their deviation from the normal copy number exceeds a user-defined threshold. The default deviation threshold is 0.5.
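The gain/loss rule can be sketched as a simple threshold test. The function and parameter names are assumptions; the default threshold of 0.5, the normal copy numbers of 2 (autosomes) and 1 (male X chromosome), and the background colors are taken from the text. Whether SNPExpress treats a deviation of exactly 0.5 as a change is not stated, so the strict comparison below is a guess.

```python
# Background colors named in the text for gain and loss.
BACKGROUND = {"gain": "pink", "loss": "green", "neutral": None}


def classify_copy_number(copy_number, expected=2.0, threshold=0.5):
    """Label a copy number as 'gain', 'loss' or 'neutral'.

    expected is the normal copy number: 2 for autosomes, 1 for the male X.
    """
    deviation = copy_number - expected
    if deviation > threshold:
        return "gain"
    if deviation < -threshold:
        return "loss"
    return "neutral"
```

A value of 1.0 on the male X chromosome is neutral when called with expected=1.0, but would be reported as a loss against the autosomal default of 2.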

LOH is highlighted at diploid level by a bold magenta line. All colors can be adapted to the users' preferences.

From the menu, the user is able to choose to visualize either one chromosome of multiple samples or the complete genome of one sample.

Detailed information, such as SNP ID, associated gene symbol, probe set ID, CytoBand and expression value, is shown on a mouse-over display.

Furthermore, a gene of interest is directly visualized through a search function, and its associated SNPs are indicated with an orange background color.

The options to display known CNVs (purple background) or the HMM copy number results (thin magenta line) are included.

Finally, relevant data of a particular minimal deleted or amplified region can be exported (i.e., Sample, Probe-set-id, Chromosome, Location (bp), CytoBand, Associated gene, Genotype, Copy number and Inferred LOH of the selected region). High-resolution images of the visualization can be saved in the Portable Network Graphics (PNG) format.
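An export with the fields listed above could look like the following sketch. The column names are those given in the text; the tab-delimited layout, the function name and the record structure are assumptions, since the actual export format of SNPExpress is not described here.

```python
import csv

# Field names as listed in the text.
EXPORT_COLUMNS = [
    "Sample", "Probe-set-id", "Chromosome", "Location (bp)", "CytoBand",
    "Associated gene", "Genotype", "Copy number", "Inferred LOH",
]


def export_region(records, chromosome, start_bp, end_bp, handle):
    """Write the records inside the selected region as tab-delimited text.

    records is an iterable of dicts keyed by EXPORT_COLUMNS (hypothetical
    structure). Returns the number of rows written.
    """
    writer = csv.DictWriter(
        handle, fieldnames=EXPORT_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    written = 0
    for rec in records:
        in_region = (
            rec["Chromosome"] == chromosome
            and start_bp <= rec["Location (bp)"] <= end_bp
        )
        if in_region:
            writer.writerow(rec)
            written += 1
    return written
```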

System Requirements

SNPExpress runs best on a computer with at least 512MB of memory (1024MB is advised). The amount of memory needed is directly proportional to the number of samples loaded. The JAVA Runtime Environment, version 1.5 or higher, is required.

Manufacturer

Manufacturer Web Site SNPExpress

Price Contact manufacturer.

G6G Abstract Number 20317

G6G Manufacturer Number 100931