SeattleSNPs PGA

Category Genomics>Genetic Data Analysis/Tools

Abstract The SeattleSNPs Program for Genomic Applications (PGA) focuses on identifying, genotyping, and modeling the associations between single nucleotide polymorphisms (SNPs) in candidate genes and pathways that underlie inflammatory responses in humans.

Users can visualize these high-density SNPs and access the underlying data in many forms--both text and graphical representations.

SeattleSNPs is focused on variation analysis in genes related to the inflammatory response. These gene targets are drawn from specific pathways and from interacting molecules that contribute to this response.

SeattleSNPs Gene Resources --

1) Baseline assembled and complete genomic sequence and chromosomal location for candidate gene targets.

2) Mapping of exon and repeat structure for candidate genes.

3) Amplification primers and conditions.

4) SNPs mapped by location in gene structure.

5) SNPs with immediate surrounding sequence for genotype assay design.

6) Genotypes and relative allele frequencies of the SNPs.

7) Special features of SNPs - location (5', coding, etc.), amino acid substitutions, and recurrent variation.

8) Manuals on all protocols, data analysis procedures, and use of software tools (see below...).

9) Workshop on genetic variation analysis and a gene submission program for variation analysis.

SeattleSNPs Genotyping Resources --

Background - In the last year, the manufacturer has converted its SNP discovery panel to samples included in the HapMap project. The SeattleSNPs project is now generating comprehensive SNP data for HapMap-derived samples from both European and African populations.

The manufacturer has also implemented a large-scale genotyping effort, using Illumina BeadArray technology, to map highly informative tagSNPs from previously studied SeattleSNPs candidate genes.

TagSNP site selection - The manufacturer has developed an efficient selection algorithm (LDSelect - see below...) that is based on the linkage disequilibrium statistic r² and does not require direct haplotype inference (Carlson et al. 2004).

This algorithm selects a subset of variants that efficiently describe all common patterns of variation in a gene.

SeattleSNPs Software Resources --

1) Genome Variation Server - The Genome Variation Server (GVS) is a 'local database' hosted by the SeattleSNPs PGA.

The objective of this database is to provide a simple tool for rapid access to human genotype data found in dbSNP, and to provide tools for analysis of genotype data.

The current release of genotype data found in the GVS database is that of dbSNP build 129 (June 2008).

The manufacturer has also added the HapMap phase 3 data (draft release 1) of August 2008. The variation locations are mapped to the human genome reference sequence of March 2006 (UCSC hg18, NCBI build 36).

This GVS database contains 4.5 million variations with corresponding genotype data.

To be included in this database, a variation must have genotype data, and it must be uniquely mapped to the human genome by dbSNP.

As most submitters to dbSNP report double genotypes for X and Y chromosome variations, the manufacturer stores double genotypes in its database and has converted single genotypes to (homozygous) double genotypes.

If a genotype on the Y chromosome was reported to be heterozygous, the manufacturer omitted it. The manufacturer has not corrected the frequencies for male X chromosome genotypes.
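The X and Y chromosome handling described above can be illustrated with a minimal Python sketch. The function name, the chromosome labels, and the tuple representation of a genotype are assumptions made for illustration only; they are not taken from the GVS software.

```python
def normalize_sex_chromosome_genotype(chrom, alleles):
    """Return a two-allele genotype, or None if the genotype is omitted.

    chrom   -- chromosome label, e.g. "X" or "Y" (hypothetical encoding)
    alleles -- tuple of one or two allele strings, e.g. ("A",) or ("A", "G")
    """
    # A single reported allele is stored as a homozygous double genotype.
    if len(alleles) == 1:
        return (alleles[0], alleles[0])
    # A heterozygous genotype on the Y chromosome is omitted.
    if chrom == "Y" and alleles[0] != alleles[1]:
        return None
    # Everything else is kept as reported; male X frequencies are not corrected.
    return tuple(alleles)


print(normalize_sex_chromosome_genotype("X", ("A",)))
print(normalize_sex_chromosome_genotype("Y", ("A", "G")))
```

Because single genotypes become homozygous pairs and no correction is applied, allele counts for male X chromosome genotypes are effectively doubled in any frequency computed from these records.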

2) HaploPowerCalc - HaploPowerCalc is a tool for estimating the power to detect disease association by a set of markers (e.g. a tag SNP panel or SNPs on an array), at any user-specified polymorphic site(s), under arbitrary disease models and sample sizes.

It is intended for users who wish to estimate the power (or sample sizes required to obtain adequate power) in their association study. HaploPowerCalc uses an approach based on haplotype-sampling.

3) PolyPhred - PolyPhred is a program that compares fluorescence-based sequences across traces obtained from different individuals to identify heterozygous sites for single nucleotide substitutions.

PolyPhred is not a stand-alone application. PolyPhred's functions are integrated with the use of three other programs: Phred (Brent Ewing and Phil Green), Phrap (Phil Green), and Consed (David Gordon and Phil Green).

PolyPhred identifies potential heterozygotes using the base calls and peak information provided by Phred and the sequence alignments provided by Phrap. Potential heterozygotes identified by PolyPhred are marked for rapid inspection using the Consed tool.

4) VG2 - Displaying Genotype Data: Visual Genotypes - This visual genotype function is now available through the Genome Variation Server.

5) VH1 - Displaying Estimated Haplotype Data: Visual Haplotypes - This visual haplotype function is now available through the Genome Variation Server.

6) LDSelect - The LDSelect program analyzes the patterns of linkage disequilibrium (LD) between polymorphic sites in a locus, and bins the SNPs on the basis of a threshold level of LD as measured by r².

At each round of selection, the binning algorithm identifies the single SNP that exceeds the threshold r² with the maximum number of other SNPs, and sets this group of SNPs as a bin.

Then each SNP within the bin is analyzed to determine whether it exceeds the threshold r² with all other SNPs in the bin.

All SNPs in a bin that meet this criterion are designated as tagSNPs. Only one tagSNP needs to be typed per bin.
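The binning rounds described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the LDSelect source: the function name `ld_select`, the input format (a dictionary of precomputed pairwise r² values), the default threshold of 0.8, the use of "greater than or equal to" for the threshold test, and tie-breaking by input order are all assumptions.

```python
def ld_select(snps, r2, threshold=0.8):
    """Greedy LD binning sketch.

    snps      -- list of SNP identifiers
    r2        -- dict mapping frozenset({a, b}) to the pairwise r² value
                 (missing pairs are treated as r² = 0; an assumption)
    threshold -- r² cutoff (0.8 is an assumed default)
    """
    def linked(a, b):
        # Assumption: a pair passes when r² >= threshold.
        return r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # For every unbinned SNP, list the unbinned SNPs it is linked to.
        partners = {s: [t for t in remaining if t != s and linked(s, t)]
                    for s in remaining}
        # Seed the bin with the SNP that has the most partners
        # (ties fall to the earliest SNP in input order).
        seed = max(remaining, key=lambda s: len(partners[s]))
        members = [seed] + partners[seed]
        # A tagSNP passes the threshold with all other SNPs in the bin.
        tags = [s for s in members
                if all(linked(s, t) for t in members if t != s)]
        bins.append({"members": members, "tags": tags})
        remaining = [s for s in remaining if s not in members]
    return bins


example_r2 = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.85,
    frozenset(("B", "C")): 0.50,
}
for b in ld_select(["A", "B", "C", "D"], example_r2):
    print(b)
```

In this toy input, A is linked to both B and C, so the first bin holds A, B and C with A as its only tagSNP; D has no partners and forms a single-SNP bin that tags itself.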

7) LDSelect-Multipopulation - The MultiPop-TagSelect algorithm, as implemented in the program multiPopTagSelect.pl, attempts to select a near-minimal set of tagging single-nucleotide polymorphisms (tagSNPs) that account for all observed patterns of linkage disequilibrium (LD) in multiple populations.

Specifically, it processes the output of tagSNP selection algorithms that designate bins of nearly equivalent SNPs, such that choosing (and typing) one SNP from each bin is sufficient to capture all associations observed in the data.

8) PCR-Overlap - This program takes large tracts of sequence data in FASTA file format and lays out Polymerase Chain Reaction (PCR) products in overlapping segments that span the entire region.
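The idea of covering a region with overlapping segments can be shown with a small Python sketch. It only computes segment coordinates and does not attempt primer design; the function name `tile_region`, the amplicon length, and the overlap size are hypothetical values chosen for illustration and are not parameters of PCR-Overlap itself.

```python
def tile_region(seq_len, amplicon_len=1000, overlap=100):
    """Return 0-based, half-open (start, end) segments covering seq_len bases.

    Consecutive segments share `overlap` bases; the last segment is
    truncated at the end of the region.
    """
    if overlap >= amplicon_len:
        raise ValueError("overlap must be smaller than amplicon_len")
    if seq_len <= 0:
        return []
    step = amplicon_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + amplicon_len, seq_len)
        segments.append((start, end))
        if end == seq_len:
            break
        start += step
    return segments


print(tile_region(2500, amplicon_len=1000, overlap=100))
```

For a 2,500-base region with these settings, the sketch yields three segments, each sharing 100 bases with its neighbor.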

9) GeneHunter - see GENEHUNTER-MODSCORE (G6G Abstract Number 20395) and GENEHUNTER-TWOLOCUS (G6G Abstract Number 20396).

System Requirements

Contact manufacturer.

Manufacturer

Developed by the University of Washington in conjunction with Princeton University and the National Heart, Lung, and Blood Institute.

Manufacturer Web Site SeattleSNPs PGA

Price Contact manufacturer.

G6G Abstract Number 20401

G6G Manufacturer Number 104032