Golden Helix SNP & Variation Suite

Category Genomics>Genetic Data Analysis/Tools

Abstract The Golden Helix SNP & Variation Suite is a comprehensive collection of standard and advanced statistical tools for population and family-based genetic association studies.

HelixTree, the flagship product of Golden Helix's single nucleotide polymorphism (SNP) & Variation Suite, can be used to conduct a variety of genotypic, allelic, and haplotypic association tests, perform data quality control and cleanup, control for false positives, detect multi-locus interaction effects, and more.

Product features/capabilities include:

Import virtually any type of data set -- HelixTree enables the import of binary, continuous and categorical dependent variables supporting case-control, quantitative trait loci (QTL) and categorical analysis. Predictor variables can be binary, continuous, ordinal, categorical, nominal and genetic (bi- and multi-allelic genotypes, microsatellites, etc.).

1) Affymetrix GeneChip-compatible -- Directly import data from Affymetrix GeneChip Human Mapping Arrays (10K, 100K, 500K and Genome-Wide Human SNP Array 5.0 & 6.0).

2) Illumina BeadStudio Integration -- Easily export genotype data from Illumina’s BeadStudio Data Analysis software into HelixTree’s proprietary sparse data storage format (DSF) using a custom BeadStudio plug-in. Then rapidly import the DSF file into HelixTree.

Perform truly interactive association analysis -- HelixTree uncovers significant associations between genetic variations and disease, drug response or other clinical outcomes that cannot be easily detected using other programs.

1) Allelic Association Analysis -- Perform allelic tests with binary (case/control) and continuous outcomes. The HelixTree allelic association feature tests individual alleles, rather than whole genotypes, for association with the outcome, with each allele assigned the outcome of the sample that carries it.

2) Genotype and Haplotype Analysis -- Rapidly perform single marker genotypic association tests across candidate genes or whole genome data sets. Get extra power by performing haplotype analysis using haplotypes as binary predictors or via a moving window approach (see Haplotype Trend Regression).

3) Haplotype Trend Regression -- Haplotype trend regression provides a unified moving window approach for testing association of haplotype frequencies with discrete and continuous phenotypes. Haplotype trend regression fits an additive effects model of haplotypes. With the Regression Module Add-on (see HelixTree Add-ons) you can also adjust your regression model for non-genetic covariates.

4) Linkage Disequilibrium (LD) -- HelixTree includes an interactive LD plot that enables you to quickly determine the extent of LD between marker pairs.
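In their simplest case/control form, the allelic and genotypic tests listed above are Pearson chi-square tests on contingency tables: a 2x2 table of allele counts for the allelic test and a 2x3 table of genotype counts for the genotypic test. The following is a minimal sketch, assuming genotype counts are supplied as (AA, Aa, aa) tuples per group; the function names are hypothetical and this is not HelixTree's implementation.

```python
import math


def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def allelic_test(case_counts, control_counts):
    """2x2 allelic test; inputs are genotype counts (AA, Aa, aa) for each group."""
    def allele_counts(counts):
        n_hom1, n_het, n_hom2 = counts
        # Each homozygote contributes two copies of its allele, each heterozygote one of each
        return [2 * n_hom1 + n_het, n_het + 2 * n_hom2]

    stat = chi_square_stat([allele_counts(case_counts), allele_counts(control_counts)])
    # Upper tail of a chi-square distribution with 1 degree of freedom
    return stat, math.erfc(math.sqrt(stat / 2))


def genotypic_test(case_counts, control_counts):
    """2x3 genotypic test on the raw genotype counts."""
    stat = chi_square_stat([list(case_counts), list(control_counts)])
    # Upper tail of a chi-square distribution with 2 degrees of freedom
    # (valid only when all three genotype classes are observed)
    return stat, math.exp(-stat / 2)


cases, controls = (30, 50, 20), (50, 40, 10)
print(allelic_test(cases, controls))
print(genotypic_test(cases, controls))
```

The allelic test pools alleles across individuals, which implicitly assumes an additive, Hardy-Weinberg-consistent model; the genotypic test makes no such assumption but spends an extra degree of freedom.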

Conquer the technical challenges of large-scale data management --

1) Sparse Data Storage Technology -- HelixTree offers sparse data storage technology that enables you to import, store, and export your data in exceptionally compact data storage files: DSF and Golden Helix data (GHD). These files use a fraction of the disk space and RAM required by ordinary text files.

2) Advanced Algorithms -- Experience the raw speed of HelixTree’s advanced proprietary algorithms. Even for large scale whole genome data sets, you can perform most operations (association tests, plotting, etc.) in seconds or minutes.

3) Project Navigator Interface (PNI) -- Keep track of your analyses via HelixTree’s PNI. In accordance with proper laboratory practices, PNI automatically time-stamps and logs each analysis step and provides efficient means for tracking and annotating results. You can also share project files, which is particularly helpful when collaborating on projects.
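The internal layout of the DSF and GHD formats is not described here. As a generic illustration of why a binary genotype format can be so much smaller than text, the sketch below packs each biallelic genotype call into 2 bits, four calls per byte, whereas a text call such as "A/G" plus a delimiter takes several bytes. The code assignments (0, 1, 2, 3 for missing) and function names are assumptions, not Golden Helix's encoding.

```python
# Hypothetical 2-bit codes: 0 = homozygous ref, 1 = het, 2 = homozygous alt, 3 = missing
MISSING = 3


def pack_genotypes(codes):
    """Pack genotype codes (each 0..3) four per byte, lowest bits first."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


codes = [0, 1, 2, MISSING, 1, 1, 0]
packed = pack_genotypes(codes)
print(len(packed), unpack_genotypes(packed, len(codes)))
```

The number of calls must be stored separately, because the last byte may be only partly used.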

Gain key insights into your data -- HelixTree includes numerous intuitive tables, graphs, plots and other visualization tools that provide valuable insights about your data.

1) Marker Map Import and Application -- Import and apply genetic marker maps to properly order your spreadsheet according to chromosome and genetic distance. Marker maps also associate chromosome, physical position, cytoband, and gene information with given markers allowing you to perform analyses on regional subsets of data.

2) Haplotype Frequency Viewer -- Use the haplotype frequency viewer to estimate haplotypes for selected loci using both the Expectation-Maximization (EM) and Composite Haplotype Method (CHM) algorithms. Combine haplotypes to generate a diplotype table for further analysis.

3) Allele and Genotype Counts/Frequencies -- Generate allele and genotype frequency/counts tables for each marker in your data set.

4) Hardy-Weinberg Equilibrium (HWE) -- Determine how closely the genotypes in your data set approximate HWE proportions by rapidly calculating and plotting HWE statistics for the entire data set or for subgroups (e.g., cases or controls) within it.
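The genotype counts, allele frequencies and HWE values described in this list can be sketched with the standard Pearson chi-square HWE test (1 degree of freedom). This assumes genotypes coded as allele dosages 0, 1, 2 with `None` for a missing call; the helper names are hypothetical, and HelixTree may use a different HWE test (for example, an exact test).

```python
import math


def genotype_counts(genotypes):
    """Count dosage-coded genotypes (0, 1, 2); None marks a missing call and is skipped."""
    counts = [0, 0, 0]
    for g in genotypes:
        if g is not None:
            counts[g] += 1
    return counts


def allele_frequency(counts):
    """Frequency of the allele counted by the dosage coding."""
    n_hom_ref, n_het, n_hom_alt = counts
    n = n_hom_ref + n_het + n_hom_alt
    return (n_het + 2 * n_hom_alt) / (2 * n)


def hwe_chi_square(counts):
    """Pearson chi-square test of HWE from genotype counts; returns (statistic, p-value)."""
    n = sum(counts)
    q = allele_frequency(counts)
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]  # HWE genotype proportions
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)
    return stat, math.erfc(math.sqrt(stat / 2))  # 1-df chi-square upper tail


counts = genotype_counts([0] * 60 + [1] * 20 + [2] * 20)
print(counts, allele_frequency(counts), hwe_chi_square(counts))
```

In this example the heterozygote deficit (20 observed vs. 42 expected) gives a very small p-value, the kind of marker a QC cleanup step would flag.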

Assess and remedy data quality -- To ensure that your data is of the highest quality, HelixTree provides a variety of features that help you not only assess the quality of your data but remedy any problems as well.

1) HWE & Call Rate Cleanup -- Easily exclude problematic markers that are out of HWE or that have poor call rates according to user-specified thresholds. Such cleanup functionality is especially helpful when dealing with whole genome data, where thousands of markers could qualify as “poor quality”.

2) SNP Concordance -- It is beneficial to genotype a set of samples more than once to confirm the validity of an assay. The SNP Concordance feature supports this by letting you check concordance across all SNPs for a given set of samples.

3) Inferring Missing Genotype Data -- Sometimes SNPs have poor call rates due to poor DNA quality or problematic assays. With HelixTree, you can remedy this by accurately inferring missing genotypes using an extension of the EM algorithm.
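Call-rate cleanup and concordance checking both reduce to simple per-marker counting. The sketch below assumes genotype calls stored per marker as lists with `None` for a missing call; the 0.95 default threshold, the dictionary layout and the function names are illustrative assumptions, not HelixTree settings.

```python
def call_rate(genotypes):
    """Fraction of non-missing calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def filter_markers(markers, min_call_rate=0.95):
    """Keep names of markers whose call rate meets a user-chosen (here hypothetical) threshold."""
    return [name for name, calls in markers.items() if call_rate(calls) >= min_call_rate]


def concordance(run_a, run_b):
    """Share of calls that agree between two genotyping runs of the same samples,
    counting only positions that were called in both runs."""
    pairs = [(a, b) for a, b in zip(run_a, run_b) if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


markers = {"rs1": [0, 1, 2, 1], "rs2": [0, None, None, 1]}
print(filter_markers(markers, min_call_rate=0.75))
print(concordance([0, 1, 2, None, 1], [0, 1, 1, 2, 1]))
```

Excluding positions missing in either run keeps concordance from being confounded with call rate; the two metrics are reported separately.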

Uncover complex gene-gene & gene-environment interactions -- Often a SNP hypothesized to be highly associated with a disease is determined not to be significant as a main effect.

This may be due to other factors (genetic or environmental) confounding the results. HelixTree provides a robust set of tools to uncover complex gene-gene (epistatic) and gene-environment interactions and/or correlations.

1) Recursive Partitioning (RP) -- Uncover conditional gene-gene and gene-environment associations with Golden Helix FIRM RP technology, an enhanced version of RP based on the statistical hypothesis testing methodology known as Formal Inference-Based Recursive Modeling (FIRM). RP enables you to interactively build dendrogram-like decision tree models of your data.

2) Forest of Random Trees Analysis -- HelixTree allows you to average multiple models by creating a “forest” of random trees. By analyzing a forest of random trees it is possible to understand both correlation and interaction effects among mixed types of variables.

3) Multivariate Analysis -- HelixTree also offers multivariate RP, enabling the simultaneous analysis of multiple phenotypes.

4) Two-loci Genetic Association Analysis -- The two-loci genetic p-value plot displays the statistical significance of the association between each pair of genetic markers and the selected response variable. HelixTree attempts a categorical split on every possible pair of genetic variables in a node and then reports the corresponding raw and adjusted p-values.

Control for False-positives -- When testing multiple hypotheses, there is always the possibility that one or more tests will result in a significant finding by chance alone, especially in the case of a genome-wide association study. Various techniques have been proposed to adjust raw p-values or to otherwise correct for multiple testing issues.

Among these are the Bonferroni adjustment, Simes’ method, and the False Discovery Rate. HelixTree provides all three corrections for the multiplicity of potential splits and/or regressions over many predictors. You can also use SNP tagging routines (see below) to exclude correlated SNPs from your data set.
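The three corrections named above can be sketched in their textbook forms: Bonferroni-adjusted p-values, Simes' global test p-value for the hypothesis that no marker is associated, and Benjamini-Hochberg adjusted p-values for False Discovery Rate control. HelixTree's exact variants are not specified here, and the function names are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def simes(pvalues):
    """Simes' global p-value: min over ranks i of m * p_(i) / i."""
    m = len(pvalues)
    ordered = sorted(pvalues)
    return min(1.0, min(m * p / rank for rank, p in enumerate(ordered, start=1)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw), simes(raw), benjamini_hochberg(raw))
```

Bonferroni controls the family-wise error rate and is the most conservative; the FDR adjustment tolerates a controlled fraction of false discoveries, which usually retains more markers in a genome-wide scan.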

1) SNP Tagging (Carlson Method) -- The LD plot has a SNP tagging capability based on the Carlson method. This method uses the R² LD statistic to group markers so that every marker in a group is tightly correlated with one or more markers in that group (the tagging markers).

By analyzing only the tagging markers rather than the entire set, you can significantly reduce the multiple-testing burden without excessive loss of statistical power.
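A greedy binning procedure in the spirit of the Carlson method can be sketched as follows: repeatedly choose the SNP that captures the most remaining SNPs at R² at or above a threshold, place them in one bin, and continue until every SNP is binned. As a simplifying assumption, R² here is the squared Pearson correlation of allele-dosage vectors rather than haplotype-based LD, and the function names, threshold and data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two allele-dosage vectors (0/1/2).
    Stand-in for haplotype-based LD R^2; assumes no missing calls."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def greedy_tag(snps, threshold=0.8):
    """Return a list of (tag SNP, SNPs in its bin) pairs."""
    remaining = set(snps)
    bins = []
    while remaining:
        # For each remaining SNP, the remaining SNPs it captures (including itself)
        captured = {
            s: {t for t in remaining if t == s or r_squared(snps[s], snps[t]) >= threshold}
            for s in remaining
        }
        # Largest bin wins; sorting first breaks ties deterministically by name
        tag = max(sorted(captured), key=lambda s: len(captured[s]))
        bins.append((tag, sorted(captured[tag])))
        remaining -= captured[tag]
    return bins


snps = {
    "rs1": [0, 1, 2, 1, 0],
    "rs2": [0, 1, 2, 1, 0],
    "rs3": [0, 1, 2, 1, 1],
    "rs4": [2, 0, 1, 0, 2],
}
print(greedy_tag(snps, threshold=0.7))
```

Only one marker per bin then needs to be tested, so the number of tests (and hence the correction) shrinks from four to two in this toy example.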

Extend, customize, innovate -- Innovate and customize HelixTree to meet your specific needs.

1) Python Scripting -- HelixTree offers an advanced Python scripting interface where you can access HelixTree analysis views programmatically. This enables you to develop new scripts to automate workflows, incorporate your own statistical routines and integrate HelixTree with literally hundreds of other packages.

2) R & S-PLUS Integration -- If you have a license for S-PLUS, HelixTree provides a custom Python module (Golden Helix PyS-plus) that enables you to interface with the Windows desktop version of S-PLUS. You can also integrate HelixTree with the R statistical package using the freely available RPy package.

HelixTree Add-ons -- The Golden Helix SNP & Variation Suite provides an array of add-on modules and capabilities that you can utilize to enhance the functionality of HelixTree.

1) Copy Number Analysis Module (CNAM) -- Using methods known to be optimal for finding copy number segments, the Golden Helix Copy Number Analysis Module (CNAM) empowers researchers to find new value in their existing data, giving them the power to perform true association studies on copy number variations.

To date, most copy number analysis methods have relied on Hidden Markov Models (HMMs) or other methods that have proven problematic because of high false discovery rates and low sensitivity. Research has shown that segmentation is the most effective way to reveal regions of copy number variation, and CNAM is one of the first tools to implement a dynamic optimal segmentation algorithm efficient enough for whole genome studies.

The result of the segmentation algorithm is a covariate spreadsheet with copy number segments and intensity values for each sample in the data set. This spreadsheet can be joined with additional phenotype information to perform whole genome association analysis with HelixTree.

2) Regression Module -- An advanced Regression Module is available to test allelic and haplotypic associations in the presence of confounding phenotypic variables. The Regression Module supports both linear and logistic regression.

A typical workflow is to use stepwise regression to find confounding phenotypic variables, fix those regressors, and then search for significantly associated haplotypes or individual SNPs. This regression approach is particularly well suited to overcoming the difficult challenge of population stratification. Permutation testing and interaction terms add further flexibility to the analysis.

3) Golden Helix PBAT Software: Tools for Family-based Association Studies -- Extend HelixTree’s functionality with PBAT to include family-based analysis capabilities. PBAT is a comprehensive, user-friendly software package that provides tools for the design and analysis of family-based association studies.

Golden Helix has teamed up with PBAT’s author, Christoph Lange, Ph.D., of the Harvard School of Public Health, to integrate PBAT into HelixTree, offering a graphical user interface, exclusive upgrades, distributed processing and more.

4) Whole Genome Analysis (WGA) Module -- Though HelixTree does not limit the number of rows or columns you can include in your data set, importing and analyzing large-scale whole-genome data sets may be problematic due to system memory limitations.

Adding the WGA module to HelixTree internally compresses your SNP data into a binary sparse data storage format that uses a fraction of the system memory required by standard files, allowing you to import and analyze extremely large data sets. For example, HelixTree has been used to analyze ~500K SNPs for 6,000 samples on a conventional desktop computer.
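The segmentation algorithm inside the Copy Number Analysis Module is not detailed in this abstract. As a generic illustration of what "optimal segmentation" of copy number data means, the sketch below finds the exact least-squares partition of a probe intensity series, with a fixed penalty charged per segment, using O(n²) dynamic programming. The penalty value, the data and the function name are assumptions; this is not CNAM's algorithm.

```python
def optimal_segmentation(values, penalty):
    """Exact penalized least-squares segmentation by dynamic programming.
    Returns a list of (start, end) half-open index ranges."""
    n = len(values)
    # Prefix sums give each segment's sum of squared deviations in O(1)
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        """Sum of squared deviations from the mean of values[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost of values[:j]
    last_start = [0] * (n + 1)         # start index of the final segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j] = c
                last_start[j] = i

    # Trace the segment boundaries back from the end of the series
    segments = []
    j = n
    while j > 0:
        i = last_start[j]
        segments.append((i, j))
        j = i
    return segments[::-1]


# Toy log-ratio series with a single-copy gain in the middle
intensities = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
print(optimal_segmentation(intensities, penalty=0.5))
```

The per-segment penalty trades fit against the number of breakpoints: a smaller penalty admits more, shorter segments. The segment means computed this way are the kind of per-sample covariates that could then be tested for association.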

Note: For 30 additional features/capabilities of this advanced product, see G6G Abstract Number 20109R74.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site Golden Helix SNP & Variation Suite

Price Contact manufacturer.

G6G Abstract Number 20109R

G6G Manufacturer Number 101135