DNASTAR ArrayStar
Category Genomics>Gene Expression Analysis/Profiling/Tools
Abstract ArrayStar is an easy-to-learn gene expression analysis software package that offers many easy-to-use visualization and analytical tools. ArrayStar v3.0 offers several New Features that simplify 'gene expression data analysis' and expand the capability of ArrayStar.
These Features include:
1) An optional module QSeq (see G6G Abstract Number 20375) for RNA-Seq application analyses.
2) Expanded Gene Characterization (see below...) tables.
3) Expanded clustering graphics.
4) Additional Normalization algorithms to expand the data types analyzed.
5) Networking capability.
RNA-Seq is a quantitative method for detecting and measuring mRNA expression levels through the use of NextGen sequencing technology. Such measurements permit comparisons of expression levels between different samples.
The QSeq module of ArrayStar v3.0 permits quantification of gene expression levels along with a wide range of visualizations.
Heat Maps --
1) Heat Maps illustrate expression levels of the genes across a number of experiments.
2) Genes can be selected within the Heat Map for additional analysis.
3) The Gene Tree to the left of the Heat Map reveals a sub-tree of genes.
4) Clicking on branches reveals cluster information.
5) Gene Ontology (GO) information is easy to obtain by passing your cursor over any gene name or Heat Map location.
6) Genes or gene clusters selected in the Heat Map are shown in gray in the expression level histogram, which illustrates their relative expression levels.
Gene Ontology & Gene Characterization -- ArrayStar v3.0 has added important new Features to its Gene Characterization and Ontology section to assist users in the data analysis of their project and to view the biological significance of a selected set of genes.
Three (3) main components (biological process, molecular function and cellular component) comprise the Gene Ontology groupings, which describe gene and gene product attributes.
Ontology terms are the same for all organisms and are organized into trees. Given terms may have more than one parent. Each term has a unique numeric ID which is maintained by the Gene Ontology Consortium.
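The point that a term may have more than one parent can be illustrated with a small data-structure sketch. The term IDs and parent links below are hypothetical and chosen only for illustration; they are not real Gene Ontology entries, and nothing here reflects how ArrayStar stores ontology data.

```python
# Hypothetical term IDs and parent links, for illustration only.
PARENTS = {
    "TERM:root": [],
    "TERM:metabolism": ["TERM:root"],
    "TERM:signaling": ["TERM:root"],
    # A term with two parents, so the structure is not a strict tree.
    "TERM:kinase_activity": ["TERM:metabolism", "TERM:signaling"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# A gene annotated with the most specific term is implicitly
# associated with all of that term's ancestors as well.
kinase_ancestors = ancestors("TERM:kinase_activity")
```

Tracking visited terms in a set ensures that a shared ancestor reached through two different parents is counted only once.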
Gene Ontology information is important in that it assists in identifying sets of genes that are relevant to a particular project function. For example, statistical analyses can be used to find genes that are up- or down-regulated across two conditions.
The Gene Ontology view helps to better understand the relationship of gene sets with particular biological function(s).
Other Visualization Features Included in ArrayStar v3.0 --
Visualizations to assist in Gene Expression level analysis - For analysis across a series of experiments, such as a time series or a related set of conditions, two (2) advanced clustering algorithms are available in ArrayStar: Hierarchical Clustering and k-Means Clustering.
1) The Hierarchical Clustering method groups data points by clustering them one-by-one into ever-growing groups. After grouping all of the data points, the resulting clusters are displayed in the Heat Map.
2) The k-Means Clustering method differs from the Hierarchical Clustering method in that it groups data points by partitioning them into a fixed number of arbitrary groups and then repeatedly refining the groups. The Line Graph Thumbnail view is best used to display the k-Means Clustering.
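The two clustering approaches described above can be sketched in a few lines. This is a minimal illustration and not ArrayStar's implementation: the Euclidean distance, the centroid-based merge rule, the round-robin starting partition and the sample profiles are all assumptions made for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(profiles):
    """Coordinate-wise mean of a non-empty list of profiles."""
    return [sum(values) / len(profiles) for values in zip(*profiles)]


def hierarchical(profiles, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Clusters are lists of indices into profiles. Closeness is the distance
    between cluster centroids (an assumed linkage rule).
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([profiles[k] for k in clusters[i]])
                cj = centroid([profiles[k] for k in clusters[j]])
                d = distance(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # grow cluster i
        del clusters[j]
    return [sorted(c) for c in clusters]


def k_means(profiles, k, max_iter=100):
    """Start from arbitrary groups, then refine until assignments settle."""
    labels = [i % k for i in range(len(profiles))]  # arbitrary partition
    for _ in range(max_iter):
        centers = []
        for g in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == g]
            # Re-seed an empty group from a data point so it stays usable.
            centers.append(centroid(members) if members else profiles[g])
        new_labels = [
            min(range(k), key=lambda g: distance(p, centers[g]))
            for p in profiles
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical expression profiles: two low-expression and two
# high-expression genes measured across three experiments.
profiles = [[1.0, 1.2, 0.9], [1.1, 1.0, 1.0], [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]
tree_groups = hierarchical(profiles, n_clusters=2)
kmeans_labels = k_means(profiles, k=2)
```

On this small data set both methods separate the low-expression genes from the high-expression genes. The hierarchical version gets there by successive merges, the k-Means version by reassigning genes to the nearest group center.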
Expression Level Changes - Line Graphs and Thumbnail Graphs -
Line Graphs - ArrayStar allows users to easily visualize the expression level changes seen in individual genes over the course of the experiment through the use of Line Graphs. Any gene can be highlighted by passing the cursor over it to generate the graphical representation of its expression.
Analytical Features of Line Graphs include:
1) Selecting a desired gene reveals its ontology information.
2) Expression levels of different genes can be viewed and compared.
Thumbnail Line Graph - ArrayStar’s Line Graph Thumbnails view displays a series of Line Graphs generated from a clustering. Each individual Line Graph shows a visualization of the data contained within one cluster.
Expression levels are plotted vertically along the Y-axis, while the X-axis position for each point is determined by the experiment to which it belongs. Mouse-over a vertical gridline to view the experiment name.
Scatter Plot - ArrayStar software enables researchers to perform a wide range of analyses on their data. Multi-functional scatter plots can be generated that allow the user to easily select groups of genes for analysis.
Gene Table - The Gene Table view contains detailed information for every gene in your project, including both the expression data (e.g. signal intensities and fold change values) as well as any annotations that are available from imported sources, such as the gene name(s) and gene ontology.
Some annotations have special features, allowing you to hover over a term for more information, or click on a hot link to view detailed information online. Any gene subsets being investigated are indicated in the Gene Table, allowing you convenient access to key tabular information for the genes being visualized by other tools in the package.
Data Analysis -- ArrayStar provides users with several different methods for data analysis using statistical methods. Depending on the experiment and the type of information sought, different methods may be applied by the user.
1) Probabilistic Statistical Analysis Methods - To use these statistical tools, replicate samples are required. Variability is measured within the replicates, and from that variability confidence scores are generated that reflect differential gene expression. Methods available to users are:
- a) Student t-test;
- b) Moderated t-test; and
- c) F-test (ANOVA).
After selection, ArrayStar calculates a P Value and a T/F Value for each gene. In general, the larger the magnitude of the T/F value, the stronger the evidence that the gene is differentially expressed.
The P value represents the probability that a T/F value at least as extreme as the calculated one would occur by chance if the gene were Not differentially expressed. In general, the lower the P value, the more confident you can be that the gene is differentially expressed.
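As a rough illustration of where a T value comes from, the sketch below computes the classical two-sample Student t statistic with pooled variance. The function name and the replicate values are hypothetical, and ArrayStar's internal calculation is not documented here. Converting the T value to a P value also requires the t distribution with the returned degrees of freedom, which this sketch omits.

```python
import math
import statistics


def student_t(group_a, group_b):
    """Two-sample Student t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes both groups share the
    same underlying variance and each has at least two replicates.
    """
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = na + nb - 2
    # Pool the within-replicate variability of both conditions.
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical replicate signal intensities for one gene.
control = [10.0, 11.0, 9.0]
treated = [20.0, 21.0, 19.0]
t_value, dof = student_t(control, treated)
```

Here the difference between the group means is large relative to the variability within the replicates, so the T value has a large magnitude (about -12.2 with 4 degrees of freedom).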
In addition to the probabilistic statistical analyses listed above, the following general statistics are also available in ArrayStar:
- a) Coefficient of Variation;
- b) Standard Deviation; and
- c) Variance.
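The three general statistics are closely related, as the following sketch shows. It assumes sample (n - 1) statistics and a non-zero mean. Whether ArrayStar uses sample or population formulas is not stated, and the function name and replicate values are hypothetical.

```python
import statistics


def general_stats(values):
    """Variance, standard deviation and coefficient of variation."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance, n - 1
    std_dev = statistics.stdev(values)      # square root of the variance
    cv = std_dev / mean                     # spread relative to the mean
    return {"variance": variance, "std_dev": std_dev, "cv": cv}


# Hypothetical replicate expression levels for one gene.
summary = general_stats([8.0, 10.0, 12.0])
```

Because the Coefficient of Variation is scaled by the mean, it allows the variability of genes with very different expression levels to be compared.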
2) Multiple Testing Corrections - Statistical tests like the Student’s t-Test, F-Test (ANOVA) and Moderated t-Test are used to identify differentially expressed genes. However, with a large dataset, these tests can produce a substantial number of false positives.
For example, a t-Test can be applied on a group of genes and those which have a p-value less than a certain value (0.05, for example) can be chosen as differentially expressed.
However, when the test is performed on a large number of genes (order of 10,000), a significant number of genes (~500, i.e. 0.05 x 10,000) that are Not actually differentially expressed will have a p-value lower than the set threshold and thus will be selected as differentially expressed. These genes are false positives, and this issue is referred to as the 'Multiple Testing' problem.
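The size of the problem can be checked with a short simulation. It assumes that p-values of genes that are not differentially expressed are uniformly distributed between 0 and 1. The function name, seed and gene count are illustrative choices.

```python
import random


def count_false_positives(n_genes=10_000, alpha=0.05, seed=0):
    """Count null genes whose p-value falls below alpha purely by chance."""
    rng = random.Random(seed)
    # Under the null hypothesis a p-value is uniform on [0, 1).
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)


expected = 10_000 * 0.05            # about 500 false positives on average
observed = count_false_positives()  # one simulated experiment
```

The observed count varies from run to run but stays close to the expected value, even though none of the simulated genes is differentially expressed.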
Various adjustments can be made to the p-values with the objective of reducing the number of false positives. The adjustments available in ArrayStar are listed below, and can be applied to the p-values for any of the probabilistic statistical tests in ArrayStar.
- a) Bonferroni - In the Bonferroni method, the p-value for each gene is multiplied by N, where N is the total number of genes being tested. This increases the p-values to such a level that very few genes are selected within the threshold.
- The Bonferroni method is highly conservative and while it reduces the number of false positives greatly, a number of truly differentially expressed genes are excluded. The Bonferroni method may be best utilized when looking for a small number of genes which are highly differentially expressed.
- b) Holm-Bonferroni - Using the Holm-Bonferroni method, the p-values are first sorted and then the smallest value is multiplied by N, where N is the total number of genes being tested. The next value is then multiplied by N-1 and so on, so that the last p-value is multiplied by 1.
- This method is Not as conservative as the Bonferroni method, but may still exclude many potentially interesting genes (false negatives). As with the Bonferroni method, the Holm-Bonferroni method may be best utilized when looking for a small number of highly differentially expressed genes for further experiments.
- In other words, this method can be effective when the goal is to just eliminate false positives even if it is at the cost of a number of false negatives.
- c) FDR (Benjamini Hochberg) - The FDR (Benjamini Hochberg) method is the default P-value adjustment method in ArrayStar. In this method, the p-values are first sorted and ranked. The smallest value gets rank 1, the second rank 2, and the largest gets rank N. Then, each p-value is multiplied by N and divided by its assigned rank to give the adjusted p-values.
- In order to restrict the false discovery rate to (say) 0.05, all the genes with adjusted p-values less than 0.05 are selected. This method aims to control what is called the False Discovery Rate (FDR), the expected proportion of false positives among the selected genes, and is used when the objective is to reduce the number of false positives while still increasing the chances of identifying all the differentially expressed genes.
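The three adjustments described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ArrayStar's code. Capping adjusted values at 1 and the running maximum/minimum steps are the usual textbook refinements and are not mentioned in the descriptions above. The function names and sample p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply every p-value by N, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def holm_bonferroni(p_values):
    """Multiply the smallest p-value by N, the next by N-1, and so on."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        value = min(1.0, p_values[i] * (n - rank))
        # Assumed refinement: keep adjusted values non-decreasing.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(p_values):
    """Multiply each p-value by N and divide by its rank (1 = smallest)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        value = p_values[i] * n / rank
        # Assumed refinement: keep adjusted values non-decreasing.
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(raw)
adj_holm = holm_bonferroni(raw)
adj_fdr = benjamini_hochberg(raw)
selected_fdr = [i for i, p in enumerate(adj_fdr) if p < 0.05]
```

For these four genes the adjusted p-values shrink from Bonferroni to Holm-Bonferroni to FDR, which mirrors the ordering from most to least conservative given in the text.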
3) Filtering - The Filtering capability of ArrayStar permits users to modify 'gene searches' in a number of different ways. Criteria that can be used include:
- a) Fold Change;
- b) Gene Annotations;
- c) Expression Levels; and
- d) Statistics.
Fold-change analysis is a simple method used to identify genes with expression ratios or differences between a treatment and a control that are outside of a given cutoff or threshold. ArrayStar permits, in its Filtering mode, searches to be conducted on Fold Change levels that are determined by the user. In the Scatter Plot image, a range of pre-set Fold Change levels is provided.
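A fold-change filter of this kind can be sketched as below. The signed convention (positive for up-regulation, negative for down-regulation), the linear-scale positive expression levels, the gene names and the cutoff of 2 are all assumptions for illustration and are not taken from ArrayStar.

```python
def fold_change(treatment, control):
    """Signed fold change: +r if up-regulated, -r if down-regulated.

    Assumes positive, linear-scale expression levels.
    """
    ratio = treatment / control
    return ratio if ratio >= 1 else -1 / ratio


def filter_by_fold_change(genes, cutoff=2.0):
    """Keep genes whose absolute fold change is at least cutoff.

    genes maps a gene name to (treatment_level, control_level).
    """
    kept = {}
    for name, (treatment, control) in genes.items():
        fc = fold_change(treatment, control)
        if abs(fc) >= cutoff:
            kept[name] = fc
    return kept


# Hypothetical gene names and expression levels.
genes = {
    "geneA": (200.0, 100.0),  # doubled in the treatment
    "geneB": (100.0, 400.0),  # four-fold lower in the treatment
    "geneC": (110.0, 100.0),  # nearly unchanged
}
hits = filter_by_fold_change(genes, cutoff=2.0)
```

With a cutoff of 2, the nearly unchanged gene is dropped, while the up-regulated and down-regulated genes are both retained.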
Gene Annotations permits filtering based on the annotations that have been entered for the genes. Expression Levels and Statistics each permit users to define their own filter criteria for the search.
Note: See G6G Abstract Number 20071 for additional product info from this manufacturer.
System Requirements
ArrayStar (Windows® computer running XP or Vista™)
- Windows® XP or Vista™
- 1 GHz or faster x86 CPU
- 384 MB of RAM (512 MB of RAM on Vista™); 1 GB of RAM is required if using the QSeq module
- 140 MB free hard drive space for installation (an additional 280 MB is required on XP if .NET 2.0 is not installed)
- Internet access (required to install, recommended for NetAffx™ usage)
Projects containing large data sets may require additional computing capacity.
Manufacturer
- DNASTAR, Inc.
- 3801 Regent Street
- Madison, WI 53705 USA
- Phone: 1 608-258-7420
- Toll Free: 1 866-511-5090
- Toll free calls from the U.K.: 0-808-234-1643
- FAX: 1 608-258-7439
- Email: info@dnastar.com
Manufacturer Web Site DNASTAR ArrayStar
Price Contact manufacturer.
G6G Abstract Number 20058A
G6G Manufacturer Number 100770