VIsual Statistical Data Analyzer (VISDA)

Category Genomics>Gene Expression Analysis/Profiling/Tools and Cross-Omics>Biomarker Discovery/Analysis/Tools

Abstract VISDA (VIsual Statistical Data Analyzer) is an analytical tool for cluster modeling, visualization, and discovery.

Statistically principled and visually insightful, VISDA exploits the human gift for pattern recognition and allows users to discover the hidden cluster structure(s) within high-dimensional, complex biomedical data sets.

The unique features of VISDA include its hybrid algorithm, robust performance, and “tree of phenotype”.

With global and local biomarker identification and prediction functionalities, VISDA allows users across the cancer research community to analyze their genomic/proteomic data, to define new cancer subtypes based on gene expression patterns, to construct hierarchical trees of multiclass cancer phenotypic composites, or to discover the correlation between cancer statistics and risk factors.

VISDA Multivariate visualization --

Multivariate visualization has proven to be a critical tool for the analysis and interpretation of complex data. To reveal the interesting patterns within a data set, the manufacturer developed the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, discovery, and visualization.

The model-supported exploration of high-dimensional data space is achieved through two (2) complementary schemes: dimensionality reduction by discriminatory component analysis and cluster formation by soft data clustering, whose parameters are estimated using the weighted Fisher criterion and the expectation-maximization algorithm, respectively.
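As an illustration of the soft-clustering half of this scheme, the following is a minimal sketch of expectation-maximization for a one-dimensional Gaussian mixture. It is not VISDA's implementation: the function names, the one-dimensional simplification, and the starting variances are assumptions made for the example. Each data point receives a posterior probability of belonging to each cluster rather than a hard label.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_soft_clustering(data, init_means, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from caller-supplied centers.

    Hypothetical helper for illustration only. Returns the fitted means,
    variances, mixing weights, and the posteriors from the last E-step.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k          # assumed starting variance
    weights = [1.0 / k] * k        # equal starting mixing weights
    posteriors = []
    for _ in range(iterations):
        # E-step: posterior probability of each cluster for each point.
        posteriors = []
        for x in data:
            joint = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(joint) or 1e-300  # guard against division by zero
            posteriors.append([j / total for j in joint])
        # M-step: re-estimate parameters from the soft assignments.
        for c in range(k):
            resp = sum(p[c] for p in posteriors)
            if resp == 0.0:
                continue  # empty cluster: keep its previous parameters
            means[c] = sum(p[c] * x for p, x in zip(posteriors, data)) / resp
            var = sum(p[c] * (x - means[c]) ** 2 for p, x in zip(posteriors, data)) / resp
            variances[c] = max(var, min_var)
            weights[c] = resp / len(data)
    return means, variances, weights, posteriors
```

Calling `em_soft_clustering([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], [0.0, 6.0])` separates the two groups, with means settling near 1.0 and 5.0.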

VISDA uses an adaptive boosting of discriminatory subspaces involving hierarchical mixture modeling of the data set.

The hierarchical mixture model, selected optimally by the minimum description length criterion, allows the complete data set to be visualized at the top level and then partitioned, with clusters and sub-clusters of data points visualized at deeper levels.
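Model selection by minimum description length can be sketched as follows. The exact form of the criterion used by VISDA is not given here, so this example assumes the common two-part form, the negative log-likelihood plus half the number of free parameters times the log of the sample count, applied to Gaussian mixtures with full covariance matrices. The function names and the log-likelihood values in the usage note are hypothetical.

```python
import math


def free_parameters(k, dim):
    """Free parameters of a k-component Gaussian mixture in `dim` dimensions.

    Counts k mean vectors, k full symmetric covariance matrices,
    and k - 1 independent mixing weights.
    """
    return k * dim + k * dim * (dim + 1) // 2 + (k - 1)


def mdl_score(log_likelihood, n_params, n_samples):
    """Two-part description length (assumed form): data cost plus model cost."""
    return -log_likelihood + 0.5 * n_params * math.log(n_samples)


def select_by_mdl(candidates, dim, n_samples):
    """Pick the cluster count with the smallest description length.

    `candidates` maps a cluster count k to the maximized log-likelihood
    of the k-component model fitted to the data.
    """
    scores = {
        k: mdl_score(ll, free_parameters(k, dim), n_samples)
        for k, ll in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

With made-up log-likelihoods such as `select_by_mdl({1: -520.0, 2: -410.0, 3: -405.0}, dim=2, n_samples=200)`, the small likelihood gain from a third cluster does not pay for its extra parameters, so two clusters are selected.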

Each subspace model is linear while the complete hierarchy maintains overall nonlinearity.

The main application of VISDA is for multivariate cluster modeling, discovery, and visualization, particularly for data sets living in high dimensional space.

Many real-world problems, once formulated, amount to exploring the hidden structure of the data in one way or another. Applications can be found in biomedicine, bio-defense, intelligence analysis, market analysis, and other fields.

For example, VISDA can be used to define new cancer subtypes based on their gene expression patterns, or to discover the correlation between biological agents and environmental changes.

VISDA can navigate a high-dimensional data set to discover its hidden cluster structure, and then model and visualize what it finds.

It is particularly effective when dealing with highly complex data sets as compared to existing methods. To reveal all of the hidden clusters, the manufacturer’s exploration of high-dimensional data space is both statistically-principled and visually-insightful.

The manufacturer’s method can incorporate both the power of statistical methods and the human gift for pattern recognition, and is capable of capturing progressively all interesting aspects of the data set.

To the best of the manufacturer’s knowledge, it represents the state of the art in visual statistical data analysis and exploration. VISDA incorporates some of the most advanced theories, methods, and algorithms in statistical learning.

It also works in both unsupervised and supervised scenarios. VISDA has recently been adopted as one of the core data analysis components by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) through its cancer Biomedical Informatics Grid (caBIG) initiative.

VISDA Application features --

VISDA Data preprocessing -

The input data can be:

1) Any tab-delimited text file including multiple annotations of genes and conditions;

2) Any local data file in MAGE-ML format; and

3) Data retrieved from caArray.

All gene/sample annotation fields can be automatically extracted and used for subsequent cluster discovery. The uploaded data can also be visualized as a Heatmap before the scheduled analysis, providing the user with a global view of the entire data set.
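To make the first input type concrete, here is a minimal sketch of reading a tab-delimited expression table and separating annotation fields from numeric values. The file layout is an assumption made for illustration (one header row of sample names, a configurable number of leading annotation columns per gene); VISDA's actual accepted layout may differ, and `read_expression_table` is a hypothetical helper, not part of VISDA.

```python
import csv
import io


def read_expression_table(handle, n_annotation_columns=1):
    """Split a tab-delimited table into sample names, gene annotations, values.

    Assumed layout: the first row is a header; the first
    `n_annotation_columns` columns of each row hold gene annotations and
    the remaining columns hold one expression value per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_names = header[n_annotation_columns:]
    annotations, values = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        annotations.append(row[:n_annotation_columns])
        values.append([float(cell) for cell in row[n_annotation_columns:]])
    return sample_names, annotations, values


# Usage with an in-memory example instead of a file on disk.
example = "gene\tdescription\ts1\ts2\nG1\tkinase\t1.5\t2.0\nG2\treceptor\t0.5\t3.25\n"
samples, gene_info, matrix = read_expression_table(io.StringIO(example), n_annotation_columns=2)
```

The same function accepts an open file object, for example `open(path, newline="")`, in place of the `io.StringIO` wrapper.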

The configuration step gives a user the freedom to choose among different analysis tasks, such as gene/phenotype clustering, supervised/unsupervised feature selection, various projection methods and other advanced features including cluster validation.

VISDA core algorithms are then activated to perform the targeted clustering on the uploaded gene expression data.

VISDA Analytical algorithms -

VISDA implements the following functions for sample/gene clustering:

1) Supervised and unsupervised feature selections;

2) Discriminatory data projections for exploratory cluster visualization, including principal component analysis (PCA) and projection pursuit method (PPM);

3) Hierarchical statistical modeling and parameter estimation by the expectation-maximization (EM) algorithm; and

4) Advanced functional options including Fisher discriminatory component analysis (DCA) projection, MDL cluster validation and hybrid clustering initialization using HC-k-means/SOM.
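The PCA projection named in item 2 can be sketched in a few lines. This is a generic textbook version written in plain Python, not VISDA's code: it centers the data, builds the sample covariance matrix, and finds the two leading eigenvectors by power iteration with deflation. All function names are hypothetical, and a production tool would more likely rely on a numerical library.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Leading eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # matrix maps v to zero; keep the current direction
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(2):
        lam, v = top_eigenvector(cov)
        axes.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * a[j] for j in range(d)) for a in axes] for r in centered]
```

Each returned pair can be plotted as one point in a 2D scatter view. The sign of each axis is arbitrary, so a projection may appear mirrored without changing the cluster structure it shows.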

VISDA Information visualization -

Expression data are displayed as a Heatmap. Annotations of the conditions are shown at the top; annotations of the genes are listed on the right.

During the clustering process, clusters at each hierarchical level can be visualized by three (3) individual 2D projections: PCA, PPM, and DCA.

The user can then select the best projection view for further classification at deeper levels.

One of VISDA’s distinctive features is the integration of human intelligence into the automation of the core algorithms.

To leverage a user’s prior knowledge and visual cues about data patterns, VISDA allows each user to initialize the number of clusters and their centers at each exploration level.

The iterative user-algorithm interactions exploit the power of the human gift for ‘pattern recognition’ and statistical machine learning, assuring robust and globally converged clustering solutions.

VISDA Graphic user interface (GUI) -

All the sub-level results of VISDA are stored in a hierarchical structure, and a pie chart diagram (not shown here) shows the growth of the Hierarchical Clustering (HC) tree.

All the pictures can be viewed, zoomed and saved in either PNG or EPS format. At each hierarchical level, the clustering posterior probabilities of all samples/genes belonging to each cluster can be saved as a text table with multiple sample/gene annotations.

The table of the most informative genes/features selected for array clustering, ranked by their respective signal-to-noise ratio (SNR; supervised) or variance (unsupervised) criteria, can also be viewed and saved.
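The ranking described above can be sketched as follows. The exact SNR formula is not stated here, so this example assumes a common two-class definition, the absolute difference of the class means divided by the sum of the class standard deviations, and falls back to plain sample variance when no class labels are available. The function names and the toy values in the usage note are hypothetical.

```python
import statistics


def snr(values, labels):
    """Two-class signal-to-noise ratio (assumed definition, see lead-in).

    `labels` holds 0 or 1 for each sample; each class needs at least
    two samples so that a standard deviation can be computed.
    """
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    gap = abs(statistics.mean(a) - statistics.mean(b))
    spread = statistics.stdev(a) + statistics.stdev(b)
    if spread == 0.0:
        return float("inf") if gap > 0.0 else 0.0
    return gap / spread


def rank_genes(expression, labels=None):
    """Return (gene, score) pairs sorted from most to least informative.

    Supervised mode (labels given) scores by SNR; unsupervised mode
    (labels omitted) scores by sample variance.
    """
    if labels is None:
        scored = [(gene, statistics.variance(vals)) for gene, vals in expression.items()]
    else:
        scored = [(gene, snr(vals, labels)) for gene, vals in expression.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For a toy table such as `{"g1": [1.0, 1.2, 5.0, 5.2], "g2": [2.0, 3.0, 2.5, 3.5], "g3": [4.0, 4.0, 4.0, 4.1]}` with labels `[0, 0, 1, 1]`, the supervised ranking puts `g1` first because its class means are far apart relative to its within-class spread. The unsupervised ranking also puts `g1` first but orders the remaining genes differently.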

System Requirements

VISDA can run on any platform that supports Java JRE 5.0 (or above) and a C compiler.

The suggested RAM is 256 MB or above, and the suggested CPU speed is 1.0 GHz or above.

VISDA has been tested on Microsoft Windows XP, Linux, and Unix platforms.

Users can install VISDA directly on a computer and launch the program from batch files provided in the deployment package.

Manufacturer

Manufacturer Web Site VIsual Statistical Data Analyzer (VISDA) or VISDA

Price Contact manufacturer.

G6G Abstract Number 20769

G6G Manufacturer Number 104347