G6G Directory of Omics and Intelligent Software

Visualization of Hierarchical Clustering (VisHiC)

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract VisHiC is a public web server for clustering and interpreting gene expression data. The tool is designed to extract the most significant biological features of a microarray dataset in a single run.

The main output is a compact global view of the expression matrix in which only the most significant clusters are shown and less pronounced patterns are hidden; its interactive format remains open-ended for more detailed analyses.

VisHiC provides stability to otherwise ambiguous clustering and performs the labor-intensive task of evaluating hundreds of redundant clusters in a rapid automated manner.

The approximate hierarchical clustering and rapid functional analysis guarantee meaningful results even if the datasets are large.

Functional assessment of microarray datasets is an immediate application of VisHiC analysis, as annotations of highlighted clusters should relate to proposed hypotheses.

The manufacturer's approach is likely to be useful for large expression data warehouses, where a first broad overview could be offered to users who routinely browse hundreds of datasets.

One may use VisHiC to compare different datasets in the context of experimental conditions, global expression patterns and functional aspects.

The unique features of VisHiC are the 'global enrichment analysis' of every possible cluster for shared biological function and a compact 'global visualization' that highlights major 'gene clusters' that are co-expressed and significantly enriched in biological terms.

VisHiC utilizes Gene Ontology (GO), well-curated pathway databases [such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome (see G6G Abstract Number 20267)], and shared DNA motifs [TRANSFAC (see G6G Abstract Number 20121) and miRBase, a searchable database of published miRNA sequences and annotation] to provide information on the shared regulatory mechanisms of the given genes.

All this information is shown in an easily comprehensible bird's-eye view, in a color-coded form that outlines interesting groups of genes.

Hierarchical clustering analysis can be considered a traditional way to analyze gene expression data. There are many applications that utilize it as a springboard for further analysis. VisHiC is no exception.

Datasets go through the following stages of analysis:

1) Approximate hierarchical clustering;

2) Cluster annotation; and

3) Versatile result visualization.

Another key feature of VisHiC is the search and visualization of clusters that are significantly enriched with biological terms.

VisHiC supports gene expression datasets from all major organisms that come in a standardized tab-separated form.
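As an illustration of what a tab-separated expression dataset might look like when read programmatically, here is a minimal Python sketch. The layout is an assumption made for this example only (first row holds sample names, first column holds gene identifiers); the exact format VisHiC expects is not described here, and the function name `read_expression_matrix` is hypothetical.

```python
import csv
import io


def read_expression_matrix(handle):
    """Read a tab-separated expression matrix.

    Assumed layout (not confirmed for VisHiC): a header row of sample
    names, then one row per gene with the gene identifier first.
    Missing values are not handled in this sketch.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = [float(value) for value in row[1:]]
    return samples, matrix


# Small made-up example; identifiers and values are illustrative only.
example = "gene\tcond1\tcond2\nENSG_A\t1.5\t-0.2\nENSG_B\t0.0\t2.1\n"
samples, matrix = read_expression_matrix(io.StringIO(example))
```
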

Gene expression datasets can be uploaded through a simple web upload program, the Private Experimental Data Manager (g:PEDaM), a light-weight file manager for gene expression data uploads.

The manufacturer also provides a variety of public datasets. VisHiC supports more than twenty-five (25) different types of gene identifiers, allowing users to input data with their favorite gene names or database IDs (such as Affymetrix, Agilent, ARRAY_ZFISH, BDGP_INSITU_EXPR, CCDS, CODELINK, DBASS3, DBASS5, DEDB, EMBL, ENSG, ENSP, ENST, ENTREZ GENE, FLYBASE, FLYGRID, HGNC, HPA, Illumina (V1, V2), IMGT, INTERPRO, REFSEQ, RFAM, UCSC, UNIPROT, etc.).

VisHiC Input data/processing --

1) To analyze a dataset using VisHiC the user must provide a gene expression dataset and indicate the correct organism during the upload process.

2) The work of VisHiC starts with dataset pre-processing - During this process the dataset is clustered hierarchically using HappieClust (a fast approximate hierarchical clustering method based on similarity heuristics), with the Pearson similarity measure used to measure the similarity between elements.

Then each cluster from the resulting hierarchy is annotated using g:Profiler - (a web-based toolset for the functional profiling of gene lists from large-scale experiments).

3) The second stage of analysis is performed when the user chooses to analyze and visualize some particular dataset.
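The two building blocks of the pre-processing in step 2 can be sketched in Python as follows. This is a simplified illustration, not the HappieClust or g:Profiler implementation: `pearson_distance` is the usual one-minus-correlation distance, and `enrichment_p` is a plain hypergeometric tail probability without any multiple-testing correction. Both function names and all inputs are hypothetical.

```python
from math import comb, sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two expression profiles.

    0 means perfectly correlated, 2 means perfectly anti-correlated.
    Profiles with zero variance raise ZeroDivisionError in this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def enrichment_p(cluster_size, hits, term_size, background_size):
    """Hypergeometric P(X >= hits).

    Probability that a random cluster of `cluster_size` genes, drawn from
    `background_size` genes of which `term_size` carry the term, contains
    at least `hits` annotated genes.
    """
    upper = min(cluster_size, term_size)
    total = comb(background_size, cluster_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A smaller `enrichment_p` value indicates that the term is over-represented in the cluster relative to the background.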

The user has four (4) major options to cut the hierarchical tree. Two (2) approaches (Best annotation and Annotation score) are novel and were introduced in the VisHiC application; the other two (Distance and None) are more traditional.

1) Best annotation - The list of interesting clusters is based on the best annotations of the clusters. In other words, each cluster is characterized by one (best) annotation.

The Best annotation cutting strategy is performed via two (2) stages:

a) Search for dense clusters: non-overlapping clusters that carry a significant annotation are identified.
b) Subtrees that do not contain any dense or interesting clusters are collapsed and marked as having no significant annotation that satisfies the input requirements.

Input options -

Minimum size - the minimum size of the cluster that can be marked as dense.

Maximum size - the maximum size of the cluster that can be marked as dense.

Additional threshold - an additional p-value threshold applied to cluster annotations; annotations whose p-value does not pass this threshold are not used during the tree-cutting process.

Gradient - The user can choose between normal and exponential, where the latter stresses extreme values in gene expression.

2) Annotation score - The list of interesting clusters is based on the accumulative scores of the clusters.

A characteristic that represents the average goodness of annotations is computed for each cluster. The interesting clusters are created according to this characteristic.

The Annotation score cutting strategy is also performed via two (2) stages:

a) Search for dense clusters: non-overlapping clusters with significant annotation scores are identified.
b) Subtrees that do not contain any dense or interesting clusters are collapsed and marked as having no significant annotation that satisfies the input requirements.

Input options - Same as the 'Best annotation' Input options (see above).

3) Distance - According to this strategy the tree is cut at a given distance, and the resulting clusters are displayed to the user according to size and annotation enrichment. The cutting distance is provided by the user; a smaller distance corresponds to clusters with higher similarity.

4) None - The tree is displayed to the user in full, without any cutting.
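The two-stage cut shared by the 'Best annotation' and 'Annotation score' strategies can be sketched in Python as below. This is a hedged illustration under several assumptions, not VisHiC's actual algorithm: the tree is walked greedily from the root, an annotation "passes" when its p-value is at or below the threshold, the best-annotation score is the negative log10 of the smallest passing p-value, and the annotation score is the mean negative log10 p-value. The `Node` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from math import log10
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical cluster-tree node: its genes, term -> p-value, children."""
    genes: List[str]
    annotations: Dict[str, float] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def best_annotation(node, threshold):
    """-log10 of the single best passing p-value, or None if none passes."""
    passing = [p for p in node.annotations.values() if p <= threshold]
    return -log10(min(passing)) if passing else None


def annotation_score(node, threshold):
    """Mean -log10(p) over all passing annotations, or None if none passes."""
    passing = [p for p in node.annotations.values() if p <= threshold]
    return sum(-log10(p) for p in passing) / len(passing) if passing else None


def cut_tree(root, score_fn, min_size, max_size, threshold):
    """Return (dense, collapsed): two lists of non-overlapping clusters."""
    dense, collapsed = [], []

    def is_dense(n):
        in_range = min_size <= len(n.genes) <= max_size
        return in_range and score_fn(n, threshold) is not None

    def contains_dense(n):
        return is_dense(n) or any(contains_dense(c) for c in n.children)

    def visit(n):
        if is_dense(n):
            dense.append(n)        # stage a: keep it, do not descend further
        elif not contains_dense(n):
            collapsed.append(n)    # stage b: nothing interesting below
        else:
            for child in n.children:
                visit(child)

    visit(root)
    return dense, collapsed


# Made-up five-gene tree; term names and p-values are illustrative only.
ab = Node(["a", "b"], {"term1": 1e-6}, [Node(["a"]), Node(["b"])])
cd = Node(["c", "d"], {}, [Node(["c"]), Node(["d"])])
cde = Node(["c", "d", "e"], {"term2": 0.2}, [cd, Node(["e"])])
root = Node(["a", "b", "c", "d", "e"], {"term3": 1e-3}, [ab, cde])

dense, collapsed = cut_tree(root, best_annotation, 2, 3, 0.05)
```

Passing `annotation_score` instead of `best_annotation` switches the same traversal to the second strategy. Stopping at the first dense node on each path is what keeps the reported clusters non-overlapping in this sketch.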

VisHiC Output data/visualization --

VisHiC output consists of several pages:

1) Dataset of hierarchical clustering visualization results -

a) Dataset Heat map and Dendrogram: interactive, with links to cluster reports.
b) List of interesting clusters.
c) List of unique annotations that are statistically significant. A unique annotation is an annotation that is found in only one ‘dense cluster’ in the dataset.

2) Top clusters report.

3) Cluster information.
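The definition of a 'unique annotation' given for the results page (an annotation found in only one dense cluster) can be illustrated with a short Python sketch. The input structure, a mapping from a cluster identifier to its set of significant terms, and the function name are assumptions made for this example.

```python
from collections import Counter


def unique_annotations(dense_clusters):
    """Map each cluster id to the terms that occur in no other dense cluster.

    `dense_clusters` is a hypothetical dict: cluster id -> set of terms.
    """
    counts = Counter(
        term for terms in dense_clusters.values() for term in set(terms)
    )
    return {
        cluster_id: sorted(t for t in set(terms) if counts[t] == 1)
        for cluster_id, terms in dense_clusters.items()
    }


# Made-up clusters and terms, for illustration only.
clusters = {
    "c1": {"cell cycle", "DNA repair"},
    "c2": {"cell cycle", "translation"},
}
unique = unique_annotations(clusters)
```
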

System Requirements

Web-based.

Manufacturer

Bioinformatics, Algorithmics, and Data Mining group (BIIT)
Institute of Computer Science
University of Tartu
Liivi 2-314, Tartu 50409, Estonia

Manufacturer Web Site VisHiC

Price Contact manufacturer.

G6G Abstract Number 20490

G6G Manufacturer Number 104027
