*omeSOM (transcript/metabol-ome Self Organizing Map)

Category Metabolomics/Metabonomics > Metabolic Profiling/Analysis Systems/Tools and Genomics > Gene Expression Analysis/Profiling/Tools

Abstract *omeSOM (transcript/metabol-ome Self Organizing Map) is a tool designed to give support to the data mining task of metabolic and transcriptional datasets derived from different databases.

It provides a user-friendly interface and offers several visualization tools that are easy for non-expert users to understand. *omeSOM therefore supports data mining in basic research as well as in breeding programs.

*omeSOM is oriented towards discovering unknown relationships --

*omeSOM is oriented towards discovering unknown relationships between data, as well as providing simple visualizations for the identification of co-expressed genes and co-accumulated metabolites.

A case study involving ‘gene expression’ measurements and ‘metabolite profiles’ from tomato fruits was conducted to show the application of this tool.

The interest in comparing the cultivated tomato against the different introgression lines (ILs) lies in the fact that some wild tomato relatives can be sources of agronomic traits that could be used to improve commercial tomato lines.

*omeSOM implements a neural model --

*omeSOM implements a neural model for biological data clustering and visualization. It allows the discovery of relationships between changes in transcripts and metabolites of crop plants harboring introgressed exotic alleles, and its use can be extended to other types of ‘omics’ data.

The software is focused on the easy identification of groups including different molecular entities, independently of the number of clusters formed.

The *omeSOM software provides easy-to-visualize interfaces for the identification of coordinated variations in the co-expressed genes and co-accumulated metabolites. Additionally, this information is linked to the most widely used gene annotation and metabolic pathway databases.

The *omeSOM software has been implemented in the MATLAB® programming language. The manufacturers used a standard toolbox for SOM training, provided by the original developers of this neural network model (SOM Toolbox).

SOM Toolbox --

SOM Toolbox is a function package for MATLAB implementing the Self-Organizing Map (SOM) algorithm and more.

With the SOM Toolbox, you can:

1) Train SOM with different network topologies and learning parameters;

2) Compute different error and quality measures for the SOM;

3) Visualize SOM using u-matrices, component planes, cluster color coding and color linking between the SOM and other visualization methods; and

4) Do correlation and cluster analysis with SOM.

The SOM Toolbox also features other data analysis methods related to VQ, clustering, dimension reduction, and proximity preserving projections, such as:

1) Data preprocessing tools;

2) K-means, K-nearest neighbor classifier and LVQ (learning vector quantizer);

3) Agglomerative hierarchical clustering and dendrograms;

4) Principal Component Analysis (PCA);

5) Sammon’s projection; and

6) Curvilinear Component Analysis (CCA).

The *omeSOM software provides the following main features/options --

1) Create *omeSOM model - creating an *omeSOM model requires an input file with the .data extension. The map size should be typed by the user in the command line.

2) Search - any input data point can be located on *omeSOM. This function returns the neuron number where a given metabolite name/transcript code has been grouped.

3) Neurons map - several views of a trained map are possible, showing transcript (red), metabolite (blue) and both molecular entities (black) grouped into neurons.

Detailed plots of normalized and un-normalized data are displayed. Additionally, in the case of transcripts, their corresponding Arabidopsis and Solanaceae Unigene annotations can be retrieved.

Also, a list of metabolic pathways associated with each metabolite is displayed.

4) 3-colors map - A specific view of the map is displayed, painting the neurons according to a color scale that easily indicates those grouping transcripts and metabolites.

5) Neurons error measure - A typical measure of clustering quality (cohesion) is calculated for each neuron and displayed graphically over the feature map with different marker sizes.

6) Neurons having pseudo-zeros - In some situations a metabolite may show undetectable levels in a specific genotype, yet have valid measurements in many others.

The features described above constitute the fundamental functions of the software, which are constantly extended according to user feedback.
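The Search and Neurons error measure functions listed above can be illustrated with a small Python sketch. It is not the *omeSOM MATLAB code: the data layout (dictionaries keyed by name and by neuron number), the function names, and the choice of mean Euclidean distance to the neuron's weight vector as the cohesion measure are all assumptions made for this example, since the exact measure is not specified here.

```python
import math

def search(name, assignments):
    """Return the neuron number where `name` was grouped, or None if absent.

    `assignments` is a hypothetical dict: transcript code / metabolite name -> neuron number.
    """
    return assignments.get(name)

def neuron_cohesion(profiles, assignments, prototypes):
    """Mean Euclidean distance between each neuron's members and its weight vector.

    Smaller values mean a tighter (more cohesive) neuron. This is one common
    choice of cohesion measure, assumed here for illustration.
    """
    totals, counts = {}, {}
    for name, neuron in assignments.items():
        d = math.dist(profiles[name], prototypes[neuron])
        totals[neuron] = totals.get(neuron, 0.0) + d
        counts[neuron] = counts.get(neuron, 0) + 1
    return {neuron: totals[neuron] / counts[neuron] for neuron in totals}

# Made-up toy data: two transcripts and one metabolite on a two-neuron map.
profiles = {"geneA": [1.0, 0.0], "metX": [1.0, 2.0], "geneB": [0.0, 0.0]}
prototypes = {0: [1.0, 1.0], 1: [0.0, 0.0]}
assignments = {"geneA": 0, "metX": 0, "geneB": 1}

print(search("metX", assignments))
print(neuron_cohesion(profiles, assignments, prototypes))
```

In a graphical view, each cohesion value could then set the marker size drawn over the corresponding neuron.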

*omeSOM Clustering --

Neural network-based clustering is closely related to the concept of competitive learning, which is based on the idea of units (“neurons”) that compete to respond to a given subset of inputs. The nodes in the input layer admit input patterns and are fully connected to the output nodes in the competitive layer.

Each output node corresponds to a cluster and is associated with a prototype or weight vector. Given an input pattern, its distance to the weight vectors is computed and only the neuron closest to the input becomes activated. The weight vector of this winning neuron is further moved towards the input pattern.
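The update just described can be sketched in a few lines of Python. This is a minimal illustration rather than the *omeSOM implementation: the Euclidean distance, the fixed learning rate, and the function names are assumptions made for the example.

```python
import math

def nearest_prototype(x, prototypes):
    """Index of the weight vector closest to input x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))

def competitive_step(x, prototypes, learning_rate=0.5):
    """Move only the winning weight vector towards x; return the winner's index."""
    winner = nearest_prototype(x, prototypes)
    w = prototypes[winner]
    prototypes[winner] = [wi + learning_rate * (xi - wi) for wi, xi in zip(w, x)]
    return winner

# Two output nodes (clusters) and one made-up input pattern.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
winner = competitive_step([0.8, 1.0], prototypes)
print(winner, prototypes)
```

Only the closest weight vector changes; all other output nodes are left untouched.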

This competitive learning paradigm is also known as “winner-takes-all” learning. Self-organizing maps (SOMs) represent a special class of neural networks that use competitive learning.

Their aim is to represent complex high-dimensional input patterns in the form of a simple low-dimensional discrete map, with neurons that can be visualized in a two-dimensional lattice structure, while preserving the proximity relationships of the original data as much as possible.

Therefore, SOMs can be appropriate for cluster analysis when looking for underlying or so-called hidden patterns in data.

A neighborhood function is defined for each neuron and when competition among the neurons is complete, SOMs update a set of weight vectors within the neighborhood of the winning neuron.
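A single SOM training step with a neighborhood function might look like the following Python sketch. The Gaussian neighborhood, the parameter names `learning_rate` and `sigma`, and the list-based data layout are assumptions for illustration; the text does not state which neighborhood function *omeSOM uses.

```python
import math

def som_step(x, weights, grid, learning_rate=0.5, sigma=1.0):
    """One SOM update: find the winning neuron, then pull every neuron towards x,
    scaled by a Gaussian of its lattice distance to the winner.

    weights[i] is the weight vector of neuron i; grid[i] is its (row, col) position.
    """
    winner = min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
    for i, w in enumerate(weights):
        lattice_d2 = sum((a - b) ** 2 for a, b in zip(grid[i], grid[winner]))
        h = math.exp(-lattice_d2 / (2 * sigma ** 2))  # 1 at the winner, decays with distance
        weights[i] = [wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return winner

# A 1 x 3 lattice with one-dimensional weights and a made-up input.
grid = [(0, 0), (0, 1), (0, 2)]
weights = [[0.0], [0.5], [1.0]]
winner = som_step([1.0], weights, grid)
print(winner, weights)
```

Neurons near the winner on the lattice move more than distant ones, which is what lets neighboring neurons end up representing similar patterns.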

The *omeSOM software builds a SOM model oriented towards discovering unknown relationships among transcriptional and metabolite data, showing previously unknown clusters of coordinated up-regulated and down-regulated patterns in a genotype. Several model topologies, map sizes and initialization strategies are possible.

*omeSOM Visualizations --

An appropriate visualization of the resulting feature map, painting the neurons according to the type of data grouped, is provided to help in the rapid identification of neurons that combine data types.

For the special case of the *omeSOM, many interesting representations of clusters can be obtained from the projection of the patterns in the lattice of neurons. If the dataset includes the original data and all the data with an inverted sign, the resulting map shows a symmetrical “triangular” configuration.

This means that the top-right and bottom-left zones of the map group exactly the same data, but with opposite sign.

It can be seen directly from the data visualization which genes and metabolites are up-regulated or down-regulated together, and which show the inverse relationship (down-regulated genes grouped together with up-regulated metabolites).
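The sign-inverted dataset mentioned above can be built as in the Python sketch below. The dictionary layout and the "-" prefix used to label inverted profiles are hypothetical conventions chosen for this example, not the *omeSOM input format.

```python
def add_inverted_copies(profiles):
    """Return a dataset holding every profile plus a copy with all signs flipped.

    `profiles` maps a (hypothetical) transcript or metabolite name to its values.
    """
    augmented = {}
    for name, values in profiles.items():
        augmented[name] = list(values)
        augmented["-" + name] = [-v for v in values]
    return augmented

# Made-up values: geneA and metX change in opposite directions.
profiles = {"geneA": [1.5, -0.5], "metX": [-1.4, 0.6]}
augmented = add_inverted_copies(profiles)
print(augmented)
```

In this toy data the inverted metabolite profile `-metX` is close to `geneA`, so a map trained on the augmented set would be expected to group them together, exposing the inverse relationship.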

In a standard SOM, clusters are recognized as a group of nodes rather than considering each node as a cluster. The identification of clusters is mainly achieved through visualization methods such as the U-matrix.

This method computes the average distance between the codebook vectors of adjacent nodes, yielding a landscape surface where light colors stand for a short distance (a valley) and dark colors for longer distances (a hill). Then, the number of underlying clusters must be determined by visual inspection.
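The U-matrix computation can be sketched in Python as follows. This is a simplified version under stated assumptions: a rectangular lattice with four-connected neighbors and one value per node. Implementations such as the SOM Toolbox may use hexagonal lattices and a finer output grid.

```python
import math

def u_matrix(codebook):
    """Mean Euclidean distance from each node's codebook vector to those of its
    lattice neighbors (up, down, left, right).

    codebook[r][c] is the codebook vector of the node at row r, column c.
    """
    rows, cols = len(codebook), len(codebook[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(codebook[r][c], codebook[nr][nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result

# A 1 x 3 map: the first two nodes are similar, the third is far away.
u = u_matrix([[[0.0], [0.1], [1.0]]])
print(u)
```

Low values mark valleys (nodes similar to their neighbors) and high values mark hills that separate clusters.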

The visualizations provided by the *omeSOM model instead offer a simple interface that helps in the rapid identification of co-expressed genes and co-accumulated metabolites via a simple color code.

The focus is on the easy identification of groups of different patterns, independently of the number of neurons in a cluster.
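A color code of this kind can be expressed as a small Python function. The sketch reuses the colors given for the neurons map view (red for transcripts, blue for metabolites, black for both); the function name, the type labels, and the handling of empty neurons are assumptions for illustration.

```python
def neuron_color(member_types):
    """Pick a display color for a neuron from the types of data it groups.

    `member_types` is an iterable of "transcript" / "metabolite" labels.
    Returns None for an empty neuron.
    """
    kinds = set(member_types)
    if not kinds:
        return None
    if not kinds <= {"transcript", "metabolite"}:
        raise ValueError(f"unknown data type(s): {sorted(kinds)}")
    if kinds == {"transcript"}:
        return "red"
    if kinds == {"metabolite"}:
        return "blue"
    return "black"  # both molecular entities in the same neuron

print(neuron_color(["transcript", "transcript"]))
print(neuron_color(["transcript", "metabolite"]))
```

The color depends only on which data types are present, not on how many items or neurons a group contains.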

KEGG pathways associated with grouped compounds --

If metabolites and transcripts are named consistently with the Kyoto Encyclopedia of Genes and Genomes (KEGG) conventions, data grouped by neurons can be checked against metabolic pathways available online, for finding candidate genes belonging to metabolic pathways.

For each metabolite, a list of the KEGG pathways in which it participates can be easily visualized in the same interface. The software performs cross-reference searches inside KEGG to obtain the corresponding pathway descriptions, using the metabolite KEGG codes.
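One way to perform such a cross-reference lookup is through the public KEGG REST service, as in the Python sketch below. How *omeSOM itself queries KEGG is not stated, so this is an assumption: the sketch only builds the query address and parses a tab-separated reply, and the sample reply is hand-written to show the expected shape, not a real query result.

```python
def kegg_link_url(compound_code):
    """Address of the KEGG REST 'link' operation listing pathways for a compound code."""
    return f"https://rest.kegg.jp/link/pathway/cpd:{compound_code}"

def parse_link_reply(text):
    """Extract pathway identifiers from a reply of 'compound<TAB>pathway' lines."""
    pathways = []
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) == 2:
            # Drop the database prefix, e.g. "path:map00010" -> "map00010".
            pathways.append(parts[1].split(":", 1)[-1])
    return pathways

# Hand-written sample reply (illustrative only).
sample_reply = "cpd:C00031\tpath:map00010\ncpd:C00031\tpath:map00030\n"
print(kegg_link_url("C00031"))
print(parse_link_reply(sample_reply))
```

The reply text would normally be fetched from the address returned by `kegg_link_url`; the network call is left out so the example runs offline.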

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site *omeSOM (transcript/metabol-ome Self Organizing Map)

Price Contact manufacturer.

G6G Abstract Number 20772

G6G Manufacturer Number 104349