Functional Similarity Matrix (FunSimMat)

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract The Functional Similarity Matrix (FunSimMat) is a comprehensive database of precomputed semantic and functional similarity values for proteins in UniProtKB and for protein families in Pfam and SMART.

FunSimMat allows ranking candidate disease proteins for Online Mendelian Inheritance in Man (OMIM) diseases and searching for functional similarity values for proteins [from the Universal Protein Resource (UniProt)] and protein families [from Protein families (Pfam) and the Simple Modular Architecture Research Tool (SMART)].

FunSimMat provides several different semantic and functional similarity measures for each protein pair, computed from the Gene Ontology (GO) annotation in the UniProt KnowledgeBase (UniProtKB) and from the Gene Ontology Annotation (GOA) project at the European Bioinformatics Institute (EBI).

There are several search options available:

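1) Disease candidate prioritization -- For an OMIM disease entry, FunSimMat ranks candidate proteins by their functional similarity to the disease.

2) Functional similarity -- For a protein, protein family, or disease, FunSimMat compares the query with a list of proteins, protein families, or diseases and displays the functional similarity values.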

3) Semantic similarity -- For a list of GO terms, FunSimMat performs an all-against-all comparison and displays the semantic similarity values.

Query options --

FunSimMat offers several different types of queries that are available through its web front-end and the XML-RPC server.

There are some options that are common to most of the queries:

Limit to top n results allows for obtaining only the most similar proteins, protein families or diseases with respect to the biological process, molecular function, or cellular component.

Restrict to super-classes allows for limiting the comparison to so-called super-classes. This option is available for comparisons with a taxon or the complete database.

Semantic similarity search --

This query option allows for measuring the semantic similarity of the concepts represented by two (2) Gene Ontology terms.

A space- or tab-delimited list of the GO terms to be compared needs to be entered into the textbox in the query form, e.g. GO:0000001 GO:0004567.
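The input format described above can be checked before a query is submitted. The following is a minimal sketch, not part of FunSimMat itself; the function name parse_go_terms is hypothetical, and the only assumption about the data is that a GO identifier consists of "GO:" followed by seven digits.

```python
import re

# A GO identifier is "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


def parse_go_terms(text):
    """Split a space- or tab-delimited string into unique, validated GO IDs.

    Hypothetical helper for illustration; the order of first appearance is kept.
    """
    terms = []
    for token in text.split():  # split() handles spaces, tabs, and newlines
        if not GO_ID.match(token):
            raise ValueError(f"not a GO identifier: {token!r}")
        if token not in terms:  # drop repeated terms
            terms.append(token)
    return terms
```

Calling parse_go_terms("GO:0000001\tGO:0004567") yields a two-element list; a malformed token raises a ValueError instead of being passed on silently.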

The results table returns the all-against-all comparison of the GO terms based on four (4) different similarity measures, simRel, Lin, Resnik, and Jiang & Conrath.
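To make the four measures concrete, the sketch below computes them on a toy ontology. It is an illustration under stated assumptions, not FunSimMat's implementation: the ontology, the term names, and the annotation probabilities are invented, the Jiang & Conrath distance is turned into a similarity with the transform 1 / (1 + distance), which is only one of several conventions in use, and any normalization FunSimMat applies to the Resnik value is not reproduced.

```python
import math
from itertools import combinations_with_replacement

# Toy ontology (invented): each term maps to its ancestors, itself included.
ANCESTORS = {
    "root": {"root"},
    "A": {"A", "root"},
    "B": {"B", "A", "root"},
    "C": {"C", "A", "root"},
}
# Invented annotation probabilities p(term); the root has probability 1.
PROB = {"root": 1.0, "A": 0.5, "B": 0.125, "C": 0.25}


def ic(term):
    """Information content: -log p(term)."""
    return -math.log(PROB[term])


def mica(t1, t2):
    """Most informative common ancestor of two terms."""
    return max(ANCESTORS[t1] & ANCESTORS[t2], key=ic)


def resnik(t1, t2):
    return ic(mica(t1, t2))


def lin(t1, t2):
    denom = ic(t1) + ic(t2)
    return 2 * resnik(t1, t2) / denom if denom else 0.0


def jiang_conrath(t1, t2):
    # Distance converted to a similarity; the transform is an assumption.
    distance = ic(t1) + ic(t2) - 2 * resnik(t1, t2)
    return 1 / (1 + distance)


def sim_rel(t1, t2):
    # Lin-style ratio weighted by (1 - p(c)), maximized over common ancestors.
    denom = ic(t1) + ic(t2)
    if not denom:
        return 0.0
    common = ANCESTORS[t1] & ANCESTORS[t2]
    return max(2 * ic(c) / denom * (1 - PROB[c]) for c in common)


def all_against_all(terms):
    """One row per unordered term pair, self-comparisons included."""
    return [
        (t1, t2, sim_rel(t1, t2), lin(t1, t2), resnik(t1, t2), jiang_conrath(t1, t2))
        for t1, t2 in combinations_with_replacement(terms, 2)
    ]
```

For the toy terms "B" and "C", whose most informative common ancestor is "A", all_against_all(["B", "C"]) returns three rows: B with B, B with C, and C with C.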

Note: See the Results page (located on the manufacturer’s web-site) for more information on the results.

More information on the scores can be found on the Scores page (also located on the manufacturer’s web-site).

Comparing one protein / protein family with a list of proteins / protein families --

This query option allows for the comparison of one protein, protein family or disease to a given list of proteins, protein families, or diseases.

There are several possibilities for specifying this list.

First, a list of accessions may be entered into a text field.

Second, a file with accession numbers may be uploaded. The file should contain only the accession numbers separated with spaces or tabs.

Third, by entering an OMIM accession, all proteins annotated in UniProtKB with this OMIM entry are selected.

Fourth, an arbitrary taxon may be entered in the text field. The taxon must be given as its NCBI Taxonomy accession number.

Fifth, a pre-defined taxon can be selected from the drop-down list.

Sixth, the query protein / protein family / disease can be compared to the whole database.

The results table returns several scores.

The BPscore measures the similarity of the Biological Processes annotated to the two proteins, protein families or diseases.

Likewise, MFscore and CCscore measure the similarity of the Molecular Functions and the Cellular Components, respectively.

The funSim scores are computed from BPscore and MFscore.

The funSimAll scores combine all three (3) scores (BPscore, MFscore, and CCscore) and measure the overall functional similarity of the two proteins, protein families, or diseases.
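How the individual scores might be combined can be sketched as follows. This assumes the mean-of-squares combination described in the literature on these scores, with each score divided by its maximum possible value; the function names, the parameter names, and the default maximum of 1.0 are assumptions, and the exact normalization used by FunSimMat should be checked on the Scores page.

```python
def fun_sim(bp_score, mf_score, max_bp=1.0, max_mf=1.0):
    """Combine BPscore and MFscore as the mean of their squared, normalized values.

    max_bp and max_mf are the largest values the scores can take
    (assumed to be 1.0 here, which holds for measures bounded by 1).
    """
    return 0.5 * ((bp_score / max_bp) ** 2 + (mf_score / max_mf) ** 2)


def fun_sim_all(bp_score, mf_score, cc_score, max_bp=1.0, max_mf=1.0, max_cc=1.0):
    """Same idea extended to all three ontologies."""
    total = (
        (bp_score / max_bp) ** 2
        + (mf_score / max_mf) ** 2
        + (cc_score / max_cc) ** 2
    )
    return total / 3
```

Because the scores are squared before averaging, a pair with one high and one low score is rated above a pair with two mediocre scores of the same mean.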

Comparing a list of GO terms with a list of proteins / protein families --

This option allows for defining a functional profile and finding similar proteins, protein families or diseases.

The functional profile is defined through a list of GO terms, which need to be entered as a space- or tab-delimited list into the text field.

Then the ontology has to be selected: Biological Process, Molecular Function, or Cellular Component.

Note: It is not possible to define a mixed profile of these three (3) types.

There are several possibilities for specifying the list of proteins or protein families to compare to.

First, by entering an OMIM accession, all proteins annotated in UniProtKB with this OMIM entry are selected.

Second, an arbitrary taxon may be entered in the text field. The taxon must be given as its NCBI Taxonomy accession number.

Third, a pre-defined taxon can be selected from the drop-down list.

Fourth, the functional profile can be compared to the whole database.

Depending on the selected GO term type, BPscores, MFscores, or CCscores are computed.
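Comparing a functional profile with the annotation of one protein means reducing many term-to-term similarities to a single score. The sketch below uses a best-match scheme: every term is paired with its most similar counterpart, the best matches are averaged in each direction, and the larger average is kept. Whether FunSimMat aggregates in exactly this way is an assumption; the function go_score, the toy terms, and the similarity values are all invented for illustration.

```python
def go_score(profile_terms, protein_terms, term_sim):
    """Best-match score between two sets of GO terms from the same ontology.

    term_sim is any function returning the semantic similarity of two terms.
    """
    if not profile_terms or not protein_terms:
        return 0.0
    # Average, over profile terms, of the best match among the protein's terms.
    row_score = sum(
        max(term_sim(p, q) for q in protein_terms) for p in profile_terms
    ) / len(profile_terms)
    # Average, over the protein's terms, of the best match in the profile.
    col_score = sum(
        max(term_sim(p, q) for p in profile_terms) for q in protein_terms
    ) / len(protein_terms)
    return max(row_score, col_score)


# Invented term-to-term similarities for a toy example.
TOY_SIM = {("a", "x"): 0.9, ("a", "y"): 0.2, ("b", "x"): 0.1, ("b", "y"): 0.4}


def toy_term_sim(p, q):
    return TOY_SIM[(p, q)]
```

With the toy values, go_score(["a"], ["x", "y"], toy_term_sim) keeps the row average, since the single profile term has a strong match in "x" even though "y" is unrelated to it.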

Disease Candidate Prioritization --

This option allows for prioritizing candidate disease proteins with respect to an OMIM entry of interest.

The list of candidates can be defined in several ways.

First, a list of UniProt accessions may be entered into a text field.

Second, a file with accession numbers may be uploaded.

Third, the input disease can be compared to all human proteins in the database.

Depending on the selected GO term type, BPscores, MFscores, or CCscores are computed.
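Once each candidate has a score against the disease of interest, prioritization amounts to sorting, optionally cut off by the "Limit to top n results" option. A minimal sketch, assuming the scores are already available as a mapping from accession to score; the function name prioritize and the accessions in the usage note are placeholders, not real UniProt entries.

```python
def prioritize(candidate_scores, n=None):
    """Rank candidates by descending score and optionally keep the top n.

    candidate_scores maps an accession to its similarity score with the disease.
    Returns a list of (accession, score) tuples.
    """
    ranked = sorted(
        candidate_scores.items(), key=lambda item: item[1], reverse=True
    )
    return ranked if n is None else ranked[:n]
```

For example, prioritize({"ACC1": 0.2, "ACC2": 0.9, "ACC3": 0.5}, n=2) keeps ACC2 and ACC3, in that order.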

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site Functional Similarity Matrix (FunSimMat)

Price Contact manufacturer.

G6G Abstract Number 20731

G6G Manufacturer Number 104110