G2D (genes to diseases)

Category Genomics>Genetic Data Analysis/Tools

Abstract G2D (genes to diseases) is a web resource for prioritizing genes as candidates for inherited diseases. It uses three (3) algorithms based on different prioritization strategies.

The input to the server is the 'genomic region' where the user is looking for the disease-causing mutation, plus an additional piece of information depending on the algorithm used.

This information can be either the disease phenotype [described as an Online Mendelian Inheritance in Man (OMIM) identifier], one or several genes known or suspected to be associated with the disease (defined by their Entrez Gene identifiers), or a second genomic region that has also been linked to the disease.

In the last case, the tool uses known or predicted interactions between genes in the two regions, extracted from the STRING database (see G6G Abstract Number 20298).

The output in every case is an ordered list of candidate genes in the region of interest.

For the first two of the three methods, the candidate genes are first retrieved through a ‘sequence homology’ search and then scored according to the corresponding method.

This means that some of them will correspond to known, well-characterized genes, and others will overlap with predicted genes, thus providing a wider analysis.

Outline of the G2D Server --

Given a chromosomal region where the user is looking for candidate genes, there are three (3) ways of using the G2D server depending on the algorithm that will be applied.

1) The first option takes as input the phenotype of the disease of interest described by means of an OMIM disease entry.

The system will prioritize the genes according to the description of the phenotype as provided by the MeSH disease annotations of the bibliography linked from the corresponding OMIM entry.

2) The second option is to input one or several human or mouse genes that are already known or are suspected to be involved in the disease.

The system will prioritize the genes in the target region according to their similarity to the known gene(s) as given by their Gene Ontology (GO) annotations and high sequence homology.

3) The third option can be used when another chromosomal region has also been linked to the disease of interest.

The system will look for protein–protein interactions in the STRING database, both known and predicted, that may occur between a gene in the region of interest and a gene in the second region.
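The three input modes above can be sketched as a small data structure. This is only an illustration of the input logic described here, not G2D's actual interface: the class name G2DQuery, its field names, and the rule that exactly one additional input accompanies the region are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class G2DQuery:
    """Hypothetical container for one G2D request (not an official API)."""

    target_region: str                    # region searched for the mutation
    omim_id: Optional[str] = None         # option 1: phenotype as an OMIM identifier
    known_genes: Sequence[str] = ()       # option 2: Entrez Gene identifiers
    second_region: Optional[str] = None   # option 3: second linked region

    def method(self) -> str:
        """Return which of the three strategies the supplied inputs select."""
        chosen = [
            name
            for name, given in (
                ("phenotype", self.omim_id is not None),
                ("known_genes", len(self.known_genes) > 0),
                ("interactions", self.second_region is not None),
            )
            if given
        ]
        # Assumption: a query carries exactly one extra piece of information.
        if len(chosen) != 1:
            raise ValueError("supply exactly one additional input besides the region")
        return chosen[0]
```

With placeholder values, G2DQuery("regionA", second_region="regionB").method() selects the interaction-based strategy, while a query with no additional input raises an error.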

Use of Phenotype option --

The system will prioritize the genes according to the description of the phenotype and its precomputed associations to gene features as extracted from the literature and the Entrez Gene database.

The input consists simply of the region where the user is looking for the mutation, and the phenotype of interest, given as an OMIM identifier.

The output is a list of candidate genes, both known and predicted, ranked by how likely they are to produce the phenotype.

For each candidate, you can explore any overlap with Expressed Sequence Tags (ESTs) and pseudogenes, as well as trace back the reasoning the system followed to associate the candidate with the disease.

Use of Known Genes option --

This method can be used when one or more genes are already known or suspected to be related to the disease. To use it, you have to provide your 'target region' in the same way as for the phenotype method, plus one or several human genes associated with the disease, or related mouse genes.

As with the phenotype method (above), the output is an ordered list of candidate genes that you may explore in a similar manner.

Candidates that overlap with genes that have (STRING) interactions with any of the known genes input by the user are flagged.
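The flagging step can be pictured as a simple set lookup. The sketch below is a simplification under stated assumptions: each candidate is treated as a single gene identifier (the real candidates are regions that may overlap genes), interactions are given as unordered pairs, and the function name flag_candidates is hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def flag_candidates(
    ranked: Sequence[str],
    interactions: Iterable[Tuple[str, str]],
    known_genes: Iterable[str],
) -> List[Tuple[str, bool]]:
    """Mark each ranked candidate that interacts with any user-supplied gene."""
    known = set(known_genes)
    partners = set()
    # Interactions are unordered, so check both members of each pair.
    for gene_a, gene_b in interactions:
        if gene_a in known:
            partners.add(gene_b)
        if gene_b in known:
            partners.add(gene_a)
    # Keep the original ranking; only attach a flag to each candidate.
    return [(gene, gene in partners) for gene in ranked]
```

The ranking itself is left untouched; the flag is extra information shown alongside each candidate.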

Use of STRING protein-protein interactions option --

The third method for finding candidate genes for multiple-gene phenotypes relies on protein-protein functional interactions, either known or predicted.

The rationale is that mutations in two proteins that participate in the same pathway, or that interact directly, will produce the same or very similar phenotypes.

The input for this method consists of the target region, entered in the LOCATION BOX as in the previous two (2) methods, and a second region to which the phenotype of interest has also been mapped.

The second region is entered in the SECOND LOCUS BOX in the same manner, specifying format and chromosome.

The output is a list of genes in the target region that may interact, according to STRING, with any gene(s) in the locus entered in the SECOND LOCUS BOX.

Candidates are sorted by how likely their corresponding interactions are to be true, as indicated by their associated STRING scores, which have a maximum value of 0.99.
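The sorting step can be sketched as follows. The source does not say how a candidate with several interactions is scored, so keeping the highest STRING score per candidate is an assumption of this example, as are the function name rank_by_string_score and the gene labels.

```python
from typing import Dict, Iterable, List, Tuple


def rank_by_string_score(
    interactions: Iterable[Tuple[str, str, float]],
) -> List[Tuple[str, float]]:
    """Rank target-region genes by their best STRING interaction score.

    Each input item is (candidate_gene, second_region_gene, score).
    """
    best: Dict[str, float] = {}
    for candidate, _partner, score in interactions:
        # Assumption: a candidate is represented by its strongest interaction.
        best[candidate] = max(score, best.get(candidate, score))
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

Given the made-up pairs ("geneA", "geneX", 0.40), ("geneB", "geneY", 0.99) and ("geneA", "geneZ", 0.85), the function places geneB first with 0.99, followed by geneA with 0.85.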

G2D Additional info --

Running times for the algorithms range from almost immediate to a few seconds.

For all three (3) procedures, users can examine the rationale used by the system to make the prioritization, which keeps the process transparent.

The manufacturer supports this by including extensive hyperlinks to related resources such as NCBI sequence databases, Gene Ontology and the UCSC Genome Browser (see G6G Abstract Number 20197).

The G2D server contains precomputed candidate genes for more than 600 genetically inherited diseases that have been mapped onto chromosomal regions without assignment of a particular gene.

System Requirements

Web-based.

Manufacturer

Manufacturer Web Site G2D

Price Contact manufacturer.

G6G Abstract Number 20426

G6G Manufacturer Number 104055