Abstract --

PhenoPred is a web-based tool designed to detect novel gene-disease associations in humans.

It is based on known gene-disease associations, protein-protein interaction data, protein functional annotation at a molecular level and protein sequence data.

Machine learning principles are used to integrate heterogeneous data sources.

PhenoPred can be used to prioritize genes based on their likelihood to be associated with a given disease or to prioritize diseases for a given query gene.

PhenoPred uses HUGO Gene Nomenclature for gene names and the Disease Ontology (DO) for disease names. DO is based on the International Classification of Diseases (ICD-9), maintained by the World Health Organization (WHO).

PhenoPred method --

The PhenoPred method is supervised:

First, the developers mapped each gene/protein onto the spaces of disease and functional terms based on its distance to all annotated proteins in the protein interaction network.

The developers also encoded sequence, function, physicochemical, and predicted structural properties, such as secondary structure and flexibility.

The developers then trained support vector machines (SVMs) to detect gene-disease associations for a number of terms in the Disease Ontology. They provided evidence that, despite the noise and incompleteness of the experimental data and the unfinished ontology of diseases, identification of candidate genes can be successful even when a large number of candidate disease terms are predicted simultaneously.
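The first step above, describing a gene by its network distance to annotated proteins, can be illustrated with a minimal sketch. This is not the PhenoPred implementation: the function names, the toy network, the term labels, and the choice of "minimum hop distance per term" as the feature are all assumptions made for illustration.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def term_distance_features(graph, annotations, gene):
    """Map a gene to one feature per term: the minimum hop distance to any
    other protein annotated with that term (None if none is reachable)."""
    dist = shortest_path_lengths(graph, gene)
    features = {}
    for term, proteins in annotations.items():
        reachable = [dist[p] for p in proteins if p in dist and p != gene]
        features[term] = min(reachable) if reachable else None
    return features


# Hypothetical toy interaction network (undirected, adjacency sets).
graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),  # isolated protein
}
# Hypothetical term annotations: term -> proteins known to carry it.
annotations = {"term1": {"D"}, "term2": {"B", "D"}, "term3": {"E"}}

features = term_distance_features(graph, annotations, "A")
print(features)
```

A feature vector of this kind could then be combined with the sequence-derived properties and passed to a classifier; how PhenoPred actually summarizes the distances is not specified here.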

PhenoPred web site usage --

The PhenoPred web site provides two (2) basic services:

1) For a given Disease Ontology (DO) term, PhenoPred will rank genes that are most likely to be associated with this term; and

2) For a given query gene, PhenoPred will rank the DO terms that are most likely to be associated with that gene.
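The two services can be viewed as two ways of sorting the same table of gene-term scores. The sketch below assumes a hypothetical dictionary of prediction scores keyed by (gene, term) pairs; the gene names, term labels, score values, and function names are made up for illustration and do not reflect the web site's actual interface.

```python
def rank_genes_for_term(scores, term):
    """Service 1: genes sorted by descending score for a given term."""
    hits = [(gene, s) for (gene, t), s in scores.items() if t == term]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def rank_terms_for_gene(scores, gene):
    """Service 2: terms sorted by descending score for a given query gene."""
    hits = [(term, s) for (g, term), s in scores.items() if g == gene]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical prediction scores: (gene, DO term) -> score in [0, 1].
scores = {
    ("GENE1", "term_a"): 0.91,
    ("GENE2", "term_a"): 0.40,
    ("GENE3", "term_a"): 0.75,
    ("GENE1", "term_b"): 0.12,
    ("GENE2", "term_b"): 0.66,
}

print(rank_genes_for_term(scores, "term_a"))
print(rank_terms_for_gene(scores, "GENE2"))
```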

PhenoPred works with 422 DO terms, which were selected using the following rules:

PhenoPred Datasets --

Diseases and genes of known genetic involvement were extracted from the Online Mendelian Inheritance in Man (OMIM) database, Swiss-Prot, and the Human Protein Reference Database (HPRD).

The protein-protein interaction (PPI) map was assembled by combining the physical interaction data from HPRD, the Online Predicted Human Interaction Database (OPHID), and studies by Rual et al. (2005) Towards a proteome-scale map of the human protein-protein interaction network, Nature, 437, 1173-1178 and Stelzl et al. (2005) A human protein-protein interaction network: a resource for annotating the proteome, Cell, 122, 957-968.
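Combining interaction data from several sources amounts to taking the union of their undirected edges. The sketch below shows one simple way to do this, assuming each source is available as a list of protein pairs; the source contents and the decision to drop self-interactions are assumptions for illustration, not a description of how the PhenoPred map was actually built.

```python
def merge_interactions(*sources):
    """Union of undirected interactions from several sources.

    Each interaction is stored as a frozenset so that (A, B) and (B, A)
    count as the same edge. Self-interactions are dropped (an assumption).
    """
    merged = set()
    for source in sources:
        for a, b in source:
            if a != b:
                merged.add(frozenset((a, b)))
    return merged


def to_adjacency(edges):
    """Convert a set of undirected edges into an adjacency dictionary."""
    graph = {}
    for edge in edges:
        a, b = tuple(edge)
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


# Hypothetical interaction lists standing in for the individual sources.
source_1 = [("P1", "P2"), ("P2", "P3")]
source_2 = [("P2", "P1"), ("P3", "P4"), ("P4", "P4")]

edges = merge_interactions(source_1, source_2)
graph = to_adjacency(edges)
print(len(edges), sorted(graph["P2"]))
```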

Collected disease names and associated genes were manually integrated into the DO.

PhenoPred was developed as a collaborative effort between the Radivojac and Mooney labs at Indiana University to facilitate the identification of disease-associated genes and the understanding of the molecular basis of disease.

