PIPs database

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract PIPs database (human protein-protein interaction prediction database) is a resource for studying protein–protein interactions in humans.

It contains predictions of >37,000 high-probability interactions, of which >34,000 are not reported in the interaction databases HPRD (Human Protein Reference Database), BIND (Biomolecular Interaction Network Database), DIP (The Database of Interacting Proteins) or OPHID (Online Predicted Human Interaction Database).

The predictions stored in PIPs are derived by a Bayesian prediction method that combines information on the likelihood of interaction from a variety of sources.

A novel feature of the method is its Transitive module, which gathers evidence for interaction by examining the predicted interactors that a pair of proteins have in common.

The unique combination of features examined allowed the generation of a set of predictions that are mostly orthogonal to other PPI databases. The database and its interface allow the user to see the full evidence trail for each predicted interaction.

In this way, PIPs serves not only as a resource for large-scale modeling of protein interaction networks, but also as an exploratory tool for the cell/molecular biologist who wishes to understand more about the predicted interaction network for the protein they are studying.

The Protein-Protein Interactions (PPIs) in PIPs are predicted by a naïve Bayesian model as described in Scott and Barton (BMC Bioinformatics (2007) 8:239).

Briefly, this method combines information from gene co-expression, orthology, co-occurrence of domains, post-translational modifications, co-localization of the proteins within the cell and analysis of the local topology of the predicted PPI network.

The different evidence types are programmed as separate modules with each module giving a 'score of interaction'.

The individual module scores are combined to give a prediction for the overall likelihood of interaction given the available data.
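The combination step can be sketched as follows. This is a minimal illustration, assuming each module score is a likelihood ratio and that a prior odds of interaction is available; the function names, module labels and numeric values are hypothetical and are not taken from PIPs.

```python
from math import prod


def combine_module_scores(likelihood_ratios, prior_odds):
    """Return the posterior odds of interaction for one protein pair.

    Assumption: each module reports a likelihood ratio. Naive Bayes treats
    the modules as conditionally independent, so the ratios multiply.
    """
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds to a probability in the range [0, 1)."""
    return odds / (1.0 + odds)


# Hypothetical scores for a single protein pair (illustration only).
module_scores = {
    "co-expression": 4.0,
    "orthology": 10.0,
    "domain co-occurrence": 2.5,
    "co-localization": 1.5,
    "PTM co-occurrence": 1.0,   # a ratio of 1.0 is uninformative
    "transitive": 3.0,
}

posterior_odds = combine_module_scores(module_scores.values(), prior_odds=0.001)
print(posterior_odds, odds_to_probability(posterior_odds))
```

A module with no evidence for a pair can report a ratio of 1.0, which leaves the product unchanged.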

Feature Modules:

1) Gene Co-expression -- The data used is derived from the GDS596 dataset [a GEO dataset (see G6G Abstract Number 20013)].

This contains the profiles for 79 physiologically normal tissues collated from several sources.

The likelihood of interaction is calculated based on the hypothesis that proteins that are co-expressed are more likely to interact than two proteins selected at random.

Co-expression between two genes is measured as the Pearson correlation coefficient of their expression profiles.
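As an illustration of the correlation measure, the sketch below computes the Pearson coefficient between two expression profiles. The profile values are made up for the example; in the setting described above, each profile would hold one value per tissue.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression levels of two genes across five tissues.
gene_a = [2.1, 3.4, 1.0, 5.6, 4.2]
gene_b = [1.9, 3.0, 1.2, 5.1, 4.4]
print(pearson(gene_a, gene_b))
```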

2) Orthology -- Interactions inferred from orthologous data rely on the assumption that if the orthologs of two (2) proteins have been observed to interact in another organism, then the two proteins themselves are likely to interact.

Orthology maps between human, yeast, fly and worm were downloaded from the InParanoid database (a database of Eukaryotic Ortholog Groups).

3) Domain Co-Occurrence -- This module is based on the hypothesis that if two (2) domains are found to interact between one pair of proteins, a second pair of proteins that have the same domains are also likely to interact.

Domain and motif information has been collated from the InterPro database (a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences).

The chi-square test was used to measure the likelihood of co-occurrence of specific domains.
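A chi-square statistic for one domain pair can be sketched as below. The 2x2 layout (interacting versus non-interacting protein pairs, with or without the domain pair) is an assumption made for illustration; the source does not state how the contingency table in PIPs is built, and the counts are invented.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    Assumed layout (hypothetical, for illustration):
      a: interacting pairs that carry the domain pair
      b: interacting pairs that do not
      c: non-interacting pairs that carry the domain pair
      d: non-interacting pairs that do not
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if domain pair and interaction were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Invented counts: the domain pair is enriched among interacting pairs.
print(chi_square_2x2(30, 70, 10, 190))
```

A larger statistic indicates a stronger departure from independence between carrying the domain pair and interacting.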

4) Co-localization -- Interacting proteins are more likely to be annotated as being present within the same cellular compartment.

A full set of localization data was generated using the Predicting Subcellular Localization Tool (PSLT), a Bayesian network localization predictor that is based on the combinatorial presence of InterPro motifs and specific membrane domains in human proteins.

The PSLT was used to classify protein pairs into four (4) groups:

5) Co-occurrence of Post-Translational Modification (PTM) -- The likelihood of interaction is based on the co-occurrence of post-translational modifications (PTMs) in the two proteins.

The annotations of PTMs for humans were obtained from the UniProt [The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data] and HPRD databases.

Network Analysis (Transitive) -- The transitive score uses the predictions that have been made by all the other feature modules (described above) to make a prediction based on the local topology of the network of interactions between two proteins.

The predictions are based on the premise that two proteins are more likely to interact if they have a similar set of interactors than if the predicted interactors for the two proteins are completely different.
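The premise can be illustrated with a simple overlap measure. The sketch below uses the Jaccard index of the two proteins' predicted interactor sets as a stand-in; it is not the scoring used by the Transitive module, and the network and protein names are hypothetical.

```python
def shared_interactor_similarity(network, p, q):
    """Jaccard index of the predicted interactor sets of proteins p and q.

    `network` maps each protein to the set of its predicted interactors.
    The two query proteins are excluded from both sets so that a predicted
    p-q link does not count as evidence for itself.
    """
    interactors_p = network.get(p, set()) - {p, q}
    interactors_q = network.get(q, set()) - {p, q}
    union = interactors_p | interactors_q
    if not union:
        return 0.0
    return len(interactors_p & interactors_q) / len(union)


# Hypothetical predicted network (illustration only).
predicted = {
    "P1": {"P3", "P4", "P5"},
    "P2": {"P3", "P4", "P6"},
    "P7": {"P8"},
}
print(shared_interactor_similarity(predicted, "P1", "P2"))  # two shared of four
print(shared_interactor_similarity(predicted, "P1", "P7"))  # nothing shared
```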

The PIPs web interface -- The front page of the PIPs interface allows for simple searches with the IPI, UniProt or RefSeq identifier for a protein, or a text search with keywords.

The output may be restricted by adjusting the minimum score threshold.

The Advanced Search allows the query protein sequence to be compared with the protein sequences stored in the PIPs database by MagicMatch (which cross-references sequence identifiers across databases) and returns exact matches to the query sequence.

If no match is found, a BLAST (Basic Local Alignment Search Tool) search may optionally be run to find sequences that are similar to the query.

A batch mode is available to allow larger numbers of protein IPI identifiers to be run against the PIPs database as a single set.

System Requirements

Web based.

Manufacturer

Manufacturer Web Site PIPs database

Price Contact manufacturer.

G6G Abstract Number 20337

G6G Manufacturer Number 102856