Human Cancer Pathway Protein Interaction Network (HCPIN)

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract The HCPIN is a Web-accessible database. It is a collection of human proteins that participate in cancer-associated signaling pathways, and their protein-protein interactions.

The HCPIN Website provides an extensive collection of experimental and homology models of proteins or domains associated with human cancers.

It is designed for use by cancer biologists interested in assessing 3D protein structural information in the context of the protein interaction network.

The HCPIN version 1.0 was constructed by combining proteins from seven (7) KEGG classical cancer-associated signaling pathways together with protein-protein interaction data from the Human Protein Reference Database (HPRD).

HPRD is a resource of protein-protein interaction information manually collected from the literature and curated by expert biologists to reduce errors.

The manufacturers used KEGG because of its high quality. However, pathway interaction information from KEGG was excluded from HCPIN because it lacks precise definitions.

The seven (7) pathways in the initial version of HCPIN include:

1) Cell cycle progression;

2) Apoptosis;

3) MAPK;

4) Innate immune response (Toll-like receptor);

5) TGF-β;

6) PI3K; and

7) JAK-STAT pathways.

Note: Many well-known, important cancer-associated proteins, such as p53 and NF-κB, are associated with at least one of these pathways.

The current version of HCPIN includes 2,977 proteins and 9,784 protein-protein interactions, including 240 multi-protein complexes, each comprising at least three (3) proteins.

HCPIN also includes 2,328 proteins with Swiss-Prot IDs, 1,009 Pfam domains, 1,216 proteins with PDB structure coverage, and 102 proteins with homology-modeling structure coverage.

Experimental Procedures used to build the HCPIN database --

Database Searches --

Cell cycle progression, apoptosis, MAPK, Toll-like receptor, TGF-β, phosphoinositide 3-kinase (PI3K), and JAK-STAT signal transduction pathways were downloaded from the KEGG database (as stated above...).

Protein-protein interactions and multi-protein complexes were downloaded from the Human Protein Reference Database (as stated above...), which included ~16,000 proteins and ~20,000 interactions.

Interactions for all pathway proteins, as well as additional interactions between the interacting proteins themselves, are included in the HCPIN.
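The construction described above can be sketched in a few lines of Python. This is only an illustration under one reading of the text: the function name, the tuple-based input format, and the placeholder protein names are hypothetical and are not taken from the HCPIN pipeline.

```python
def build_network(pathway_proteins, interactions):
    """Return (proteins, edges) for a pathway-seeded interaction network.

    pathway_proteins: iterable of protein identifiers (the pathway members).
    interactions: iterable of (protein_a, protein_b) pairs from an
                  interaction resource such as HPRD.
    """
    seeds = set(pathway_proteins)
    pairs = list(interactions)

    # Step 1: the network holds the pathway proteins plus every protein
    # that interacts directly with at least one of them.
    proteins = set(seeds)
    for a, b in pairs:
        if a in seeds or b in seeds:
            proteins.update((a, b))

    # Step 2: keep every interaction whose two ends are both in the network.
    # This covers the pathway-protein interactions and the additional
    # interactions between the interacting partners themselves.
    edges = {frozenset((a, b)) for a, b in pairs
             if a in proteins and b in proteins}
    return proteins, edges


# Toy input with placeholder names (not real interaction data).
toy_pathway = {"P1", "P2"}
toy_interactions = [("P1", "P2"), ("P1", "A"), ("P2", "B"),
                    ("A", "B"), ("X", "Y")]
toy_proteins, toy_edges = build_network(toy_pathway, toy_interactions)
```

In the toy input, the A-B interaction is kept because both proteins are partners of pathway proteins, while X-Y is dropped because neither protein touches the pathway set.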

The list of 363 genes involved in human cancer was obtained from the Cancer Gene Census (CGC) Database. This list is restricted to genes in which reported mutations are causally implicated in oncogenesis.

The manufacturers used an IPI human cross-reference file to cross-reference proteins from HCPIN, CGC, and Swiss-Prot.

HCPIN 3D structural coverage statistics are assessed by running a Basic Local Alignment Search Tool (BLAST) search against Protein Data Bank (PDB) sequences using the TargetDB search tool with standard default parameters.

TargetDB - The TargetDB, a protein target registration database, provides information on the experimental progress and status of target amino acid sequences selected for structural determination.

Disordered residues with missing coordinates for segments within otherwise well determined 3D structures are counted as “structurally covered” in the manufacturers’ structural coverage statistics.

HCPIN proteins with no cross-referenced Swiss-Prot ID are considered as not having verified gene models and are excluded from structure statistical analysis.
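The two counting rules above (matched segments count as covered even where coordinates are missing, and proteins without a Swiss-Prot ID are left out) can be illustrated with a short Python sketch. The record layout and field names (`swissprot_id`, `length`, `pdb_ranges`) are assumptions made for this example, not the HCPIN schema.

```python
def covered_residues(ranges):
    """Count residues in the union of 1-based inclusive (start, end) ranges."""
    total = 0
    current_end = 0
    for start, end in sorted(ranges):
        # Skip the part of this range already counted by an earlier one.
        start = max(start, current_end + 1)
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def coverage_summary(proteins):
    """Summarize structural coverage over proteins that have a Swiss-Prot ID.

    Each protein is a dict with hypothetical keys:
      "swissprot_id": str or None
      "length":       sequence length in residues
      "pdb_ranges":   list of (start, end) sequence ranges matched to PDB
                      entries; a whole matched range counts as covered,
                      including residues whose coordinates are missing.
    """
    eligible = [p for p in proteins if p.get("swissprot_id")]
    return {
        "eligible_proteins": len(eligible),
        "proteins_with_structure": sum(1 for p in eligible if p["pdb_ranges"]),
        "total_residues": sum(p["length"] for p in eligible),
        "covered_residues": sum(covered_residues(p["pdb_ranges"])
                                for p in eligible),
    }


# Toy records with placeholder identifiers (not real HCPIN entries).
toy_records = [
    {"swissprot_id": "ID_1", "length": 200, "pdb_ranges": [(1, 50), (40, 100)]},
    {"swissprot_id": "ID_2", "length": 100, "pdb_ranges": []},
    {"swissprot_id": None, "length": 300, "pdb_ranges": [(1, 300)]},
]
toy_summary = coverage_summary(toy_records)
```

Overlapping ranges are merged before counting so that a residue matched by two PDB entries is counted once.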

Bioinformatics Programs --

SignalP v3.0 and TMHMM v2.0 were used for predicting secreted and transmembrane proteins. The Pfam domains are identified in the SwissPfam file provided from Pfam v19.0.

The program COILS was used to predict coiled coil regions. The manufacturers labeled regions of low complexity by using the program SEG.

SignalP - SignalP is a method for the identification of signal peptides and their cleavage sites based on neural networks (NN) trained on separate sets of prokaryotic and eukaryotic sequences.

TMHMM - TMHMM is a widely used bioinformatics tool, based on the hidden Markov model (HMM), which is used to predict transmembrane helices of integral membrane proteins.

Pfam - Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models.

COILS - COILS is a program that compares a sequence to a database of known parallel two-stranded coiled-coils and derives a similarity score.

SEG - SEG is a program for filtering low complexity regions in amino acid sequences. Residues that have been masked are represented as “X” in an alignment.

Default options were used for all programs. An in-house Perl program was written to predict disordered regions based on mean charge and mean hydrophobicity.
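The in-house program itself is not published in this description, so the following Python sketch only illustrates the general idea of a charge-hydropathy classifier. The choice of the Kyte-Doolittle hydropathy scale (rescaled to 0..1) and the linear boundary from the Uversky charge-hydropathy plot are assumptions made here; the manufacturers' actual scale, window size, and cutoff are not stated.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not stated in the source).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(sequence):
    """Keep only the 20 standard amino acids, upper-cased."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def mean_hydrophobicity(sequence):
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    residues = _standard_residues(sequence)
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in residues) / len(residues)


def mean_net_charge(sequence):
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1."""
    residues = _standard_residues(sequence)
    net = sum((aa in "KR") - (aa in "DE") for aa in residues)
    return abs(net) / len(residues)


def is_predicted_disordered(sequence, slope=2.785, intercept=-1.151):
    """Assumed rule: disordered if charge lies above slope * H + intercept."""
    boundary = slope * mean_hydrophobicity(sequence) + intercept
    return mean_net_charge(sequence) > boundary
```

Under these assumptions, a highly charged, weakly hydrophobic stretch such as a run of glutamates falls on the disordered side of the boundary, while a run of leucines does not. In practice the rule would be applied per region rather than to a whole protein.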

Topology and Statistics Analysis --

The program Pajek was used for network topology analysis. The program R was used for statistical analysis.

Pajek - Pajek (the Slovene word for spider) is a Windows program for the analysis and visualization of large networks.
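As a rough illustration of how an interaction list could be prepared for this kind of analysis, the Python sketch below writes an undirected network in Pajek's plain-text .net layout and computes node degree, one of the simplest topology measures. The description does not say how the manufacturers exported their data, so the function names and the toy edge list are hypothetical.

```python
from collections import Counter


def to_pajek_net(edges):
    """Serialize undirected (a, b) edges as text in Pajek .net layout."""
    labels = sorted({node for edge in edges for node in edge})
    # Pajek numbers vertices from 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]}" for a, b in edges]
    return "\n".join(lines) + "\n"


def degrees(edges):
    """Number of interactions per protein."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


# Toy edge list with placeholder names (not real interaction data).
toy_edge_list = [("A", "B"), ("B", "C")]
toy_net_text = to_pajek_net(toy_edge_list)
toy_degrees = degrees(toy_edge_list)
```

The resulting text can be saved to a file with a .net extension and opened in Pajek.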

Homology Modeling and Structure Quality Assessment --

HCPIN homology models are selected from MODBASE and/or built using the XPLOR homology modeling protocol of Homology Modeling Automatically (HOMA).

If multiple models are available from MODBASE, the model with highest sequence identity is selected by HCPIN.
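The selection rule above amounts to taking a maximum over the candidate models. A minimal Python sketch follows, assuming each candidate is a record with a `sequence_identity` field; the field names and model identifiers are hypothetical and are not the MODBASE schema.

```python
def select_model(models):
    """Return the candidate with the highest sequence identity, or None.

    models: list of dicts with hypothetical keys "model_id" and
            "sequence_identity" (percent identity to the template).
    If several candidates tie, the first one in the list is returned.
    """
    if not models:
        return None
    return max(models, key=lambda model: model["sequence_identity"])


# Toy candidates with placeholder identifiers.
toy_candidates = [
    {"model_id": "model_a", "sequence_identity": 42.0},
    {"model_id": "model_b", "sequence_identity": 87.5},
    {"model_id": "model_c", "sequence_identity": 61.0},
]
toy_best = select_model(toy_candidates)
```

How ties are broken is not specified in the description; returning the first candidate is simply the behavior of Python's `max`.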

MODBASE - MODBASE is a database of annotated comparative protein structure models and associated resources.

HOMA - HOMA is a web-based interface that can create homology models of a protein with unknown structure (the target or query protein) based on a homologous protein with known structure (the template protein).

Structure quality reports for each of the experimental structures and models were generated using the Protein Structure Validation Software suite (PSVS), which includes structure validation analysis with ProsaII, Verify3D, Procheck, MolProbity, and other structure quality assessment tools.

Protein Structure Validation Software suite (PSVS) - PSVS provides standard constraint analyses, statistics on goodness-of-fit between structures and experimental data, and knowledge-based structure quality scores in a standardized format suitable for database integration.

The analysis provides both global and site-specific measures of protein structure quality.

The HCPIN Web-accessed Database --

Generation of Web pages (HTML) for the HCPIN server was done using Java and a relational database (MySQL).

The manufacturers recommend Web browsers Firefox version 2.0 or higher and Internet Explorer 7 or higher to provide full Java functionality. Ribbon diagrams were generated using PyMOL.

PyMOL - PyMOL is an open-source, user-sponsored, molecular visualization system that can produce high quality 3D images of small molecules and biological macromolecules, such as proteins.

Note: The manufacturers plan to update structure coverage annotation information weekly and update HCPIN protein information every four (4) months.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site HCPIN

Price Contact manufacturer.

G6G Abstract Number 20705

G6G Manufacturer Number 104277