cPath

Category Cross-Omics>Pathway Knowledge Bases/Databases/Tools

Abstract cPath is an open source database and web application for collecting, storing, and querying biological pathway data. Using cPath, researchers can import interaction and pathway data from multiple sources, access such data via a standard web interface, and export data to third-party applications via a standards-based web service. Biologists, computational biologists, and software developers can utilize cPath for content aggregation, query and analysis. cPath can serve as a modular, core software layer in larger pathway information systems that are capable of visualizing, analyzing, and modeling biological pathways. All cPath software is freely available under an open source license for local installation and modification. The key features provided by cPath are detailed below.

Key Feature: Identifier Mapping System -- cPath provides an identifier mapping system capable of storing equivalences among two or more identifiers. The system is pre-populated with identifier mappings loaded from external files. For example, a single protein unification mapping may map UniProt accession numbers to equivalent RefSeq accession numbers. Identifier mapping files are simple tab-delimited text files that must be loaded into cPath before any interaction or pathway data sets are imported.
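As a rough illustration of what storing equivalence between identifiers involves, the sketch below reads a tab-delimited mapping and groups identifiers into equivalence sets. It is not cPath's own loader: the file layout (a header row naming each database, one mapping per row), the function names, and the placeholder identifiers are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample file: header row names the databases (assumed layout),
# and the identifiers are placeholders, not real accession numbers.
SAMPLE = "UNIPROT\tREF_SEQ\nUP_A\tRS_A\nUP_B\tRS_B\nUP_A\tRS_A2\n"


def load_equivalences(tsv_text):
    """Group (database, accession) pairs that are linked by any row."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    for row in reader:
        ids = [(db, acc.strip()) for db, acc in zip(header, row) if acc.strip()]
        for node in ids:
            find(node)  # register every identifier, even unpaired ones
        for other in ids[1:]:
            root_a, root_b = find(ids[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def equivalents(groups, db, acc):
    """Return every identifier recorded as equivalent to (db, acc)."""
    for group in groups:
        if (db, acc) in group:
            return group - {(db, acc)}
    return set()
```

Because the grouping is transitive, two RefSeq placeholders that each map to the same UniProt placeholder end up in one set, which is how an equivalence among more than two identifiers can arise from pairwise rows.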

With some scripting ability, cPath identifier mapping files can be created from external database resources, such as Alias Server (a web server to handle multiple aliases used to refer to proteins), the EBI International Protein Index (IPI - an integrated database for proteomics experiments), or Ensembl BioMart (a query-oriented data management system). Sample protein unification files, derived from the IPI Protein Cross-References dataset, are available for download from the cPath web site. cPath also uses identifier equivalences available in imported pathway datasets that contain multiple database references for the same interactors.

Importantly, cPath also provides a similar service for storing relationships between non-equivalent but related biological entities. For example, a researcher can import a UniProt-to-Affymetrix mapping file. When a new protein with a matching UniProt identifier is subsequently imported into cPath, it is annotated with all known Affymetrix probe set identifiers. This is useful for tools that link gene expression data to pathways.
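The annotate-on-import behavior described here can be sketched as follows. This is a minimal illustration, not cPath code: the `Protein` record, the two-column file layout, the function names, and the placeholder identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    """Hypothetical stand-in for an imported protein record."""
    name: str
    uniprot_id: str
    affymetrix_ids: list = field(default_factory=list)


def parse_link_file(tsv_text):
    """Read 'UniProt<TAB>probe set' rows (assumed layout) into a lookup table."""
    links = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        uniprot_id, probe_set = line.split("\t")
        links.setdefault(uniprot_id.strip(), []).append(probe_set.strip())
    return links


def import_protein(protein, links):
    """Attach every probe set known for the protein's UniProt identifier."""
    protein.affymetrix_ids = sorted(links.get(protein.uniprot_id, []))
    return protein


# Placeholder identifiers, not real accessions or probe sets.
LINKS = parse_link_file("UP_A\tprobe_2\nUP_A\tprobe_1\nUP_B\tprobe_3\n")
annotated = import_protein(Protein("example protein", "UP_A"), LINKS)
```

The mapping is loaded first and consulted at import time, which mirrors the ordering the text describes: the link file goes in before the protein that it annotates.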

Key Feature: Scalable Pathway Data Aggregation -- To support data aggregation from multiple databases, such as to create custom integrated sets of pathways for local use, cPath supports the PSI-MI (The HUPO PSI's molecular interaction format -- a community standard for the representation of protein interaction data) and BioPAX (Biological Pathways Exchange) exchange formats. As more databases make their data available in either of these two standard formats, cPath becomes increasingly useful.

Because some popular pathway databases do not permit public redistribution of their data, it is difficult for central websites to collect a complete set of pathways for research use. A local installation of cPath is one way to collect and access all of this data effectively.

cPath supports PSI-MI format Level 1 and BioPAX format Levels 1 and 2. Level 1 of PSI-MI represents protein-protein interactions. Level 1 of BioPAX represents metabolic pathways; Level 2 adds support for molecular interactions and post-translational protein modifications, such as those supported by PSI-MI.
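When aggregating files from several sources, it can help to check which exchange format a file uses before importing it. The sketch below guesses the format from XML namespaces. It is an illustration only and not part of cPath: the namespace URIs and the `level` attribute on the PSI-MI root element are stated from memory of the published schemas and should be verified against the files actually in hand, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; confirm against the schema of the files you receive.
PSI_MI_NS = "net:sf:psidev:mi"
BIOPAX_NS = {
    "http://www.biopax.org/release/biopax-level1.owl#": "BioPAX Level 1",
    "http://www.biopax.org/release/biopax-level2.owl#": "BioPAX Level 2",
}


def namespace_of(tag):
    """Extract the namespace from an ElementTree tag such as '{uri}name'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def detect_format(xml_text):
    """Return a best-guess label for the exchange format of an XML document."""
    root = ET.fromstring(xml_text)
    if namespace_of(root.tag) == PSI_MI_NS and root.tag.endswith("}entrySet"):
        return f"PSI-MI Level {root.get('level', '?')}"
    # BioPAX is RDF/XML, so look for any element in a BioPAX namespace.
    for element in root.iter():
        label = BIOPAX_NS.get(namespace_of(element.tag))
        if label:
            return label
    return "unknown"
```

A file labelled "unknown" would need manual inspection or conversion before it could be loaded alongside the supported formats.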

Key Feature: Standardized Web Interface for Browsing and Querying Pathways -- Once pathway data is stored in cPath, it is available for browsing via a standard web browser. For example, the Cancer Cell Map (a collection of selected cancer-related signaling pathways that can be browsed or searched) currently uses cPath software as its underlying database, and makes available a set of cancer-specific pathways curated by the Institute of Bioinformatics in collaboration with Memorial Sloan-Kettering Cancer Center.

Users of cancer.cellmap.org, or any other cPath-powered site, have multiple options for querying. A user can begin with a list of pathways, or search for a specific pathway of interest, and drill down to view embedded components, such as biochemical reactions, complexes and proteins. Alternatively, a user can enter a search string, such as a protein name or identifier, in the query box, and link from the results page to interactors, interactions or pathways. cPath includes a full-text search engine that automatically ranks matching records by relevance and supports a simple language for defining more complex queries, such as Boolean combinations of words.

Key Feature: Standardized Web Service Interface for Application Communication -- Data stored in cPath can be made available for query and export via a standards-based web service interface. For example, a third-party application can retrieve a list of all pathways stored in cPath, and then retrieve the full details of each pathway in subsequent calls back to cPath. The result of each query is a BioPAX or PSI-MI formatted Extensible Markup Language (XML) data file that can be parsed and used by the application.
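The list-then-retrieve pattern described above might look like the following from a client's point of view. This is a sketch under stated assumptions, not the documented cPath API: the base URL, the `cmd`, `format`, and `id` parameter names, and the command values are hypothetical, and the listing is assumed to be a BioPAX Level 2 document whose pathway elements carry an `rdf:ID`. The `fetch` argument is any callable that takes a URL and returns the response text, so the logic can be exercised without a running server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/cpath/webservice.do"  # hypothetical endpoint
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP_NS = "http://www.biopax.org/release/biopax-level2.owl#"  # assumed Level 2


def build_url(command, **params):
    """Express a query as a URL (parameter names are illustrative only)."""
    query = {"cmd": command, "format": "biopax", **params}
    return BASE_URL + "?" + urlencode(query)


def pathway_ids(biopax_xml):
    """Collect the rdf:ID of every pathway element in a BioPAX document."""
    root = ET.fromstring(biopax_xml)
    return [el.get(f"{{{RDF_NS}}}ID") for el in root.iter(f"{{{BP_NS}}}pathway")]


def download_all(fetch):
    """First call lists the pathways; one follow-up call fetches each in full."""
    listing = fetch(build_url("list_pathways"))
    return {
        pid: fetch(build_url("get_pathway", id=pid))
        for pid in pathway_ids(listing)
    }
```

Against a live server, `fetch` could be a small wrapper around `urllib.request.urlopen(url).read().decode()`; passing it in as a parameter keeps the retrieval logic independent of the transport.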

By exposing all data via a standards-based web service interface, cPath enables interoperable communication with other software modules, and makes it easier for third-party applications to build and expand tools for visualization, analysis and model simulation. For example, the cPath plugin for Cytoscape (see G6G Abstract Number 20092) enables researchers to download and visualize protein-protein interaction networks. A second Cytoscape plugin enables researchers to view gene expression data along a color gradient and in the context of known biological pathways retrieved from cPath. The cPath web service is not tied to a specific operating system or programming language, and uses a REST-based (Representational State Transfer) architecture, which has only two requirements: queries must be specified as Internet Uniform Resource Locators (URLs), and response documents must be specified as XML documents.

Key Feature: caBIG Interoperability -- cPath meets specific interoperability and testing requirements of the National Cancer Institute (NCI) Cancer Biomedical Informatics Grid, or caBIG. The goal of caBIG is to create a common infrastructure of interoperable tools and data specifically focused on cancer research, and software funded via caBIG is required to meet specific interoperability requirements. For example, silver-level compliance requires that the software use standard exchange formats, make all data available via well-described Application Programming Interfaces (APIs), and use standard messaging systems where appropriate. Through caBIG, cPath has been formally tested by a third-party partner institution, Oregon Health & Science University (OHSU). cPath was tested on multiple operating systems and with multiple versions of the required software, providing quality assurance (QA) of the entire software system.

Key Feature: Open Source License, Local Installation and Customization -- cPath is freely available under the GNU Lesser General Public License (LGPL) for academic and commercial use. cPath can be used to distribute pathway data on the Internet, or to share private data locally within an individual lab, department or company. Stable releases of the cPath software are available for download, as are nightly snapshots of the latest code, which are not guaranteed to be stable but may have new features compared to the last stable release. A complete administrator guide describes the step-by-step process for installing a new instance of cPath.

System Requirements

Operating System(s): Platform independent; tested on Windows, Linux and Mac OS X

Programming Languages: Java

Other Requirements: MySQL 4.0 or higher; Apache Tomcat Server 4.1 or higher; Apache Ant 1.6 or higher; Perl 5.0 or higher. All required software is open source and freely available.

Manufacturer

Manufacturer Web Site cPath

Price License: Free for academic and commercial users under the GNU Lesser General Public License (LGPL).

G6G Abstract Number 20098

G6G Manufacturer Number 100648