Protein Model Portal (PMP)
Category Proteomics>Protein Structure/Modeling Systems/Tools
Abstract The Protein Model Portal (PMP) gives access to the various models that can be leveraged from Protein Structure Initiative (PSI) targets and other experimental protein structures by comparative modeling methods.
The current release of the portal allows searching over 13.8 million precomputed model structures provided by different partner sites, and provides access to various interactive services for template selection, target-template alignment, model building, and quality assessment.
The aim of the PMP is to foster the effective use of molecular model information in biomedical research by providing unified access independent of individual sequence nomenclature and accession code systems, and by supporting the development of data standards that facilitate the exchange of information and algorithms.
Furthermore, PMP aims to provide a forum for discussion between developers of modeling methods and applied biomedical researchers on best practices, including methods for quality assessment, guidelines for the publication of theoretical models, and educational resources on the use of models in different biological applications.
Model Types --
Homology (or comparative) modeling methods use experimental protein structures to build models for evolutionarily related proteins. Experimental structural biology and homology modeling thereby complement each other in the exploration of protein structure space.
For every structure determined, hundreds of models can be derived using a variety of established methods.
Sequence-Centric Models (SC) are generated by searching for the best available template structures to build a model for a given protein target sequence, while Template-Centric Models (TC) result from using a specific solved structure as a template to build models for a series of target protein sequences.
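To make the SC/TC distinction concrete, here is a minimal sketch over a hypothetical list of (target, template, identity) records. It assumes that the "best available template" can be approximated by the highest target-template sequence identity, which is a simplification; the record type and identifiers are illustrative and are not PMP's data model.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ModelRecord(NamedTuple):
    target: str      # hypothetical target sequence identifier
    template: str    # hypothetical template structure identifier
    identity: float  # target-template sequence identity, in %


def sequence_centric(records: List[ModelRecord]) -> Dict[str, ModelRecord]:
    """SC view: for each target, keep the model built on its best template.

    'Best' is approximated here by the highest sequence identity (assumption).
    """
    best: Dict[str, ModelRecord] = {}
    for rec in records:
        if rec.target not in best or rec.identity > best[rec.target].identity:
            best[rec.target] = rec
    return best


def template_centric(records: List[ModelRecord]) -> Dict[str, List[str]]:
    """TC view: for each template, list the targets modeled on it."""
    by_template: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        by_template[rec.template].append(rec.target)
    return dict(by_template)
```

The same set of models can thus be browsed either per target sequence (SC) or per solved structure (TC).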
Target-Template Alignments --
The target-template alignments provided on the model info pages are generated dynamically by structural superposition of the model and template structures using MAMMOTH.
MAMMOTH-mult is a multiple alignment version of MAMMOTH. MAMMOTH-mult produces biologically meaningful trees, and preserves conservation of functional and structural motifs in the alignments.
A typical alignment takes an average of ~5 CPU seconds on a standard desktop workstation. Overall, MAMMOTH-mult can be particularly useful for large-scale applications in protein structure classification, protein structure prediction, and structural genomics.
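MAMMOTH's own residue matching and scoring are more involved, but the basic operation behind comparing a model with its template by superposition can be illustrated with the standard Kabsch algorithm. This sketch is a swapped-in textbook technique, not MAMMOTH; it assumes NumPy is available and that the residue correspondence (for example, matched C-alpha atoms) is already known.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between paired (N, 3) coordinate sets after optimal rigid superposition.

    Row i of P and row i of Q must describe the same residue; unlike MAMMOTH,
    no residue matching is performed here.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P_c.T @ Q_c)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q (row vectors, hence R.T) and measure the residual.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A small RMSD after fitting means the matched atoms share the same geometry; when a model and template differ by insertions or deletions, a sequence-independent method such as MAMMOTH is needed first to decide which atoms to compare.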
Model Quality --
Protein structure models are theoretical models that may contain large errors and must therefore be treated with caution; their quality needs to be analyzed carefully.
1) Model quality and applications of models - Generally, protein structure models can support the design of experiments and may help explain experimental observations, but they have only limited predictive value. The quality of a model determines its suitability for a particular application.
Knowledge of the expected accuracy of a protein structure model is of crucial importance for a biologist intending to use the model.
The importance of quality estimation in modeling has been underlined in the literature. There are two (2) main sources of information that support estimating the accuracy of homology models.
- a) The first source is the availability of structural knowledge which is primarily determined by the evolutionary distance between the query protein and template proteins of known structure.
- This is based on the observation that there is a direct correlation between sequence identity of a pair of proteins and the structural similarity of their common core.
- b) The second source of information comes from analysis of the model's geometry. Especially when sequence identity is low, individual models may deviate considerably from the expected average quality due to various sources of error in modeling and inaccuracies introduced by the modeling programs.
- It is therefore necessary to check the geometric plausibility and the ‘energy’ of the model independently. For this purpose, scoring (or energy) functions have been developed; they are available through a ‘quality estimation server’ integrated in PMP.
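Because sequence identity is the first-line indicator of expected accuracy, it helps to be precise about how it is computed. A minimal sketch, assuming a gapped pairwise alignment given as two equal-length strings with '-' for gaps. Identity is counted here over columns where neither sequence has a gap, which is only one of several conventions in use (others divide by the length of the shorter sequence or of the full alignment).

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent sequence identity of a gapped pairwise alignment.

    Assumption: only columns without a gap in either sequence are counted.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0
```

For example, `percent_identity("ACDE-G", "ACDFHG")` counts five gap-free columns, four of them identical, giving 80.0.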
2) Determinants of model accuracy -
- a) Sequence identity between target and template of known structure - The sequence identity between the target protein and template of known structure is commonly seen as a first indicator for the expected accuracy of a model, as confirmed by various studies.
- Based on the sequence identity to the template, the manufacturers assign each model to one of three (3) categories of modeling complexity.
- The classification roughly agrees with the one introduced by Rost (Rost 1999) who defines three zones of sequence similarity: midnight zone (Zone A), twilight zone (Zone B), and safe zone (Zone C).
- Zone A: In models based on a target-template sequence alignment with less than 30% sequence identity, substantial alignment errors and suboptimal template selection are frequently observed. Careful validation of the quality of these models is strongly advised.
- Zone B: In models based on a target-template sequence alignment with 30% to 50% sequence identity, alignment errors in non-conserved segments of the target protein, structural variation in templates, and incorrect reconstruction of loops (insertions and deletions) are frequent sources of model inaccuracy. Careful validation of the model quality and of the variability among template structures is advised.
- Zone C: Models based on a target-template sequence alignment with more than 50% sequence identity typically have the correct fold, and their alignments tend to be largely correct.
- Structural variation in templates and incorrect reconstruction of loops (insertions and deletions) are the main sources of model inaccuracies. Validation of the model quality and analysis of the variability among template structures is advised.
- b) Timeliness of template selection - The Protein Model Portal provides access to several modeling repositories. These repositories contain models based on the best template available at the time the model was built.
- It should therefore always be checked whether a newer template with considerably higher sequence identity to the query protein has since become available in the PDB.
- c) Variability among available templates - In homology modeling, often several evolutionarily related proteins with known experimental structure are detected for a given query protein of interest. Depending on the protein family these templates may be structurally quite similar or vary considerably.
- Usually, some regions in the core of the templates agree more (the ‘structural core’) and some parts, mainly protein surface loops, are less similar (the ‘structurally variable regions’).
- The structural core, which also tends to be more conserved in sequence, serves as a template for structural extrapolation. Those parts of the model that are directly inherited from the template(s) are generally more accurate than the remaining regions, which must be predicted from scratch.
- Structural variations among templates can have several causes, such as differences in experimental conditions and the presence or absence of ligands/co-factors, but also evolutionary reasons.
- The variations may be characteristic for the family and a sign of flexibility or disorder. There are many examples of proteins which are largely disordered and whose function can only be explained by taking into account the non-existence of a well-defined three-dimensional structure.
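The three determinants above can be turned into simple checks. The sketch below rests on several assumptions: the zone boundaries stated for Zones A, B and C, with exactly 30% and 50% treated as Zone B; a hypothetical 10-percentage-point margin standing in for a "considerably higher" identity; and templates that are already superposed and gap-free, with one C-alpha per aligned position. The 2.0 Å cutoff separating "core" from "variable" positions is illustrative and is not a PMP parameter.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def modeling_zone(identity_pct: float) -> str:
    """(a) Zone from target-template identity: <30% A, 30-50% B, >50% C."""
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("sequence identity must be between 0 and 100")
    if identity_pct < 30.0:
        return "A"
    return "B" if identity_pct <= 50.0 else "C"


def better_templates(current_pct: float,
                     candidates: Iterable[Tuple[str, float]],
                     margin: float = 10.0) -> List[Tuple[str, float]]:
    """(b) Hypothetical (template_id, identity %) pairs that beat the current template.

    The default 10-point margin for 'considerably higher' is an assumption.
    """
    hits = [c for c in candidates if c[1] >= current_pct + margin]
    return sorted(hits, key=lambda c: c[1], reverse=True)


def core_or_variable(templates: Sequence[Sequence[Coord]],
                     cutoff: float = 2.0) -> List[str]:
    """(c) Label aligned positions by their spread across superposed templates.

    Spread = mean distance of each template's atom to the positional centroid.
    """
    n_pos = len(templates[0])
    if any(len(t) != n_pos for t in templates):
        raise ValueError("templates must cover the same aligned positions")
    labels = []
    for i in range(n_pos):
        atoms = [t[i] for t in templates]
        centroid = tuple(sum(a[k] for a in atoms) / len(atoms) for k in range(3))
        spread = sum(math.dist(a, centroid) for a in atoms) / len(atoms)
        labels.append("core" if spread <= cutoff else "variable")
    return labels
```

In practice the zone only sets expectations; positions labeled "variable" are the ones where the model is least constrained by its templates.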
3) Structure Comparison -- The variability among the models (the term ‘model’ applies to both homology models and experimental structures) of a given protein predicted by different programs/servers may to a large extent be explained by the variation in the templates, but the model ensemble also contains additional information.
For example, a strong consensus among models from various servers is a good sign that a model is correct, since it is much less likely that many modeling resources would all predict the same feature wrongly than that they would all predict it correctly.
In the model overview page of PMP, the structure comparison tool can be used to compare any subset of models and analyze the variability among them.
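The consensus argument can be sketched as a score: for each model, the mean RMSD to all other models of the same protein, so that lower values indicate closer agreement with the ensemble. This assumes the models are already superposed in a common frame and list the same residues in the same order; it does not describe how PMP's structure comparison tool scores models, and the model names in any usage are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Coordinate RMSD without fitting (models assumed already superposed)."""
    if len(a) != len(b) or not a:
        raise ValueError("models must list the same, non-empty set of residues")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def consensus_scores(models: Dict[str, Sequence[Coord]]) -> Dict[str, float]:
    """Mean RMSD of each model to all others; lower means closer to the consensus."""
    if len(models) < 2:
        raise ValueError("need at least two models to measure consensus")
    names = list(models)
    scores: Dict[str, float] = {}
    for name in names:
        dists = [rmsd(models[name], models[other]) for other in names if other != name]
        scores[name] = sum(dists) / len(dists)
    return scores
```

An outlier model scores high because it disagrees with every other member of the ensemble, whereas models that agree with one another pull each other's scores down.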
4) Sequence Annotation -- Annotation of the target model sequences is retrieved from UniProt using its REST interface. The Pfam domain structure of the model target sequence is annotated using the InterPro Distributed Annotation System (DAS).
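A minimal sketch of retrieving a target sequence from UniProt over REST, using only the Python standard library. The URL pattern is the current UniProt REST endpoint (rest.uniprot.org) and reflects how one might do this today; it is an assumption, not the interface PMP itself used. The network call is isolated in one function so that URL building and FASTA parsing can be checked offline.

```python
from typing import Tuple
from urllib.request import urlopen

# Current UniProtKB REST endpoint pattern (assumption; may differ from PMP's).
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession: str) -> str:
    """Build the FASTA download URL for a UniProtKB accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip())


def parse_fasta(text: str) -> Tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


def fetch_uniprot_sequence(accession: str, timeout: float = 30.0) -> Tuple[str, str]:
    """Download and parse a UniProtKB entry (requires network access)."""
    with urlopen(uniprot_fasta_url(accession), timeout=timeout) as resp:
        return parse_fasta(resp.read().decode("utf-8"))
```

For example, `fetch_uniprot_sequence("P69905")` would return the header and sequence of human hemoglobin subunit alpha when a network connection is available.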
5) Model Preview Images & Visualization -- The model preview images on the model info pages are generated dynamically using Molscript and Raster3D.
Interactive in-line visualization is provided by Jmol, an open-source Java viewer for chemical structures in 3D.
6) PSI Partner Sites -- Models and interactive tools made accessible by the Protein Model Portal are provided by the following partners:
CSMP - Center for Structures of Membrane Proteins;
JCSG - Joint Center for Structural Genomics;
MCSG - Midwest Center for Structural Genomics;
NESG - Northeast Structural Genomics Consortium;
NMHRCM - New Methods for High-Resolution Comparative Modeling;
NYSGXRC - New York SGX Research Center for Structural Genomics;
JCMM - Joint Center for Molecular Modeling;
ModBase and ModPipe - University of California, San Francisco (UCSF); and
SWISS-MODEL Repository and SWISS-MODEL Workspace from SIB Swiss Institute of Bioinformatics & Biozentrum University of Basel.
System Requirements
Contact manufacturer.
Manufacturer
- PMP is developed by the Computational Structural Biology Group
- at the Swiss Institute of Bioinformatics (SIB) and the
- Biozentrum of the University of Basel
- Basel, Switzerland
Manufacturer Web Site Protein Model Portal (PMP)
Price Contact manufacturer.
G6G Abstract Number 20724
G6G Manufacturer Number 104294