SWISS-MODEL Repository

Category Proteomics>Protein Structure/Modeling Systems/Tools

Abstract SWISS-MODEL Repository is a database of 3D protein structure models generated by the SWISS-MODEL homology-modeling pipeline (workflow).

The aim of the SWISS-MODEL Repository is to provide access to an up-to-date collection of annotated 3D protein models generated by automated homology modeling for all sequences in Swiss-Prot (protein knowledge base) and for relevant model organisms.

Regular updates ensure that target coverage is complete, that models are built using the most recent sequence and template structure databases, and that improvements in the underlying modeling pipeline are fully utilized.

As of September 2008, the database contains 3.4 million entries for 2.7 million different protein sequences from the UniProt database.

SWISS-MODEL Repository allows users to assess the quality of the models in the database, to search for alternative template structures, and to build models interactively via SWISS-MODEL Workspace (see below...).

Annotation of models with functional information and cross-linking with other databases such as the Protein Model Portal (see below...) of the Protein Structure Initiative (PSI) Structural Genomics Knowledge Base (KB) facilitates the navigation between protein sequence and structure resources.

Homology modeling --

The SWISS-MODEL Repository contains models that are calculated using a fully automated homology modeling pipeline. Homology modeling typically consists of the following steps:

1) Selection of a suitable template;

2) Alignment of Target sequence and Template structure;

3) Model building;

4) Energy minimization and/or Refinement; and

5) Model quality assessment.

This requires a set of specialized software tools as well as up-to-date sequence and structure databases. The SWISS-MODEL pipeline integrates these steps into a fully automated workflow by combining the required programs in a PERL (programming language) based framework.

Since template search and selection is a crucial step for successful model building, the manufacturers have implemented a hierarchical template search and selection protocol, which is sufficiently fast to be used for automated large-scale modeling, sensitive in detecting low homology targets, and accurate in correctly identifying close target structures.

In the first step, segments of the target sequence sharing close similarity to known protein structures are identified using a conservative Basic Local Alignment Search Tool (BLAST) search with restrictive parameters. This ensures that information about close sequence relationships is not dispersed by the subsequent profile-based search strategies.

If regions of the target sequence remain uncovered, a second step searches for suitable templates against a library of Hidden Markov Models for the SWISS-MODEL Template Library (SMTL) using HHsearch.

HHsearch - HHsearch is a software suite for detecting remote homologues of proteins and generating high quality alignments for homology modeling and function prediction.
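The decision to run the second, profile-based step depends on which parts of the target the first step left uncovered. The following is a minimal sketch of that bookkeeping only, not the pipeline's actual code: the function names and the representation of hits as 1-based inclusive (start, end) pairs are assumptions made for illustration.

```python
def uncovered_regions(target_length, hits):
    """Return the 1-based inclusive (start, end) segments of the target
    that are not covered by any hit.

    `hits` is a list of (start, end) pairs on the target sequence
    (hypothetical representation of the first-step search results).
    """
    gaps = []
    next_uncovered = 1  # first residue not yet known to be covered
    for start, end in sorted(hits):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= target_length:
        gaps.append((next_uncovered, target_length))
    return gaps


def needs_profile_search(target_length, first_step_hits):
    """True if any part of the target remains uncovered after step one."""
    return bool(uncovered_regions(target_length, first_step_hits))
```

For a 300-residue target with first-step hits on residues 1-120 and 100-180, `uncovered_regions(300, [(1, 120), (100, 180)])` reports residues 181-300 as still uncovered, so the second step would be attempted for that region.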

Templates resulting from both steps are ranked according to their E-value, sequence identity to the target, resolution and structure quality. From this ranked list, the best templates are progressively selected to maximize the length of the modeled region of the protein.

New templates are added if they significantly increase the coverage of the target sequence (spanning at least 25 consecutive residues), or new information is gained (e.g. templates spanning several domains help to infer relative domain orientation).
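The progressive selection described above can be illustrated with a greedy loop. This is a simplified sketch under stated assumptions, not the SWISS-MODEL implementation: the `Template` class and function names are hypothetical, ranking uses only E-value and sequence identity (resolution and structure quality are omitted), and the "new information" rule for multi-domain templates is not modeled. Only the 25-residue threshold comes from the text.

```python
from dataclasses import dataclass

MIN_NEW_RESIDUES = 25  # threshold stated in the text


@dataclass
class Template:
    name: str
    start: int       # first target residue covered (1-based, inclusive)
    end: int         # last target residue covered (inclusive)
    e_value: float
    identity: float  # percent sequence identity to the target


def longest_new_run(covered, start, end):
    """Length of the longest stretch of consecutive residues in
    [start, end] that is not yet in the `covered` set."""
    best = run = 0
    for pos in range(start, end + 1):
        run = 0 if pos in covered else run + 1
        best = max(best, run)
    return best


def select_templates(templates):
    """Greedily pick templates, best-ranked first, keeping each one only
    if it adds at least MIN_NEW_RESIDUES consecutive new residues."""
    # Simplified ranking: lower E-value first, then higher identity.
    ranked = sorted(templates, key=lambda t: (t.e_value, -t.identity))
    covered = set()
    chosen = []
    for t in ranked:
        if longest_new_run(covered, t.start, t.end) >= MIN_NEW_RESIDUES:
            chosen.append(t)
            covered.update(range(t.start, t.end + 1))
    return chosen


candidates = [
    Template("A", 1, 100, 1e-50, 60.0),
    Template("B", 90, 120, 1e-30, 40.0),   # adds only residues 101-120
    Template("C", 101, 200, 1e-20, 35.0),
]
selected = [t.name for t in select_templates(candidates)]
```

In this made-up example, template B is skipped because it contributes only 20 new consecutive residues, while A and C together cover residues 1-200.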

For each selected target-template alignment, 3D models are calculated using ProModII (protein modeling software) and energy is minimized using the Gromos force field. The quality of the resulting model is assessed using the ANOLEA mean force potential.

Depending on the size of the protein and the evolutionary distance to the template, model building can be relatively time-consuming. Comprehensive databases of pre-computed models have therefore been developed so that model information can be cross-linked in real time with other biological data resources, such as sequence databases or genome browsers.

Model database --

The SWISS-MODEL Repository is a relational database of models generated by the automated SWISS-MODEL pipeline based on protein sequences from the Universal Protein Resource (UniProt) database. Within the database, model target sequences are uniquely identified by the MD5 cryptographic hash of their full-length raw amino acid sequence.

This mechanism allows the redundancy among protein sequence database entries to be reduced, and facilitates cross-referencing with databases using different accession code systems.
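As an illustration of this identification scheme, the sketch below hashes raw sequences with Python's standard `hashlib` and groups accession codes that share a sequence. The accession codes and sequences are made up, and the normalization step (removing whitespace, upper-casing) is an assumption; the repository's exact preprocessing is not described here.

```python
import hashlib


def sequence_md5(sequence):
    """Return the hex MD5 digest of a raw amino acid sequence.

    Whitespace removal and upper-casing are assumptions made so that
    formatting differences do not change the hash.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# Hypothetical entries from databases with different accession systems.
entries = {
    "ACC_A": "MKTAYIAKQR",
    "ACC_B": "mktayiakqr\n",  # same sequence, different formatting
    "ACC_C": "MKTAYIAKQK",    # differs in the last residue
}

# Group accession codes under the hash of their sequence.
accessions_by_hash = {}
for accession, seq in entries.items():
    accessions_by_hash.setdefault(sequence_md5(seq), []).append(accession)
```

Here `ACC_A` and `ACC_B` end up under the same key, so a model would need to be stored only once for both, while `ACC_C` gets its own key.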

Mappings from UniProt and various other database accession code systems to the manufacturers' MD5-based reference system are derived from the iProClass database (iProClass is a central data infrastructure that supports both data integration and functional annotation of proteins).

The SWISS-MODEL Repository release contains over 3.45 million models for over 2.72 million unique sequences, built on over 26,185 different template structures (over 34,540 chains), covering over 48.8% of the entries from UniProt (14.0), and more specifically over 65.4% of the unique sequences of Swiss-Prot, the manually annotated section of the UniProt knowledge base.

Graphical user web interface --

The web interface provides the main entry point to the SWISS-MODEL Repository. Models for specific proteins can be queried using different database accession codes (e.g. UniProt AC and ID, GenBank, IPI, and RefSeq) or directly with the protein amino acid sequence (or fragments thereof, e.g. for a specific domain).

Functional and domain annotation for the target protein is retrieved dynamically in real time using web service protocols to ensure that the annotation information is up-to-date. UniProt annotation of the target protein is retrieved via Representational State Transfer (REST) queries.

Structural domains in the target protein are annotated by PFAM domain assignment, which is retrieved dynamically by querying the InterPro database using the Distributed Annotation System (DAS) protocol.

The MD5-based reference frame for target proteins allows the database accession mappings to be updated between modeling release cycles.

Finally, for each model, a summary page provides information on the modeling process (template selection and alignment), model quality assessment by ANOLEA and Gromos, and in-page visualization of the structure using the AstexViewer plug-in.

ANOLEA - ANOLEA is a server/stand-alone program to assess the quality of a three-dimensional protein structure. It uses a statistical potential at the atomic level and gives an energy profile as output. The user can also choose to have a molecular graphical representation of the energy profile.

GROMOS - GROMOS™ is a general-purpose molecular dynamics computer simulation package for the study of biomolecular systems.

AstexViewer - AstexViewer is a Java molecular graphics program that can be used for visualization in many aspects of structure-based drug design.

Integration with SWISS-MODEL Workspace --

The SWISS-MODEL Repository is a large-scale database of pre-computed 3D models. Often, however, one may be interested in performing additional analyses, either on the models themselves or on the underlying protein target sequence.

The manufacturers have therefore implemented a tight link between the entries of the SWISS-MODEL Repository and the corresponding modules in the SWISS-MODEL Workspace, which provides an interactive web-based, personalized working environment.

The Protein Model Portal (PMP) --

One of the major bottlenecks in the use of protein models is that, unlike for experimental structures, modeling resources are heterogeneous and distributed over numerous servers. However, it is often beneficial for the user to directly compare the results of different modeling methods for the same protein.

The manufacturers have therefore developed the Protein Model Portal (PMP) as a component of the PSI structural genomics knowledge base.

This resource provides access to all structures in the PDB, functional annotations, homology models, structural genomics protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets.

The PMP currently provides access to several million pre-built models from four (4) PSI centers, ModBase and the SWISS-MODEL Repository.

ModBase - ModBase is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure.

The models are calculated by ModPipe, an automated modeling pipeline (workflow) that relies primarily on the MODELLER package for fold assignment, sequence-structure alignment, model building and model assessment.

ModBase currently (as of September 2010) contains 10,355,444 reliable models for domains in 2,421,920 unique protein sequences.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site SWISS-MODEL Repository

Price Contact manufacturer.

G6G Abstract Number 20722

G6G Manufacturer Number 104292