I-TASSER

Category Proteomics>Protein Structure/Modeling Systems/Tools

Abstract The iterative threading assembly refinement (I-TASSER) server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm.

Starting from an amino acid sequence, I-TASSER first generates three-dimensional (3D) atomic models from multiple threading alignments and iterative structural assembly simulations.

The function of the protein is then inferred by structurally matching the 3D models with other known proteins.

The output from a typical server run contains full-length secondary and tertiary structure predictions, and functional annotations on ligand-binding sites, Enzyme Commission numbers and Gene Ontology (GO) terms.

An estimate of the accuracy of the predictions is provided, based on the confidence score of the modeling.

I-TASSER (as ‘Zhang-Server’) was ranked as the No. 1 server for protein structure prediction in CASP7 and CASP8 experiments.

The server is in active development, with the goal of providing the most accurate structural and functional predictions using state-of-the-art algorithms.

Note: The server is only for non-commercial use.

How does I-TASSER generate structure and functional predictions?

When users submit an amino acid sequence, the server first tries to retrieve template proteins of similar folds (or super-secondary structures) from the Protein Data Bank (PDB) library using LOMETS, a locally installed meta-threading approach.

In the second step, the continuous fragments excised from the PDB templates are reassembled into full-length models by replica-exchange Monte Carlo simulations with the threading unaligned regions (mainly loops) built by ab initio modeling.

In cases where no appropriate template is identified by LOMETS, I-TASSER builds the whole structure by ab initio modeling.
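The exchange step of a replica-exchange Monte Carlo simulation can be illustrated with a short sketch. This shows only the standard Metropolis swap criterion between neighboring temperatures, not I-TASSER's own code; the function names, the reduced units (Boltzmann constant set to 1), and the example energies are assumptions made for illustration.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Standard Metropolis probability of exchanging two replicas.

    energy_i is the energy of the conformation currently at temp_i,
    energy_j the energy of the conformation currently at temp_j.
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temperatures, rng):
    """One sweep of neighbor swaps over a temperature ladder.

    energies[k] is the energy of the conformation held at temperatures[k].
    Returns the energies after the accepted exchanges.
    """
    energies = list(energies)
    for k in range(len(energies) - 1):
        p = swap_probability(energies[k], energies[k + 1],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


if __name__ == "__main__":
    # Hypothetical ladder of three replicas (arbitrary energy units).
    print(attempt_swaps([-10.0, -20.0, -5.0], [1.0, 2.0, 4.0], random.Random(0)))
```

Exchanges let conformations trapped at low temperature escape through the high-temperature replicas, while low-energy conformations found at high temperature migrate down the ladder.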

The low free-energy states are identified by SPICKER through clustering the simulation decoys.

Structure-PICKER (SPICKER) - SPICKER is a clustering algorithm that can be used to identify the near-native models from a pool of protein structure decoys. SPICKER aims at selecting the best fold of the lowest free-energy by clustering structural decoys generated by I-TASSER or other protein structure assembly simulations.
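The idea of picking cluster centers from a decoy pool can be sketched as follows. This is a simplified illustration, not SPICKER's implementation: it assumes a precomputed pairwise distance matrix (for example RMSD in angstroms) and a fixed cutoff, and the function name and the toy matrix are hypothetical.

```python
def cluster_decoys(dist, cutoff, max_clusters=5):
    """Greedy neighbor-count clustering of decoys.

    dist is a symmetric matrix of pairwise distances between decoys.
    Repeatedly picks the decoy with the most neighbors within `cutoff`
    as a cluster center, then removes that cluster from the pool.
    Returns a list of (center_index, member_indices) tuples.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < max_clusters:
        best_center, best_members = None, []
        for i in sorted(remaining):
            members = [j for j in sorted(remaining) if dist[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters


if __name__ == "__main__":
    # Toy distances for five decoys: 0-2 form one group, 3-4 another.
    toy = [
        [0, 1, 3, 9, 9],
        [1, 0, 1, 9, 9],
        [3, 1, 0, 9, 9],
        [9, 9, 9, 0, 1],
        [9, 9, 9, 1, 0],
    ]
    print(cluster_decoys(toy, cutoff=2))  # [(1, [0, 1, 2]), (3, [3, 4])]
```

The largest cluster is listed first, which mirrors the ranking of models by cluster density described for the server output.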

In the third step, the fragment assembly simulation is performed again starting from the SPICKER cluster centroids, where the spatial restraints collected from both the LOMETS templates and the PDB structures by TM-align are used to guide the simulations.

TM-align - TM-align is a structural alignment program (algorithm) for comparing two proteins whose sequences can be different. TM-align first finds the best equivalent residues of the two proteins based on structural similarity and then outputs a TM-score.

TM-score - TM-score is an algorithm to calculate the similarity of topologies of two (2) protein structures.
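As a rough illustration, the TM-score for one fixed superposition can be computed from the distances between aligned residue pairs using the published formula, with d0 = 1.24 * (L - 15)^(1/3) - 1.8 and L the target length. The sketch below assumes the superposition and alignment are already given; the actual TM-score program also searches for the superposition that maximizes the score. The function names and the 0.5 angstrom floor on d0 for short chains are assumptions of this sketch.

```python
def d0_scale(target_length):
    """Length-dependent distance scale d0 (in angstroms)."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Assumed guard: keep d0 positive for very short chains.
    return max(d0, 0.5)


def tm_score(distances, target_length):
    """TM-score of one superposition.

    distances holds the distance (angstroms) between each aligned residue
    pair; residues of the target that are not aligned contribute nothing.
    """
    d0 = d0_scale(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical case: 100-residue target, 80 aligned pairs all 2.0 angstroms apart.
    print(round(tm_score([2.0] * 80, 100), 3))
```

Because the sum is divided by the target length rather than the number of aligned pairs, a partial alignment cannot reach a score of 1 even when every aligned pair overlaps perfectly.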

The purpose of the second iteration is to remove steric clashes and to refine the global topology of the cluster centroids. The decoys generated in the second round of simulations are then clustered, and the lowest-energy structures are selected.

The final full-atomic models are obtained by REMO (see below), which builds the atomic details from the selected I-TASSER decoys through the optimization of the hydrogen-bonding network.

Predicting the biological function of the protein --

For predicting the biological function of the protein, the I-TASSER server matches the predicted 3D models to the proteins in three (3) independent libraries which consist of proteins of known enzyme classification (EC) number, gene ontology (GO) vocabulary, and ligand-binding sites.

The final results of function predictions are deduced from the consensus of top structural matches, with the function scores calculated based on the confidence score of the I-TASSER structural models, the structural similarity between model and templates as evaluated by TM-score, and the sequence identity in the structurally aligned regions.

[A similar approach to structure-based function annotation was proposed by Brylinski and Skolnick (PNAS 2008, 105:129), who tried to match the target structures on the threading templates. Here the I-TASSER server matches the target models on all template proteins in the libraries.]

REMO is a Protocol to Refine Full Atomic Protein Models from C-alpha Traces by Optimizing Hydrogen-Bonding Networks --

REMO generates full atomic protein models by optimizing the hydrogen-bonding network with basic fragments matched from a newly constructed backbone isomer library of solved protein structures.

The algorithm has been benchmarked on 230 non-homologous proteins with reduced structure decoys generated by I-TASSER simulations.

The results showed that REMO has a significant ability to remove steric clashes while retaining the good topology of the reduced model. The hydrogen-bonding network of the final models is dramatically improved during the procedure.

The REMO algorithm was used in the CASP8 experiment, which demonstrated significant improvements of the I-TASSER models in both atomic-level structural refinement and hydrogen-bonding network construction.

What are the outputs of the I-TASSER server if you submit a sequence?

The outputs of the I-TASSER server include:

1) Up to five (5) full-length atomic models (ranked based on cluster density);

2) Estimated accuracy of the predicted models (including a confidence score of all models, and predicted TM-score and root-mean-square deviation (RMSD) for the first model);

3) GIF images of the predicted models;

4) Predicted secondary structures;

5) Top 10 threading alignments from LOMETS;

6) Top 10 proteins in the PDB which are structurally closest to the predicted models;

7) Predicted EC numbers and the confidence score;

8) Predicted GO terms and the confidence score;

9) Predicted ligand-binding sites and the confidence score; and

10) An image of the predicted ligand-binding sites.

How long does it take for I-TASSER to generate the predictions for your protein?

It usually takes anywhere from hours to 1-2 days between submitting a sequence and receiving the prediction results.

If many sequences have accumulated in the queue, however, the procedure may take much longer. The time also depends on the protein size: a smaller protein takes less time than a larger one.

Currently, the major time-consuming part of the I-TASSER protocol is the structural refinement assembly simulations.

For users who want a quicker response or who do not need refined models, the manufacturer recommends using LOMETS (meta-server) or MUSTER (single-server fold recognition).

Because these two (2) servers do not attempt to refine the threading models, their response time is faster than that of the I-TASSER server.

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site I-TASSER

Price Contact manufacturer.

G6G Abstract Number 20732

G6G Manufacturer Number 104289