A Guide to Structure Prediction
PDB Protein Data Bank
Structural Classification of Proteins (SCOP)
CATH, The CATH Protein Structure Classification
CPE Protein Structure Prediction Pages
UCLA-DOE Structure Prediction Server
Database of Comparative Protein Structure Models (ModBase)
Protein Sequence Analysis (PSA)
Predicting Protein-3D structures based on homologous sequence search
Sacch3D - Structural Information for Yeast Proteins
The Protein Structure Database (PSdb)
RELIBASE
Image Library of Biological Macromolecules
A Library of Proteins Family Cores (LPFC)
Protein Topology Home Page
VAST - Vector Alignment Search Tool
DALI - compare protein structures in 3D
GETAREA - Predicted Solvent Accessible Surface Areas
A Guide to Structure Prediction
This is a summary of a general approach to the problem of structure
prediction.
The assumption is that you have the sequence of a protein that you want to know more about. Before you start, remember that this approach will not always provide satisfying or complete answers. However, it is increasingly rare for the techniques described here to shed no light at all on a protein sequence. Even a little time spent analysing a sequence can save time and money by aiding experimental design.
PDB Protein Data Bank
The PDB is a database of crystallographic protein structures,
maintained at the Brookhaven National Laboratory, Upton, NY.
It contains atomic coordinates for the 3-dimensional
structures of biomolecules obtained using X-ray, electron or
neutron diffraction, nuclear magnetic resonance or molecular
modelling.
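As an illustration of what "atomic coordinates" means in practice, the sketch below reads atom records from a PDB-format text file. It assumes the standard fixed-column ATOM/HETATM layout (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in columns 31-54). The names `Atom` and `parse_atoms` are hypothetical and are not part of any PDB software.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str       # atom name, e.g. "CA"
    res_name: str   # residue name, e.g. "MET"
    chain: str      # chain identifier
    res_seq: int    # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        # Skip headers, remarks and any record too short to hold coordinates.
        if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 54:
            continue
        atoms.append(Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


def ca_coordinates(atoms: List[Atom]) -> List[tuple]:
    """Return (x, y, z) for every C-alpha atom, in file order."""
    return [(a.x, a.y, a.z) for a in atoms if a.name == "CA"]
```

Calling `parse_atoms(open(path).read())` on a downloaded entry would return one `Atom` per coordinate record. Real entries also carry alternate locations, insertion codes and multiple models, which this sketch ignores.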
Structural Classification of Proteins (SCOP)
SCOP aims to provide
a detailed and comprehensive description of the structural and evolutionary
relationships between all proteins whose structure is known. As such, it provides a
broad survey of all known protein folds, detailed information about the close
relatives of any particular protein, and a framework for future research and
classification.
CATH, The CATH Protein Structure Classification
CATH is a hierarchical classification of protein domain
structures which clusters proteins at four major levels,
class (C), architecture (A), topology (T) and homologous
superfamily (H). Hyperlinks are provided to several
secondary sources such as PDB summary files and OWL.
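The four-level hierarchy can be made concrete with a small sketch. It assumes CATH codes are written as four dot-separated numbers in the order C.A.T.H (for example "3.40.50.720"). The names `CathCode`, `parse_cath_code` and `shared_levels` are hypothetical helpers, not part of CATH itself.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    cath_class: int     # C: class
    architecture: int   # A: architecture
    topology: int       # T: topology
    superfamily: int    # H: homologous superfamily


def parse_cath_code(code: str) -> CathCode:
    """Split an assumed 'C.A.T.H' string into its four integer levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated levels, got %r" % code)
    return CathCode(*(int(p) for p in parts))


def shared_levels(a: CathCode, b: CathCode) -> int:
    """Count how many leading levels (C, then A, then T, then H) two codes share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count
```

Two domains sharing three levels would have the same class, architecture and topology but belong to different homologous superfamilies.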
CPE Protein Structure Prediction Pages
This site holds information on protein structure prediction research, i.e.
attempts to solve "the folding problem", and in particular information about recent and
forthcoming structure prediction competitions, meetings and network services.
UCLA-DOE Structure Prediction Server
The UCLA-DOE Fold-Recognition server is a project that aims to help with
the computational analysis and prediction of structure from amino acid
sequences. It provides easy access to the results from various
programs. These include various methods developed in this lab as well
as other methods from around the world. Rather than a set of programs
and www links, it is a comprehensive package providing users with
computation time, storage and collection of data, and organization of
the results for easy analysis.
Database of Comparative Protein Structure Models (ModBase)
ModBase is a queryable database of many annotated comparative protein
structure models. The models consist of coordinates for all
non-hydrogen atoms in the modeled part of a protein. They are derived
by an automated modeling pipeline relying mainly on the program
MODELLER.
The database also includes fold assignments and alignments on which the models were based. In addition, special care is taken to assess the overall quality of the models and their accuracy at the residue level.
Protein Sequence Analysis (PSA)
The PSA server analyzes your protein sequence and determines which
of 209 sequence-structure models, spanning 15 different protein folding
classes, are the most probable explanations of your sequence. The
analysis results (PostScript files depicting the folding-class
probabilities and secondary-structure probabilities) are returned to
you by e-mail.
The PSA e-mail server is particularly suited for analyzing novel sequences that are unlike any others in the sequence databanks.
Predicting Protein-3D structures based on homologous sequence search
This server is dedicated to finding PDB sequences that are homologous
to a given query sequence. It uses a version of NRDB that includes all
the PDB entries (excluding the BRK_MOD sequences and sequences
containing only 'X's). Sequences are compared to this database with
PSI-BLAST using an e-value cutoff of 0.001 and a maximum of five iterations.
Coiled-coil, transmembrane and low-complexity regions are automatically filtered out of the query sequence using COILS, TMpred and SEG, respectively. A graphical overview shows the regions matched between the query sequence and the hit sequences found.
The accuracy of the prediction was estimated to be above 98%, based on the results from a test set of 685 PDB sequences extracted with PDB-select that have less than 25% identity to each other.
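For readers who want to try a comparable search locally, the sketch below assembles a PSI-BLAST command line with the parameters quoted above. It assumes the NCBI BLAST+ `psiblast` program and its `-query`, `-db`, `-evalue` and `-num_iterations` options; the server's own PSI-BLAST version and exact settings are not documented here, and mapping the quoted cutoff to `-evalue` is an assumption. The function name and the file and database names in the usage note are hypothetical.

```python
from typing import List


def build_psiblast_command(query_fasta: str,
                           database: str,
                           evalue: float = 0.001,
                           iterations: int = 5) -> List[str]:
    """Build (but do not run) a psiblast argument list.

    The defaults mirror the e-value cutoff of 0.001 and the maximum of
    five iterations described in the text.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-num_iterations", str(iterations),
    ]
```

The resulting list could be passed to `subprocess.run`, for example `subprocess.run(build_psiblast_command("query.fasta", "pdb_nr"), check=True)`, provided BLAST+ is installed and a suitably formatted database exists.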
Sacch3D - Structural Information for Yeast Proteins
Sacch3D is a facility offered by the Saccharomyces Genome Database
to present structural information about yeast proteins. Here you can
find text, graphics, and interactive 3D images to help you explore the
structure and function of yeast proteins.
The Protein Structure Database (PSdb)
The Protein Structure Database (PSdb) relates secondary (e.g., helix, sheet, turn, random coil), supersecondary (e.g., helix-helix
interactions), and tertiary information (e.g., solvent accessibility, internal relative distances, and
ligand interactions) to the primary structure. The data for each protein are supplied on a residue-by-residue
basis and encoded in a series of flat ASCII files.
Relationships between the various levels of structure (primary, secondary, tertiary) can be investigated visually using PSdbView, a graphical tool provided to view the information within the PSdb. This tool allows side-by-side comparison of residue-based data and includes a variety of standard mechanisms for visualizing protein data, including Ramachandran plots, C(alpha)-C(alpha) distance plots, and graphs of differences in solvent-accessible molecular surface area (e.g., differences in the exposed surface with and without including the ligands, metal ions or buried waters in the computations).
RELIBASE
RELIBase is an archive for structural data about receptor/ligand
complexes.
The main purpose of RELIBase is to provide selective and efficient access to the receptor/ligand complexes currently deposited in the Brookhaven Protein Databank (PDB) and to make the enormous wealth of information contained in the receptor/ligand structures available for structure-based drug design studies.
The publicly accessible WWW RELIBase database and search tools accept a substructure search object entered as text, as a SMILES string, or through an interactive Java-based molecule editor, and the system can perform the following functions:
Image Library of Biological Macromolecules
Provides access to graphical structural information on biological
macromolecules. The Image Library contains structural images
of RNA, DNA and proteins deposited at PDB and NDB.
A Library of Proteins Family Cores (LPFC)
Core structures computed from structural alignments of protein
families.
Protein Topology Home Page
Users supply a target protein domain, which can be compared with a
representative set of 3000 domains, or the entire PDB (15300 domains, as
of April 1998). You can upload a file containing the description of your
target protein, either as a PDB format file, or as a Tops file. The system
will email you the domains in the representative set, ordered by distance
from your target protein, annotated with their CATH codes (where known)
and also with the distance measure. A larger distance measure indicates a
remoter topological relationship.
VAST - Vector Alignment Search Tool
VAST Search is a service offered by the NCBI Structure Group that
allows users to search for structure neighbors starting with
3D-coordinates specified by the user. This service is meant to be used
with newly determined protein structures, which are not part of
MMDB yet. Structure neighbors for proteins in MMDB can be looked up
from MMDB's structure summary pages!
Protein structure neighbors in Entrez are determined by direct comparison of 3-dimensional protein structures with the VAST algorithm. Each of the more than 15,000 domains in MMDB is compared to every other one. From the MMDB structure summary pages, retrieved by Entrez, structure neighbors are available with the click of a button.
DALI - compare protein structures in 3D
With a rapidly growing pool of known tertiary structures, the importance
of protein structure comparison parallels that of sequence alignment.
We have developed a novel algorithm (DALI) for optimal pairwise
alignment of protein structures. The three-dimensional coordinates of
each protein are used to calculate residue-residue (Calpha-Calpha)
distance matrices. The distance matrices are first decomposed into
elementary contact patterns, e.g., hexapeptide-hexapeptide submatrices.
Then, similar contact patterns in the two matrices are paired and
combined into larger consistent sets of pairs. A Monte Carlo procedure
is used to optimize a similarity score defined in terms of equivalent
intramolecular distances. Several alignments are optimized in parallel,
leading to simultaneous detection of the best, second-best and so on
solutions. The method allows sequence gaps of any length, reversal of
chain direction, and free topological connectivity of aligned segments.
Sequential connectivity can be imposed as an option. The method is
fully automatic and identifies structural resemblances and common
structural cores accurately and sensitively, even in the presence of
geometrical distortions. An all-against-all alignment of over 200
representative protein structures results in an objective classification
of known 3D folds in agreement with visual classifications. Unexpected
topological similarities of biological interest have been detected,
e.g., between the bacterial toxin colicin A and globins, and between the
eukaryotic POU-specific DNA-binding domain and the bacterial lambda
repressor.
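The first two steps of the description above (building a C-alpha distance matrix and cutting it into hexapeptide-hexapeptide submatrices) can be sketched as follows. This is only an illustration of the data structures involved, not the DALI implementation: the pattern pairing, Monte Carlo optimization and DALI's actual similarity score are omitted, and `pattern_difference` is a stand-in measure chosen for simplicity. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = List[List[float]]


def distance_matrix(ca_coords: Sequence[Point]) -> Matrix:
    """Residue-residue distance matrix from C-alpha coordinates."""
    return [[math.dist(p, q) for q in ca_coords] for p in ca_coords]


def contact_pattern(dm: Matrix, i: int, j: int, size: int = 6) -> Matrix:
    """Submatrix of distances between segment i..i+size-1 and segment j..j+size-1.

    With size=6 this is a hexapeptide-hexapeptide submatrix.
    """
    if i + size > len(dm) or j + size > len(dm):
        raise IndexError("segment runs past the end of the chain")
    return [row[j:j + size] for row in dm[i:i + size]]


def pattern_difference(a: Matrix, b: Matrix) -> float:
    """Mean absolute difference between two equally sized contact patterns.

    An illustrative stand-in, not the score DALI optimizes.
    """
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for da, db in zip(row_a, row_b):
            total += abs(da - db)
            count += 1
    return total / count
```

Because only intramolecular distances are compared, two copies of the same fragment give a difference of zero however they are translated or rotated in space. This is why distance matrices are a convenient basis for structure comparison without prior superposition.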
GETAREA - Predicted Solvent Accessible Surface Areas
This calculates the solvent accessible surface area (SASA) of molecules. To
calculate the SASA of proteins, supply the name of the local file containing
atomic coordinates in PDB format. There is an option to calculate
solvation energy and to include molecules other than proteins.
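To give a feel for what a solvent accessible surface area calculation involves, here is a minimal numerical sketch. GETAREA's own algorithm is not described on this page, so the sketch substitutes the well-known Shrake-Rupley point-sampling approach: each atom's sphere is expanded by a probe radius, points are scattered on it, and the points not buried inside any neighbouring expanded sphere are counted. The 1.4 Å probe radius, the number of sample points and all names below are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# (x, y, z, van der Waals radius), all in angstroms
Atom = Tuple[float, float, float, float]


def sphere_points(n: int) -> List[Tuple[float, float, float]]:
    """Roughly uniform points on a unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = k * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def sasa(atoms: Sequence[Atom], probe: float = 1.4,
         n_points: int = 960) -> List[float]:
    """Per-atom accessible area (square angstroms) by point sampling."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the probe-expanded sphere
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cutoff = rj + probe
                d2 = (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                if d2 < cutoff * cutoff:
                    buried = True
                    break
            if not buried:
                exposed += 1
        # Exposed fraction of the expanded sphere's total area.
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas
```

An isolated atom is fully exposed, so its area equals that of the expanded sphere, while atoms close to each other shield part of one another's surface. The double loop over atoms is quadratic and is meant only for small examples; production tools use neighbour lists or analytical methods.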