LON-CAPA Protein 3D Structure Analysis

Welcome to the GenomeWeb
Protein 3D Structure Analysis

This is a collection of protein 3D structure analysis and database sites.

A Guide to Structure Prediction
PDB Protein Data Bank
Structural Classification of Proteins (SCOP)
CATH, The CATH Protein Structure Classification
CPE Protein Structure Prediction Pages
UCLA-DOE Structure Prediction Server
Database of Comparative Protein Structure Models (ModBase)
Protein Sequence Analysis (PSA)
Predicting Protein-3D structures based on homologous sequence search
Sacch3D - Structural Information for Yeast Proteins
The Protein Structure Database (PSdb)
RELIBASE
Image Library of Biological Macromolecules
A Library of Proteins Family Cores (LPFC)
Protein Topology Home Page
VAST - Vector Alignment Search Tool
DALI - compare protein structures in 3D
GETAREA - Predicted Solvent Accessible Surface Areas

Detailed information on the above options

A Guide to Structure Prediction
This is a summary of a general approach to the problem of structure prediction.

The assumption is that you have the sequence of a protein that you want to know more about. Before you start, remember that this approach will not always provide satisfying or complete answers. However, it is increasingly rare for the techniques described here to shed no light at all on a protein sequence. Even a little time spent analysing a sequence can save time and money by aiding experimental design.

PDB Protein Data Bank
The PDB is a database of crystallographic protein structures, maintained at the Brookhaven National Laboratory, Upton, NY. It contains atomic coordinates for the 3-dimensional structures of biomolecules obtained using X-ray, electron or neutron diffraction, nuclear magnetic resonance or molecular modelling.
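
As a rough illustration of what "atomic coordinates" look like in practice, the sketch below reads one fixed-column ATOM record of the kind found in PDB-format files. The column positions follow the commonly documented PDB layout. The names `Atom` and `parse_atom_line` are hypothetical, and the example record is made up rather than taken from a real entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "GLY"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float       # orthogonal coordinates in Angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM record (hypothetical helper)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21],                # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Made-up record, assembled piece by piece so the columns are explicit.
EXAMPLE = (
    "ATOM  "    # columns 1-6: record name
    "    2"     # columns 7-11: atom serial number
    " "         # column 12: blank
    " CA "      # columns 13-16: atom name
    " "         # column 17: alternate location indicator
    "GLY"       # columns 18-20: residue name
    " "         # column 21: blank
    "A"         # column 22: chain identifier
    "   1"      # columns 23-26: residue sequence number
    "    "      # columns 27-30: insertion code and blanks
    "  11.104"  # columns 31-38: x
    "  13.207"  # columns 39-46: y
    "   2.100"  # columns 47-54: z
)

atom = parse_atom_line(EXAMPLE)
print(atom)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in real files can run together without a separating space.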

Structural Classification of Proteins (SCOP)
SCOP aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification.

CATH, The CATH Protein Structure Classification
CATH is a hierarchical classification of protein domain structures which clusters proteins at four major levels: class (C), architecture (A), topology (T) and homologous superfamily (H). Hyperlinks are provided to several secondary sources such as PDB summary files and OWL.
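
The four-level hierarchy can be pictured as a four-part code. The sketch below assumes the dotted four-number notation commonly used for CATH codes (one number per level, in C.A.T.H order). The names `CathCode`, `parse_cath_code` and `shared_levels` are hypothetical, and the two codes are used purely as illustrative input.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    cls: int           # class (C)
    architecture: int  # architecture (A)
    topology: int      # topology (T)
    superfamily: int   # homologous superfamily (H)


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted C.A.T.H string into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def shared_levels(a: CathCode, b: CathCode) -> int:
    """Count how many leading levels (C, then A, then T, then H) two codes share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


first = parse_cath_code("3.40.50.300")
second = parse_cath_code("3.40.50.720")
print(first, second, shared_levels(first, second))
```

Two domains that agree on the first three numbers but differ on the fourth sit in the same class, architecture and topology but in different homologous superfamilies.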

CPE Protein Structure Prediction Pages
This site holds information related to research on protein structure prediction, i.e. attempts to solve "the folding problem", and in particular information about recent and forthcoming structure prediction competitions, meetings and network services.

UCLA-DOE Structure Prediction Server
The UCLA-DOE Fold-Recognition server is a project that aims to help with the computational analysis and prediction of structure from amino acid sequences. It provides easy access to the results from various programs. These include various methods developed in this lab as well as other methods from around the world. Rather than a set of programs and WWW links, it is a comprehensive package providing users with computation time, storage and collection of data, and organization of the results for easy analysis.

Database of Comparative Protein Structure Models (ModBase)
ModBase is a queryable database of many annotated comparative protein structure models. The models consist of coordinates for all non-hydrogen atoms in the modeled part of a protein. They are derived by an automated modeling pipeline relying mainly on the program MODELLER.

The database also includes fold assignments and alignments on which the models were based. In addition, special care is taken to assess the overall quality of the models and their accuracy at the residue level.

Protein Sequence Analysis (PSA)
The PSA server analyzes your protein sequence and determines which of 209 sequence-structure models, spanning 15 different protein folding classes, are the most probable explanations of your sequence. The analysis results (PostScript files depicting the folding-class probabilities and secondary-structure probabilities) are returned to you by e-mail.

The PSA e-mail server is particularly suited for analyzing novel sequences that are unlike any others in the sequence databanks.

Predicting Protein-3D structures based on homologous sequence search
This server is dedicated to finding PDB sequences homologous to a given query sequence. It uses a version of NRDB that includes all the PDB entries (excluding the BRK_MOD sequences and sequences only containing 'X's). Sequences are compared to this database with PSI-BLAST using an e-value cutoff of 0.001, and a maximum of five iterations.

Coiled-coil, transmembrane and low-complexity regions are automatically filtered out of the query sequence using COILS, TMpred and SEG, respectively. A graphical overview shows the regions matched between the query sequence and the hit sequences found.

The accuracy of the prediction was estimated to be above 98%, based on the results from a test set of 685 PDB sequences extracted with PDB-select that have less than 25% identity to each other.
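
The search settings quoted above (e-value cutoff 0.001, at most five iterations) could be reproduced locally along the following lines. This is only a sketch: the server's actual invocation is not given, so it assumes the NCBI BLAST+ `psiblast` program and maps the cutoff to its `-evalue` option (`psiblast` also has a separate `-inclusion_ethresh` option, and which of the two the server meant is not stated). The file and database names are placeholders, and the code only builds the argument list without running anything.

```python
import shlex
from typing import List


def build_psiblast_command(
    query_fasta: str,
    database: str,
    evalue: float = 0.001,        # e-value cutoff quoted in the text
    num_iterations: int = 5,      # maximum number of iterations quoted in the text
    out_file: str = "hits.txt",   # hypothetical output file name
) -> List[str]:
    """Return an argument list for BLAST+ psiblast; nothing is executed here."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-num_iterations", str(num_iterations),
        "-out", out_file,
    ]


# "query.fasta" and "nrdb_with_pdb" are placeholder names, not real resources.
command = build_psiblast_command("query.fasta", "nrdb_with_pdb")
print(shlex.join(command))
```

With BLAST+ installed and a suitably formatted database, the resulting list could be passed to `subprocess.run`.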

Sacch3D - Structural Information for Yeast Proteins
Sacch3D is a facility offered by the Saccharomyces Genome Database to present structural information about yeast proteins. Here you can find text, graphics, and interactive 3D images to help you explore the structure and function of yeast proteins.

The Protein Structure Database (PSdb)
The Protein Structure Database (PSdb) relates secondary (e.g. helix, sheet, turn, random coil), supersecondary (e.g. helix-helix interactions), and tertiary information (e.g. solvent accessibility, internal relative distances, and ligand interactions) to the primary structure. The data for each protein are supplied on a residue-by-residue basis and encoded in a series of flat ASCII files.

Relationships between the various levels of structure (primary, secondary, tertiary) can be investigated visually using PSdbView, a graphical tool provided to view the information within the PSdb. This tool allows side-by-side comparison of residue-based data and includes a variety of standard mechanisms for visualizing protein data, including Ramachandran plots, C(alpha)-C(alpha) distance plots, and graphs of differences in solvent accessible molecular surface area (e.g. differences in the exposed surface with and without including the ligands, metal ions or buried waters in the computations).

RELIBASE
RELIBase is an archive for structural data about receptor/ligand complexes.

The main purpose of RELIBase is to provide selective and efficient access to the receptor/ligand complexes currently deposited in the Brookhaven Protein Databank (PDB) and to make the enormous wealth of information contained in the receptor/ligand structures available for structure-based drug design studies.

The publicly accessible WWW RELIBase database and search tools accept a substructure search object entered as text, as a SMILES string, or through an interactive Java-based molecule editor. The system can perform the following functions:

Fast identification of all ligands which contain a specific functional group
Identification of receptor/ligand complexes with specific spatial interactions
Analysis of interaction preferences of functional groups.
Mutations.
Protein modifications.
Cross links to several protein sequence databases and the Beilstein Database of small molecules.

Image Library of Biological Macromolecules
Provides access to graphical structural information on biological macromolecules. The Image Library contains structural images of RNA, DNA and proteins deposited at PDB and NDB.

A Library of Proteins Family Cores (LPFC)
Core structures computed from structural alignments of protein families.

Protein Topology Home Page
Users supply a target protein domain which can be compared with a representative set of 3000 domains, or with the entire PDB (15300 domains, as of April 1998). You can upload a file containing the description of your target protein, either as a PDB format file or as a Tops file. The system will email you the domains in the representative set, ordered by distance from your target protein and annotated with their CATH codes (where known) and with the distance measure. A larger distance measure indicates a more remote topological relationship.

Browse the Atlas of topology cartoons.
Generate your own topology cartoons.
Search databases for topological patterns.
Topology-based structure comparison
Explain the meaning of topology cartoons.
Software for generating, editing and viewing of cartoons
Articles about TOPS

VAST - Vector Alignment Search Tool
VAST Search is a service offered by the NCBI Structure Group that lets users search for structure neighbors starting from 3D coordinates they supply. This service is meant for newly determined protein structures that are not yet part of MMDB. Structure neighbors for proteins already in MMDB can be looked up from MMDB's structure summary pages.

Protein structure neighbors in Entrez are determined by direct comparison of 3-dimensional protein structures with the VAST algorithm. Each of the more than 15,000 domains in MMDB is compared to every other one. From the MMDB structure summary pages, retrieved by Entrez, structure neighbors are available with the click of a button.

DALI - compare protein structures in 3D
With a rapidly growing pool of known tertiary structures, the importance of protein structure comparison parallels that of sequence alignment. We have developed a novel algorithm (DALI) for optimal pairwise alignment of protein structures. The three-dimensional coordinates of each protein are used to calculate residue-residue (Calpha-Calpha) distance matrices. The distance matrices are first decomposed into elementary contact patterns, e.g., hexapeptide-hexapeptide submatrices. Then, similar contact patterns in the two matrices are paired and combined into larger consistent sets of pairs. A Monte Carlo procedure is used to optimize a similarity score defined in terms of equivalent intramolecular distances. Several alignments are optimized in parallel, leading to simultaneous detection of the best, second-best and so on solutions. The method allows sequence gaps of any length, reversal of chain direction, and free topological connectivity of aligned segments. Sequential connectivity can be imposed as an option. The method is fully automatic and identifies structural resemblances and common structural cores accurately and sensitively, even in the presence of geometrical distortions. An all-against-all alignment of over 200 representative protein structures results in an objective classification of known 3D folds in agreement with visual classifications. Unexpected topological similarities of biological interest have been detected, e.g., between the bacterial toxin colicin A and globins, and between the eukaryotic POU-specific DNA-binding domain and the bacterial lambda repressor.
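
The first two steps described above, building a Calpha-Calpha distance matrix and cutting it into hexapeptide-hexapeptide submatrices, can be sketched as follows. This is not the DALI implementation: it omits the pairing of contact patterns and the Monte Carlo optimization entirely. The function names are hypothetical and the coordinates are made-up points spaced 3.8 Angstroms apart along a line.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(ca_coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise distances between C-alpha atoms, one row and column per residue."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]


def submatrix(dm: List[List[float]], i: int, j: int, size: int = 6) -> List[List[float]]:
    """size x size block pairing residues i..i+size-1 with residues j..j+size-1.

    With size=6 this corresponds to one hexapeptide-hexapeptide contact pattern.
    """
    return [row[j:j + size] for row in dm[i:i + size]]


# Made-up C-alpha trace: 12 residues on a straight line, 3.8 Angstroms apart.
ca = [(3.8 * k, 0.0, 0.0) for k in range(12)]
dm = distance_matrix(ca)
block = submatrix(dm, 0, 6)

print(len(dm), len(block), round(block[0][0], 1))
```

Because the matrices hold only intramolecular distances, they do not change when a structure is rotated or translated, which is what allows two proteins to be compared without first superimposing them.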

GETAREA - Predicted Solvent Accessible Surface Areas
This calculates the solvent accessible surface area (SASA) of molecules. To calculate the SASA of a protein, supply the name of the local file containing its atomic coordinates in PDB format. There are options to calculate solvation energy and to include molecules other than proteins.
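
To give a feel for what a solvent accessible surface area is, the sketch below uses a simple Shrake-Rupley-style numerical approximation: test points are placed on each atom's probe-expanded sphere, and the points not buried inside a neighbouring atom's expanded sphere are counted. This technique is swapped in for illustration and is not claimed to be the method GETAREA uses. The probe radius of 1.4 Angstroms, the atomic radius of 1.7 Angstroms, the coordinates, and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def sphere_points(n: int) -> List[Coord]:
    """Roughly uniform points on a unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return points


def sasa(centers: Sequence[Coord], radii: Sequence[float],
         probe: float = 1.4, n_points: int = 200) -> List[float]:
    """Approximate per-atom solvent accessible surface area in square Angstroms."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(centers, radii)):
        expanded = ri + probe  # radius of the surface traced by the probe centre
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + expanded * ux, ci[1] + expanded * uy, ci[2] + expanded * uz)
            # A test point is buried if it lies inside any other expanded sphere.
            buried = any(
                math.dist(p, cj) < rj + probe
                for j, (cj, rj) in enumerate(zip(centers, radii))
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * expanded * expanded * exposed / n_points)
    return areas


# Two made-up atoms 2.0 Angstroms apart, each with an assumed radius of 1.7.
areas = sasa([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.7, 1.7])
print([round(a, 1) for a in areas])
```

Raising `n_points` tightens the approximation at the cost of run time. For real proteins a neighbour list would replace the all-pairs loop.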

Any Comments, Questions? Support@hgmp.mrc.ac.uk