LON-CAPA Protein Pattern and Domain Databases

Welcome to the GenomeWeb
Protein Pattern and Domain Databases

This is a collection of protein pattern and domain database sites.

PROSITE
PrositeScan - search the PROSITE database with your sequence
ProfileScan - search the profile entries in PROSITE with your sequence
PatternFind - search a protein database with a pattern
PRINTS
Pfam
ProDom
Blocks
SBASE
MOTIF - Search for protein sequence motifs
PSITE - Search for PROSITE patterns with statistical estimation
ProClass
Clusters of Orthologous Groups (COGs)
MODULES in Proteins
SMART - Simple Modular Architecture Research Tool
3Dee - Database of Protein Domain Definitions

Detailed information on the above options

PROSITE
PROSITE is a resource for determining the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to.

PrositeScan - search the PROSITE database with your sequence
This allows you to search one or more sequences against the current release of Amos Bairoch's PROSITE database.

ProfileScan - search the profile entries in PROSITE with your sequence
This uses the pfscan program to search a single sequence against all profile entries in the current release of PROSITE. The PROSITE collection of protein sequence motifs contains a large number of patterns and currently only a few profiles. The particular strength of profiles is that they can be used to describe very divergent protein motifs.

PatternFind - search a protein database with a pattern
This takes a user-defined pattern (PROSITE-format or regular expression) and searches a protein database. It offers several useful output options.
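
As an illustration of what a PROSITE-format pattern search involves, the sketch below translates a pattern into a Python regular expression and scans a small set of sequences. It is not the PatternFind implementation; the function names `prosite_to_regex` and `find_pattern` and the toy sequence are assumptions made for this example, and only the common syntax elements are handled (`x`, `[..]`, `{..}`, repeat counts, and the `<` / `>` anchors at the ends of a pattern).

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-format pattern into a Python regular expression."""
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split an element such as "x(2,4)" into its core and repeat count.
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # residues that are NOT allowed
            core = "[^" + core[1:-1] + "]"
        # A single letter or a [..] set is already valid regex syntax.
        if repeat:
            core += "{" + repeat + "}"
        regex_parts.append(prefix + core + suffix)
    return "".join(regex_parts)

def find_pattern(pattern, sequences):
    """Return (name, 1-based start, matched text) for every hit, overlaps included."""
    # The lookahead lets overlapping occurrences be reported as separate hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for name, seq in sequences.items():
        for m in regex.finditer(seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return hits

# Hypothetical toy sequence, scanned with the N-glycosylation site pattern.
toy_db = {"toy1": "MKNGSAPNPTQ"}
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_pattern("N-{P}-[ST]-{P}.", toy_db))
```

A plain regular expression supplied by the user could be passed to `re.compile` directly, skipping the translation step.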

PRINTS
PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of OWL. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs: the database thus provides a useful adjunct to PROSITE.
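
The idea of a fingerprint as several non-overlapping motifs found in order along a sequence can be sketched as follows. This is a simplification, not the PRINTS scanning method: the motifs are written here as regular expressions purely for illustration, and the function name `match_fingerprint`, the motifs and the sequence are all hypothetical.

```python
import re

def match_fingerprint(sequence, motifs):
    """Find motifs in the given order, without overlap, along a sequence.

    Returns a list of (motif, start, end) with 0-based, end-exclusive spans.
    Motifs that are not found are skipped, so a partial fingerprint
    (fewer hits than motifs) can still be reported.
    """
    hits = []
    cursor = 0  # the next motif must start at or after this position
    for motif in motifs:
        m = re.compile(motif).search(sequence, cursor)
        if m is None:
            continue
        hits.append((motif, m.start(), m.end()))
        cursor = m.end()
    return hits

# Hypothetical three-motif fingerprint and toy sequence.
fingerprint = ["GKS[ST]", "DEAD", "SAT"]
hits = match_fingerprint("MAGKSTLLLDEADQQQSATV", fingerprint)
print(f"{len(hits)} of {len(fingerprint)} motifs matched:", hits)
```

Reporting how many of the motifs were matched, rather than a single yes or no, reflects why a group of motifs can be more informative than one motif alone.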

Pfam
Pfam is a high-quality comprehensive collection of protein domain families.

HMM Search - Compare your query sequence to all Pfam HMMs.
Browse families - List Pfam families and inspect annotation and alignments.
Browse Swissprot - Get the Pfam organisation of any Swissprot entry and keyword search.

ProDom
ProDom is a comprehensive collection of protein families. It was constructed by clustering all complete protein sequences in Swiss-Prot with the clustering algorithm Domainer (Sonnhammer and Kahn, 1994). The novelty of ProDom is that the modular arrangement of proteins has been taken into account: whenever domain boundaries were detected, the sequences were cut to produce consistent families of domains.

Blocks
Blocks is operated by the Fred Hutchinson Cancer Research Center. An aid to the detection and verification of protein sequence homologies, Blocks compares a protein or DNA sequence to a database of protein blocks. Blocks are short, multiply aligned sequences corresponding to the most highly conserved regions of proteins. The rationale behind searching a database of blocks is that information from multiply aligned sequences is present in a concentrated form, reducing background and increasing sensitivity to distant relationships.
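
To show how a short ungapped alignment can be turned into a sensitive search tool, the sketch below builds a simple position-specific scoring matrix from a block and slides it along a query. It is a generic illustration, not the scoring scheme used by the Blocks server: the uniform background, the pseudocount, the function names and the toy block are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(block, pseudocount=1.0):
    """Build log-odds column scores from equal-length, ungapped segments."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(block[0])):
        counts = Counter(segment[col] for segment in block)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_hit(sequence, pssm):
    """Return (0-based start, score) of the best-scoring window, or None."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids score 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Hypothetical block of four aligned segments and a toy query sequence.
block = ["GKSTL", "GKSSL", "GKTTL", "GRSTL"]
print(best_hit("MAAGKSTLPPQ", build_pssm(block)))
```

Because every column contributes to the score, a window can score well even when it matches no single sequence in the block exactly.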

SBASE
SBASE is a database of annotated protein domains. It is searchable by subfields and cross-referenced to Swiss-Prot, PROSITE, EMBL, MEDLINE, MEDLARS, OMIM, PRODOM, PRINTS and BLOCKS.

There is an interface to a Blast mailserver.

MOTIF - Search for protein sequence motifs
Search for protein sequence motifs in PROSITE patterns, PROSITE profiles, BLOCKS, ProDom, PRINTS, or a user-defined profile.

PSITE - Search for PROSITE patterns with statistical estimation
Search for PROSITE patterns with statistical estimation.

ProClass
The ProClass database is a non-redundant protein database organized according to family relationships as defined collectively by ProSite patterns and PIR superfamilies. The ProClass database can facilitate protein family information retrieval, unveil domain and family relationships, and classify multi-domained proteins, by combining global and motif similarities into a single family organization scheme.

Clusters of Orthologous Groups (COGs)
Clusters of Orthologous Groups (COGs) were delineated by comparing protein sequences encoded in 7 complete genomes, representing 5 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.
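
The membership criterion stated above (proteins from at least 3 lineages) can be expressed as a simple filter. The sketch covers only that criterion, not how the clusters themselves were delineated; the function name, protein identifiers and lineage labels are hypothetical.

```python
def spans_enough_lineages(cluster, lineage_of, minimum=3):
    """True if the proteins in a cluster come from at least `minimum` lineages."""
    lineages = {lineage_of[protein] for protein in cluster}
    return len(lineages) >= minimum

# Hypothetical protein-to-lineage assignments.
lineage_of = {
    "protA1": "lineage_A", "protA2": "lineage_A",  # paralogs in one lineage
    "protB1": "lineage_B",
    "protC1": "lineage_C",
}

candidates = {
    "cluster1": ["protA1", "protA2", "protB1", "protC1"],
    "cluster2": ["protA1", "protA2", "protB1"],
}
kept = [name for name, members in candidates.items()
        if spans_enough_lineages(members, lineage_of)]
print(kept)
```

Counting distinct lineages rather than proteins means that paralogs from one lineage do not help a cluster qualify.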

MODULES in Proteins
The module pages contain information and research tools on mobile protein domains.

SMART, a simple modular architecture research tool
Alerting system for extracellular modules
1D and 3D Cartoons of modular extracellular proteins
1D and 3D Cartoons of modular intracellular proteins

SMART - Simple Modular Architecture Research Tool
This searches your protein sequence against a database of domain profiles and displays a diagram of the domains together with low-complexity regions, transmembrane regions, etc.

You can then optionally do a BLAST search of the regions of your sequence which did not match a known domain.
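
Finding the regions that did not match a known domain amounts to taking the complement of the domain hits along the sequence. The sketch below shows one way to compute it. It is not SMART's own code, and the function name, the coordinates and the `min_length` cutoff are assumptions for illustration.

```python
def unmatched_regions(seq_length, hits, min_length=1):
    """Return the stretches of a sequence not covered by any domain hit.

    `hits` is a list of (start, end) pairs, 1-based and inclusive.
    Hits may overlap. Regions shorter than `min_length` are dropped.
    """
    regions = []
    cursor = 1  # first position not yet covered by a hit
    for start, end in sorted(hits):
        if start > cursor:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= seq_length:
        regions.append((cursor, seq_length))
    return [(s, e) for s, e in regions if e - s + 1 >= min_length]

# Hypothetical domain hits on a 300-residue protein.
hits = [(50, 120), (100, 180), (250, 300)]
print(unmatched_regions(300, hits))
print(unmatched_regions(300, hits, min_length=50))
```

A length cutoff is useful in practice because very short leftover stretches rarely give meaningful similarity search results.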

3Dee - Database of Protein Domain Definitions
This database contains definitions of structural domains for all protein chains in the Brookhaven Protein Databank (PDB) that have 20 or more residues and are not theoretical models. The domains have been clustered on sequence similarity and structural similarity to form families. The families are stored as a hierarchy.

Updating does not require complete regeneration of the database and is almost completely automated so we expect to be able to complete updates every 1-2 months.

Any Comments, Questions? Support@hgmp.mrc.ac.uk