PROSITE
PrositeScan - search the PROSITE database with your sequence
ProfileScan - Search the profile entries in PROSITE with your sequence
PatternFind - search a protein database with a pattern
PRINTS
Pfam
ProDom
Blocks
SBASE
MOTIF - Search for protein sequence motifs
PSITE - Search for PROSITE patterns with statistical estimation
ProClass
Clusters of Orthologous Groups (COGs)
MODULES in Proteins
SMART - Simple Modular Architecture Research Tool
3Dee - Database of Protein Domain Definitions
PROSITE
PROSITE is a resource for determining the function of
uncharacterized proteins translated from genomic or cDNA sequences. It
consists of a database of biologically significant sites, patterns and
profiles that help to reliably identify which known protein family
(if any) a new sequence belongs to.
PrositeScan - search the PROSITE database with your sequence
This allows you to search one or more sequences against the current
release of Amos Bairoch's PROSITE database.
ProfileScan - Search the profile entries in PROSITE with your sequence
This uses the pfscan program to search a single sequence against all
profile entries in the current release of PROSITE. The PROSITE
collection of protein sequence motifs contains a large number of
patterns and currently only a few profiles. The particular strength of
profiles is that they can be used to describe very divergent protein
motifs.
PatternFind - search a protein database with a pattern
This takes a user-defined pattern (PROSITE-format or regular expression)
and searches a protein database. It offers several useful output
options.
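As a rough illustration of how a PROSITE-format pattern relates to a regular
expression, the sketch below translates one into the other and scans a
sequence. It is not the PatternFind implementation; the function names
prosite_to_regex and find_pattern are hypothetical, the example sequence is
made up, and only the basic syntax is handled (x, [..], {..}, repeat counts
such as (2) or (2,4), and the < and > anchors).

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE-format pattern into a Python regex string."""
    parts = []
    # Elements are separated by '-'; a trailing '.' ends the pattern.
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("empty pattern element")
        core, low, high = m.groups()
        if core == "x":                              # any residue
            regex = "."
        elif re.fullmatch(r"\[[A-Z]+\]", core):      # any of the listed residues
            regex = core
        elif re.fullmatch(r"\{[A-Z]+\}", core):      # none of the listed residues
            regex = "[^" + core[1:-1] + "]"
        elif re.fullmatch(r"[A-Z]", core):           # one specific residue
            regex = core
        else:
            raise ValueError("unsupported pattern element: " + core)
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)

def find_pattern(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

# Made-up sequence, used only to exercise the functions.
hits = find_pattern("N-{P}-[ST]-{P}.", "MKNGSAANPTA")
```

Because re.finditer reports non-overlapping matches only, a real search tool
would need extra handling to report overlapping sites.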
PRINTS
PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to
characterise a protein family; its diagnostic power is refined by iterative scanning of OWL. Usually the
motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space.
Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single
motifs: the database thus provides a useful adjunct to PROSITE.
Pfam
Pfam is a high-quality comprehensive collection of protein domain families.
ProDom
ProDom is a comprehensive collection of protein families. It was
constructed by clustering all complete protein sequences in Swiss-Prot with the
clustering algorithm Domainer (Sonnhammer and Kahn, 1994).
The novelty of ProDom is that the
modular arrangement of proteins has been taken into account: whenever
domain boundaries were detected, the sequences were cut to produce consistent
families of domains.
Blocks
Blocks is operated by the Fred Hutchinson Cancer Research
Center. An aid to the detection and verification of protein
sequence homologies, Blocks compares a protein or DNA sequence
to a database of protein blocks. Blocks are short multiply
aligned sequences corresponding to the most highly conserved
regions of proteins. The rationale behind searching a
database of blocks is that information from multiply aligned
sequences is present in a concatenated form, reducing
background and increasing sensitivity to distant
relationships.
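To make the idea of scoring a sequence against a block concrete, here is a
minimal sketch, assuming a block is given as equal-length ungapped rows. It
builds a simple log-odds matrix from column frequencies and finds the
best-scoring window in a query. This is a generic position-specific scoring
approach, not the scoring method used by the Blocks server; the function
names, the pseudocount, the uniform background and the example data are all
assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_log_odds(block, pseudocount=1.0):
    """Build one {residue: log2 odds} dictionary per block column."""
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("all block rows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for col in range(width):
        counts = Counter(row[col] for row in block)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix

def best_window(matrix, sequence):
    """Return (start, score) of the highest-scoring window, or None."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best

# Made-up block and query, used only to exercise the functions.
example_block = ["GKSTL", "GKTTL", "GKSSL"]
example_hit = best_window(build_log_odds(example_block), "MAAGKSTLVV")
```

Each column contributes its own score, so a window is rewarded for conserved
residues at the positions where the aligned family members agree.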
SBASE
SBASE is a database of annotated protein domains. It is searchable
by subfield and cross-referenced to Swiss-Prot, PROSITE, EMBL, MEDLINE,
MEDLARS, OMIM, ProDom, PRINTS and Blocks.
There is an interface to a BLAST mail server.
MOTIF - Search for protein sequence motifs
Search for protein sequence motifs in PROSITE patterns, PROSITE profiles,
Blocks, ProDom, PRINTS, or a user-defined profile.
PSITE - Search for PROSITE patterns with statistical estimation
Search for PROSITE patterns with statistical estimation.
ProClass
The ProClass database is a non-redundant protein database organized according
to family relationships as defined collectively by ProSite patterns and PIR
superfamilies. The ProClass database can facilitate protein family information
retrieval, unveil domain and family relationships, and classify multi-domained
proteins, by combining global and motif similarities into a single family
organization scheme.
Clusters of Orthologous Groups (COGs)
Clusters of Orthologous Groups (COGs) were delineated by comparing protein sequences encoded in 7
complete genomes, representing 5 major phylogenetic lineages. Each COG consists of individual proteins
or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.
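The lineage criterion above can be sketched as a simple filter. This is not
the COG construction procedure, only a check of the "at least 3 lineages"
rule on clusters that already exist; the function name, cluster names,
protein identifiers and lineage labels are all hypothetical.

```python
def clusters_meeting_lineage_rule(clusters, lineage_of, min_lineages=3):
    """Keep clusters whose proteins come from at least min_lineages lineages.

    clusters   -- {cluster name: list of protein identifiers}
    lineage_of -- {protein identifier: lineage label}
    Returns {cluster name: sorted list of lineages represented}.
    """
    kept = {}
    for name, proteins in clusters.items():
        # Paralogs from one lineage count once, because a set is used.
        lineages = {lineage_of[protein] for protein in proteins}
        if len(lineages) >= min_lineages:
            kept[name] = sorted(lineages)
    return kept

# Hypothetical data, used only to exercise the function.
example_clusters = {
    "cluster_1": ["p1", "p2", "p3", "p4"],
    "cluster_2": ["p5", "p6"],
}
example_lineages = {
    "p1": "lineage_A", "p2": "lineage_A", "p3": "lineage_B",
    "p4": "lineage_C", "p5": "lineage_A", "p6": "lineage_B",
}
example_kept = clusters_meeting_lineage_rule(example_clusters, example_lineages)
```

Here cluster_1 spans three lineages and is kept, while cluster_2 spans only
two and is dropped.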
MODULES in Proteins
The module pages contain information and research tools on mobile protein domains.
SMART - Simple Modular Architecture Research Tool
This searches your protein sequence against a database of domain
profiles and displays a diagram of the domains together with low-complexity regions, transmembrane regions, etc.
You can then optionally run a BLAST search on the regions of your sequence that did not match a known domain.
3Dee - Database of Protein Domain Definitions
This database contains definitions of structural domains for all protein
chains in the Brookhaven Protein Databank (PDB) that have 20 or more
residues and are not theoretical models. The domains have been
clustered on sequence similarity and structural similarity to form
families. The families are stored as a hierarchy.
Updating does not require complete regeneration of the database and is almost completely automated, so we expect to be able to complete updates every 1-2 months.
Any Comments, Questions? Support@hgmp.mrc.ac.uk