LON-CAPA Protein Family Databases

Welcome to the GenomeWeb
Protein Family Databases


This is a collection of protein family information sites.

The AAA Protein Superfamily
Aldehyde dehydrogenase (ALDH)
Amyloid Precursor Protein

The CBS domain web page
Cholinesterases
The Chaperonin Home Page
Chromatin Structure & Function Page
Chromo shadow domain
CySPID -- The Cytoskeletal Protein Interactions Database
Cytochrome P450 family
Cytokines
Dictionary of Cytokines
Cytokines
The International Cytokine Society
Cytokine Family cDNA Database (dbCFC)
Cytokines Online Pathfinder Encyclopaedia (COPE)

The DExH/D protein family database
DNA repair

EF-Hand Calcium-Binding Proteins
Eph Receptor Tyrosine Kinases & Their Ligands
ESTHER - ESTerases and alpha/beta Hydrolase Enzymes and Relatives

Gene Family Database
G protein-coupled receptor database (GCRDb)
Globin Gene Server
Glucoamylases
Glycosyltransferases
GnRH Family

Histone, Histone Sequence Database
Homeobox
HOX DataBase
HOX Pro database
HSP100 Alignments

The Integrin Page
InBase, The New England Biolabs Intein Database
Ion Channel Network
Ion Channel Resources
Insulin gene family
Inteins - protein introns

The Kinesin Home Page
Kinesins

Labial Homeobox

MADS-box Gene
MEROPS - The Peptidase Database
Metallothionein
MutS Protein Family

The Nuclear Receptor Resource

Olfactory Receptor DataBase

PAX family
Pentapeptide repeat
PHD-finger
PROMISE - The Prosthetic groups and Metal Ions in Protein Active Sites Database
Protein Kinase Database Project

The RecA Protein Family Web Site
The Ribonuclease P Database

SAND domain
SANT domain
SNF2 family of proteins

The Thyroid Hormone Receptor Resource
Topoisomerases
Transposases

Wnt and Frizzled gene Homepage

Detailed information on the above options

The AAA Protein Superfamily
The AAA (for ATPases Associated with various cellular Activities) protein superfamily is characterized by a highly conserved module of approximately 230 amino acid residues including an ATP binding consensus, present in one or two copies in the AAA proteins. AAA proteins are found in all organisms (Archaea, Eubacteria, Eukaryota: Protista, Fungi, Plants, Animals) and are essential for, e.g., cell cycle functions, vesicular transport, mitochondrial functions, peroxisome assembly, and proteolysis.

Aldehyde dehydrogenase (ALDH)
Aldehyde dehydrogenase (ALDH), often in tandem with alcohol dehydrogenase, acts in detoxifying a wide variety of organic compounds, toxins and pollutants. Defects in ALDH lead to Sjogren-Larsson syndrome in humans.

Amyloid Precursor Protein
The amyloidogenic glycoprotein family consists of the amyloid precursor protein (APP) and two other APP-like proteins of unknown function, APPLP1 and APPLP2. Two of the major protein products of the APP gene, arising from alternative splicing (APP751 and APP770), contain a domain that is homologous to the Kunitz class of serine protease inhibitors.

The CBS domain web page
The CBS domain is widespread: found in all species. The CBS domain is named after Cystathionine Beta Synthase. All CBS domains identified to date occur in the cytoplasm or nucleus. The domain is about 60 residues long, and usually found in two or four copies per protein.

Cholinesterases
Information on cholinesterases.

The Chaperonin Home Page
The Chaperonin Home Page is designed to be a repository for information about the important class of heat shock proteins known as chaperonins. It includes the GroEL/GroES mutation databases.

Chromatin Structure & Function Page
This site is intended to disseminate information regarding the rapidly evolving and highly exciting field of chromatin structure, proteins that modify chromatin structure, and the effects that these modifications have on cell function.

other chromatin sites
other Chromatin-Associated Proteins sites
papers & meetings

Chromo shadow domain
The chromo domain was originally identified as a protein sequence motif common to the Drosophila chromatin proteins, Polycomb (Pc) and Heterochromatin protein 1 (HP1).

CySPID -- The Cytoskeletal Protein Interactions Database
CySPID is a database that holds properties and relationships among cytoskeletal proteins and other entities in the database, such as protein classes, genes, or macromolecular complexes.

Cytochrome P450 family
The family referred to here as FAD-dependent pyridine nucleotide reductases (FADPNR) includes FAD flavoproteins belonging to the family of pyridine nucleotide-disulphide oxidoreductases (glutathione reductase, trypanothione reductase, lipoamide dehydrogenase, mercuric reductase, thioredoxin reductase, alkyl hydroperoxide reductase) and iron-sulphur protein reductases involved in oxidative metabolism of a variety of hydrocarbons.

Cytokines
This provides information about cytokines and their receptors including topological, evolutionary and mechanistic relationships between the molecules, and illustrations of known three-dimensional structures.

Dictionary of Cytokines
A dictionary of cytokine alternative names, related factors, signal transducers, etc. Links to other cytokine-related sites.

Cytokines
This is a useful site for anyone interested in cytokines, adhesion molecules, growth factors and related agents.

The International Cytokine Society

society affairs
newsletters
The Cytokine-Interferon Open Forum

Cytokine Family cDNA Database (dbCFC)
The Cytokine Family cDNA Database (dbCFC) is a collection of EST (Expressed Sequence Tag) records of cytokines deposited in the NCBI GenBank. It provides information about the identification of EST records to cytokine members and related data contained in other databases including GenBank, dbEST, GDB, Online Mendelian Inheritance in Man (OMIM), The Transgenic/Targeted Mutation Database (TBASE), Unique Human Gene Sequence Collection (UniGene), Anatomical Expression Database of Human Genes (BodyMap), Mouse Genome Database (MGD) and Human/Mouse Homology Relationships.

Cytokines Online Pathfinder Encyclopaedia (COPE)
COPE consists of 6000 WWW pages hypertexted with 43000 links and 14000 references and covers all aspects of Cytokine research.

The DExH/D protein family database
This is a database covering the putative RNA helicases of the DEAD, DEAH, and DExH proteins.

DExH/D proteins are essential in all aspects of RNA metabolism in the cell. They play important roles in:

transcription
pre-mRNA splicing
RNA export
RNA degradation
ribosome biogenesis
translation
mitochondrial RNA splicing
development
replication of many viruses

DNA repair
The maintenance of genetic stability is an essential task for all organisms, ensuring the proper function of cellular systems and the passage of information to the next generation. Two distinct pathways of DNA repair exist whose roles are delineated by the type of DNA lesion that they recognize and repair. Initial work in the bacterial and yeast systems has laid the foundation for identifying the genes of these pathways in mammals.

EF-Hand Calcium-Binding Proteins
The EF-Hand Calcium-Binding Proteins Data Library is a growing collection of published sequence, structural, functional, and other information about EF-hand calcium-binding proteins and their roles in cellular signal transduction.

Eph Receptor Tyrosine Kinases & Their Ligands
This is a database for the Eph family of receptor protein tyrosine kinases and their ligands, the ephrins. Much excitement regarding this new family results from their roles in developmental neurobiology as molecular guides for axons. However, they may be involved in many other processes: cancer, angiogenesis, haematopoiesis, and kidney development. The expression patterns of members of this family suggest that their functions during development and in the adult organism are still relatively unknown.

ESTHER - ESTerases and alpha/beta Hydrolase Enzymes and Relatives
ESTHER (for esterases, [alpha]/[beta] hydrolase enzymes and relatives) is a database aimed at collecting in one information system, sequence data together with biological annotations and experimental biochemical results related to the structure-function analysis of the enzymes of the family.

FYVE finger
The FYVE finger is a novel zinc finger-like domain found in several proteins involved in membrane trafficking. The basic motif consists of 8 cysteines, 4 of which are part of the core motif R+HHC+XCG (where '+' is a positively charged residue and 'X' is any amino acid). This finger has only been observed as a single copy in each of the proteins, and it has been shown to bind 2 zinc ions per finger.
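
The core motif R+HHC+XCG can be turned into a simple pattern search. The following is a minimal sketch, not a validated domain detector: it assumes that "positively charged residue" means lysine (K) or arginine (R), and the function name and the toy sequence are hypothetical.

```python
import re

# '+' in the motif stands for a positively charged residue. Treating that as
# K or R only is an assumption of this sketch (histidine is sometimes counted).
POSITIVE = "KR"

# R + H H C + X C G, with 'X' (any amino acid) written as '.'
FYVE_CORE = re.compile(rf"R[{POSITIVE}]HHC[{POSITIVE}].CG")


def find_fyve_core(sequence):
    """Return (0-based start, matched text) for each core-motif hit."""
    return [(m.start(), m.group()) for m in FYVE_CORE.finditer(sequence.upper())]


# Hypothetical toy sequence, not a real protein.
print(find_fyve_core("MSTARKHHCRACGNNN"))
```

A hit on the core motif alone does not establish a FYVE finger, since the full domain involves 8 cysteines; the search only flags candidate positions for closer inspection.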

Gene Family Database
A prototype database of gene families in the human and mouse that contains both community contributed information and hypertext links to various biological databases.

PAX
Cytochrome P450
Insulin including the Insulin-like growth factors
Amyloid Precursor Protein (APP, APPLP1, APPLP2)
DNA repair proteins

G protein-coupled receptor database (GCRDb)
GCRDb was started in 1989 to keep track of all new sequence data of this biologically important class of proteins. The systematic collection of these data has been a large undertaking which has been aided by Amos Bairoch, Gert Vriend, Kevin Lynch and others.

Globin Gene Server
This provides access to sequence alignments and experimental results for the beta-like globin gene cluster of mammals.

Glucoamylases
Glucoamylase (also known as amyloglucosidase) is an important industrial enzyme used in the saccharification steps of both starch enzymatic conversion and alcohol production.

Glycosyltransferases
This guide lists, and gives WWW links to sequence databases for:

1. Cloned eukaryotic glycosyltransferases involved in the biosynthesis of glycoproteins, glycolipids, glycosylphosphatidylinositols and other complex glycoconjugates (together with journal references)

2. Cloned prokaryotic glycosyltransferases involved in lipopolysaccharide biosynthesis

3. Cloned glucuronyltransferases and some yeast chitin synthases (Swissprot links only)

However, this guide does not cover glycosyltransferases involved in the metabolism of sucrose, trehalose, glucan or many other polysaccharides, nor does it list the many expressed sequence tags (ESTs) or predicted Caenorhabditis elegans glycosyltransferase sequences.

GnRH Family
The vertebrate gonadotropin-releasing hormone (GnRH) is a decapeptide involved in regulating reproduction. The release of GnRH from the hypothalamus regulates the production of gonadotropins in the pituitary and these gonadotropins are responsible for gonadal development and growth in vertebrates.

In addition to the hypothalamic GnRH (also called GnRH1 or GnRH-I), many vertebrate species have been found to express other GnRH forms. These include a midbrain GnRH (called GnRH-II or GnRH2) and a telencephalic GnRH (called GnRH-III or GnRH3). The function of these non-hypothalamic forms remains unclear.

Histone, Histone Sequence Database
Database of aligned histone protein sequences. Also contains sequences of proteins identified as containing the histone fold motif. Structures of all known histone and histone fold proteins.

Information included regarding discrepancies between similar sequence entries in different source databases. Multiple sequence alignments for each histone.

Homeobox
Information relevant to homeobox genes (in particular about classification/evolution).

HOX DataBase
Contains descriptions of homeotic Hox genes and features of the encoded HD proteins, which are sequence-specific transcription factors and part of a developmental regulatory system.

HOX Pro database
The HOX Pro database is aimed at:

analysis and classification of regulatory regions in diverse homeobox and related genes, the controllers of invertebrate and vertebrate development;
comparative analysis of the organisation of HOX clusters and "hox-based" genetic networks for C. elegans, sea urchins, Drosophila and vertebrates;
analysis of the phylogeny and evolution of homeobox genes and clusters.

HSP100 Alignments
Information on the HSP100 proteins.

The Integrin Page
Integrins are receptor proteins of crucial importance: they are the main way that cells both bind to and respond to the extracellular matrix.

InBase, The New England Biolabs Intein Database
Protein splicing is defined as the excision of an intervening protein sequence (the INTEIN) from a protein precursor and the concomitant ligation of the flanking protein fragments (the EXTEINS) to form a mature extein protein and the free intein (Perler 1994). Protein splicing results in a native peptide bond between the ligated exteins (Cooper 1993). Extein ligation differentiates protein splicing from other forms of autoproteolysis. Inteins are named with a 3 letter genus/species designation followed by the extein gene name. If more than 1 intein is present in an extein gene, the inteins are given a numerical suffix.
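
The naming rule described above can be sketched in a few lines. This is only an illustration: the space between the designation and the gene name and the hyphen before the numerical suffix are formatting assumptions not stated in the text, and the function name and inputs are hypothetical.

```python
def intein_names(designation, extein_gene, count=1):
    """Build intein names from a 3-letter genus/species designation,
    the extein gene name, and the number of inteins in that gene.

    The separators (space, hyphen) are assumptions of this sketch.
    """
    if len(designation) != 3:
        raise ValueError("designation must be exactly 3 letters")
    if count < 1:
        raise ValueError("count must be at least 1")

    base = f"{designation} {extein_gene}"
    if count == 1:
        # A single intein in the extein gene gets no suffix.
        return [base]
    # More than one intein: number them in order.
    return [f"{base}-{i}" for i in range(1, count + 1)]


# Hypothetical inputs, not real database entries.
print(intein_names("Xyz", "GeneA"))
print(intein_names("Xyz", "GeneB", count=2))
```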

Ion Channel Network
The Ion Channel Network (ICN) is a pilot WWW site aimed at making distributed information about ion channel molecules more 'accessible' and 'systematic' in coverage.

Ion Channel Resources
Resources for ion channel research

researchers
ion channel toxins
human Kv sequences
recent articles
reference list
biophysical software tools
publications
ion channel basics
ion channel links

Insulin gene family
The insulin gene family is an ancient and highly diverse group that includes insulin, insulin-like growth factors I and II (IGF I and IGF II), and relaxin, from a wide variety of vertebrate species, and a number of related peptides in the invertebrates, such as insect prothoracicotrophic hormone (PTTH) and molluscan insulin-related peptides (MIP I and MIP II).

Inteins - protein introns
Inteins are proteins inserted in-frame and translated together with their host proteins. The precursor protein then undergoes protein splicing resulting in two products: the host protein and the intein. This reaction is autoproteolytic.

This site mainly focuses on intein sequence motifs and evolution.

The Kinesin Home Page
Kinesin is a mechanochemical protein capable of utilizing chemical energy from ATP hydrolysis to generate mechanical force. In the presence of ATP, kinesin can bind to and move on microtubules. The ability to translocate along the microtubule lattice has led to the classification of kinesin as a microtubule motor protein. Kinesin is unrelated in sequence to the other known class of microtubule motor proteins, the dyneins, and is thought to perform functions in the cell distinct from the dyneins.

Kinesins
Information on the kinesin protein family.

Labial Homeobox
The homeodomain sequences and references of over 40 putative labial genes among metazoan organisms, as well as several hexapeptide sequences, are presented in alignments.

MADS-box Gene
The MADS box is a highly conserved sequence motif found in a family of transcription factors. The conserved domain was recognized after the first four members of the family, which were MCM1, AGAMOUS, DEFICIENS and SRF (serum response factor). The name MADS was constructed from the "initials" of these four "founders".

MEROPS - The Peptidase Database
This database employs a structure-based classification of peptidases by clan and family. Each peptidase is given a unique MEROPS identifier. Links give access to database entries for the enzymology, protein and nucleic acid sequences, tertiary structures, genetics and more.

Metallothionein
Metallothioneins (MTs) are ubiquitous low molecular weight proteins and polypeptides of extremely high metal and sulfur content. They are thought to play roles in the intracellular fixation of the essential trace elements zinc and copper, in controlling the concentrations of the free ions of these elements, in regulating their flow to their cellular destinations, in neutralising the harmful influences of exposure to toxic elements such as cadmium and mercury, and in protection from a variety of stress conditions.

MutS Protein Family
The ability of many species to repair mismatches in double-stranded DNA has been well documented. The first critical step in this process is the recognition of the mismatched DNA. In the major mismatch repair pathway in Escherichia coli, this is accomplished by the MutS protein.

The Nuclear Receptor Resource
The Nuclear Receptor Resource (NRR) Project is a collection of individual databases on members of the steroid and thyroid hormone receptor superfamily. Although the databases are located on different servers and are managed individually, they each form a node of the NRR. The NRR itself integrates the separate databases and allows an interactive forum for the dissemination of information about the superfamily.

Glucocorticoid receptor resource
Thyroid hormone receptor resource
Androgen receptor resource
Mineralocorticoid receptor resource
Androgen receptor mutations resource
PPAR resource
steroid receptor associated proteins
Vitamin D receptor resource
steroid structures & activities

Olfactory Receptor DataBase
ORDB is a database of sequences of olfactory receptor proteins. It contains public and private sections which provide tools for investigators to analyze the functions of this very large gene family of G protein-coupled receptors. It also provides links to a local cluster of databases of related information, and to other relevant databases worldwide.

PAX family
Members of the mammalian PAX (paired box) family of genes were initially identified due to sequence homology to the Drosophila segmentation genes paired and gooseberry (Burri et al, 1989). Currently, nine members of this family have been identified in the human (designated PAX1 through PAX9). PAX genes have also been identified in mouse and zebrafish, where they are expressed in a tissue specific manner and appear to play a role in embryonic development. The PAX family is characterized by the presence of paired box domains. Some members of the family (PAX3, PAX4, PAX6, PAX7) also contain a functional homeobox domain.

Pentapeptide repeat
The pentapeptide repeat is found in several bacterial proteins of uncertain function [1]. Pentapeptide repeat proteins contain a striking repeat of five residues which can be clearly seen in self dot-plots. The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid. This family is found to have many members in the bacterial genome of the cyanobacterium Synechocystis sp.
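
Because the repeat is approximately A(D/N)LXX, tandem copies can be located with a regular expression. The sketch below is illustrative only: the function name, the `min_repeats` threshold and the toy sequence are hypothetical, and real repeat proteins may deviate from this approximate consensus.

```python
import re


def find_pentapeptide_runs(sequence, min_repeats=2):
    """Return (0-based start, repeat count) for each run of at least
    min_repeats back-to-back copies of the approximate A(D/N)LXX unit."""
    # One unit: A, then D or N, then L, then any two residues.
    pattern = re.compile(rf"(?:A[DN]L..){{{min_repeats},}}")
    return [
        (m.start(), len(m.group()) // 5)
        for m in pattern.finditer(sequence.upper())
    ]


# Hypothetical toy sequence with three tandem units: ADLSG ANLTQ ADLRR
print(find_pentapeptide_runs("GGADLSGANLTQADLRRGG"))
```

Requiring several consecutive units reduces chance matches, since a single A(D/N)L triplet is short enough to occur in unrelated sequences.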

PHD-finger
The PHD-finger, a zinc finger-like motif, occurs in a set of proteins that includes members of the Drosophila Polycomb and trithorax group genes. These genes regulate the expression of the homeotic genes through a mechanism thought to involve some aspect of chromatin structure.

PROMISE - The Prosthetic groups and Metal Ions in Protein Active Sites Database
The PROMISE (Prosthetic centres and metal ions in protein active sites) database aims to gather together comprehensive sequence, structural, functional and bibliographic information on proteins which possess prosthetic centres, with an emphasis on active site structure and function.

Protein Kinase Database Project
The Protein Kinase Database Project at SDSC aims to create a system that is narrowly focused on kinases, phosphatases, and related molecules. It will integrate structural, genetic, and molecular biological data.

The RecA Protein Family Web Site
The RecA protein is involved in at least three distinct biological processes in E. coli:

Homologous recombination (and the recombinational repair of DNA damage).
DNA damage induced mutagenesis.
Activation of the SOS system.

The Ribonuclease P Database
The RNase P Database is a compilation of RNase P sequences, sequence alignments, secondary structures, three-dimensional models, and accessory information. The database primarily contains information on the bacterial and archaeal enzymes, focusing on the RNA subunit. Some information is also included on the eucaryal and organellar RNase P RNAs.

SAND domain
The SAND domain adds to the burgeoning set of domains present in modular chromatin-associated proteins. The functions of most of these domains are not at all well understood, and gaining a better understanding will be one key to understanding how chromatin is assembled and regulated. The SAND domain appears in various nuclear contexts. Sp100/Sp140 are found in recently described nuclear bodies or dots, discrete structures within the nucleus that do not yet have known functions.

SANT domain
SANT domains are a repeated motif in N-CoR, the nuclear receptor co-repressor.

SNF2 family of proteins
The SNF2 family of proteins is defined by the presence of a conserved set of amino-acid motifs which together are called the SNF2 domain.

The Thyroid Hormone Receptor Resource
The Thyroid Hormone Receptor Resource (TRR) provides a variety of information on thyroid hormone receptors (TRs), as well as more general information shared with other NRR sites.

Topoisomerases
Topoisomerase enzymes (topo) are critical to the normal function of any living cell. They adjust DNA winding (topology) by making temporary cuts in the DNA for transcription and replication.

Transposases
The Mutator transposable element system of maize is one of the most active transposable element systems known in plants. The autonomous element of this system, MuDR, encodes two proteins, MUDRA and MUDRB.

Wnt and Frizzled gene Homepage
Wnt proteins are now recognized as one of the major families of developmentally important signaling molecules, with mutations in Wnt genes displaying remarkable phenotypes in the mouse, Caenorhabditis elegans, and Drosophila. Among functions provided by Wnt proteins are such intriguing processes as embryonic induction, the generation of cell polarity, and the specification of cell fate.

Any Comments, Questions? Support@hgmp.mrc.ac.uk