What is ThYme?
What are enzyme families?
All families are based on one or more query sequences that have evidence at the protein level. The families are populated by running BLAST with the catalytic domain of each query sequence, and membership is confirmed with multiple sequence alignments, three-dimensional structure superpositions, and, where known, the exact positions of the catalytic residues. More details on the methods used to find and create families appear in Cantu et al. (2010), listed in our "Citing Us" section.
How are the families and sequences shown?
Each enzyme group has its own main page, which lists all of its families along with the "Names of enzymes and genes present" in each. These names give an overview of the sequences found; the list is not exhaustive.
At the top of each family's page, a table lists the protein fold (if known from crystal structures), the names of enzymes and genes present (not an exhaustive list), the most common EC numbers, the catalytic residues (if known from the literature), and notes. The annotation may be incomplete for some families.
Sequences are shown in rows, grouped into archaea, bacteria, and eukaryota and ordered alphabetically by species within each group. All sequences in a row are identical and come from a single species. Identical sequences from different species are placed in separate rows; identical sequences from different strains of the same species, however, share a row. If a family has more than 500 rows, they are split across multiple pages.
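To make the row rules concrete, here is a minimal Python sketch of how such a table could be assembled. It is not ThYme's code: the `Record` and `Row` types, their field names, and the functions `build_rows` and `paginate` are hypothetical, and it assumes each record already carries its species name with strain information removed.

```python
from dataclasses import dataclass, field

# Display order of the three groups, as described above.
GROUP_ORDER = {"archaea": 0, "bacteria": 1, "eukaryota": 2}
PAGE_SIZE = 500  # rows per page


@dataclass(frozen=True)
class Record:
    """Hypothetical input record: one database entry."""
    group: str      # "archaea", "bacteria", or "eukaryota"
    species: str    # species name, strain information already stripped (assumption)
    sequence: str
    accession: str


@dataclass
class Row:
    """One displayed row: identical sequences from a single species."""
    group: str
    species: str
    sequence: str
    accessions: list = field(default_factory=list)


def build_rows(records):
    """Merge records by (group, species, sequence).

    Identical sequences from different strains of one species share a row;
    the same sequence found in another species gets its own row.
    """
    rows = {}
    for rec in records:
        key = (rec.group, rec.species, rec.sequence)
        if key not in rows:
            rows[key] = Row(rec.group, rec.species, rec.sequence)
        rows[key].accessions.append(rec.accession)
    # Order by group first, then alphabetically by species within each group.
    return sorted(rows.values(), key=lambda r: (GROUP_ORDER[r.group], r.species))


def paginate(rows, page_size=PAGE_SIZE):
    """Split the ordered rows into pages of at most page_size rows."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]
```

Keying rows on the full sequence string keeps the merge rule exact: only truly identical sequences from the same species collapse into one row.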
The columns show the sequence name, EC number, organism, and the GenBank, RefSeq, UniProt, and PDB accessions, each linked to its database. The sequence name and EC number come from the sequence's annotation in UniProt or GenBank. WE DO NOT ASSIGN NAMES OR EC NUMBERS; we only display the existing annotation.
Experimentally verified proteins
UniProt accessions with "Evidence at Protein Level" are marked with a [P], and those with "Evidence at Transcript Level" are marked with a [T]. These are the sequences backed by experimental work. The UniProt entry (or its GenBank equivalent) links to the literature describing that work.
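The marking rule can be sketched in a few lines of Python. The sketch reads the protein existence level from the `PE` line of a UniProtKB flat-file entry (level 1 is "Evidence at protein level", level 2 is "Evidence at transcript level"). The functions `pe_level` and `display_accession` are hypothetical and only illustrate the rule; they are not how ThYme itself retrieves this information.

```python
import re

# Only the two experimental existence levels receive a marker.
PE_MARKERS = {1: "[P]", 2: "[T]"}

# A UniProtKB flat-file PE line looks like: "PE   1: Evidence at protein level;"
PE_LINE = re.compile(r"^PE\s+(\d):")


def pe_level(flat_file_text):
    """Return the protein existence level of an entry, or None if absent."""
    for line in flat_file_text.splitlines():
        match = PE_LINE.match(line)
        if match:
            return int(match.group(1))
    return None


def display_accession(accession, level):
    """Append [P] or [T] to an accession when the level calls for it."""
    marker = PE_MARKERS.get(level)
    return f"{accession} {marker}" if marker else accession
```

Levels 3 to 5 (homology, prediction, uncertain) fall through to the bare accession, matching the idea that only experimentally supported sequences are flagged.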
The content of existing families is updated continuously as the GenBank, UniProt, and PDB databases are updated: if a new sequence belongs in an existing family, it will appear there. Deleting or merging existing families, however, requires the authors' inspection and judgment.
Note on multidomain proteins
Some of the enzymes shown are multidomain fatty acid synthases, polyketide synthases, or non-ribosomal peptide synthases. Each domain of these enzymes has its own function, but all domains fall under the same GenBank, UniProt, or PDB accession. So when one of these accessions appears in a family, only the corresponding domain belongs to that family, not the others.
Example: UniProt P12785 is a rat fatty acid synthase. Its AT domain appears in AT2, its KS domain appears in KS3, its HD domain appears in HD4, and its TE domain appears in TE16.
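One way to picture this is a membership table keyed by (accession, domain) rather than by accession alone. The Python sketch below uses the P12785 assignments from the example above; the `MEMBERSHIP` table and the functions `families_of` and `members_of` are hypothetical illustrations, not part of ThYme.

```python
# Hypothetical membership table: one entry per (accession, domain) pair,
# so a multidomain protein can belong to several families at once.
MEMBERSHIP = {
    ("P12785", "AT"): "AT2",
    ("P12785", "KS"): "KS3",
    ("P12785", "HD"): "HD4",
    ("P12785", "TE"): "TE16",
}


def families_of(accession):
    """Map each domain of one accession to the family it belongs to."""
    return {dom: fam for (acc, dom), fam in MEMBERSHIP.items() if acc == accession}


def members_of(family):
    """List the (accession, domain) pairs that belong to one family."""
    return [key for key, fam in MEMBERSHIP.items() if fam == family]
```

Because the key includes the domain, listing P12785 under KS3 says nothing about its other domains, which is exactly the reading described above.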
A single sequence can have many PDB structures covering different domains. Only the structures of the domain that belongs to the family are shown.
Example: UniProt P49327 has several PDB structures. Among them, 1XKT shows the TE domain that appears in a TE family, 2JFD shows the acyltransferase domain that appears in an AT family, and so forth.
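The PDB filtering can be sketched the same way. This Python example assumes each structure is annotated with the single domain it covers; the `Structure` type and the `structures_for_family` function are hypothetical, and the data reuse the P49327 example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """Hypothetical PDB record annotated with the domain it covers."""
    pdb_id: str
    uniprot: str
    domain: str  # assumption: one domain per structure


def structures_for_family(structures, accession, family_domain):
    """Keep only the structures of this accession that cover the family's domain."""
    return [
        s.pdb_id
        for s in structures
        if s.uniprot == accession and s.domain == family_domain
    ]


STRUCTURES = [
    Structure("1XKT", "P49327", "TE"),
    Structure("2JFD", "P49327", "AT"),
]
```

With this data, a TE family page would list only 1XKT for P49327, and an AT family page only 2JFD.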
Our group also created and maintains the CASTLE (CArboxylic eSTer hydroLasE) database, which focuses on carboxylic ester hydrolases. You can visit it at http://castle.cbe.iastate.edu/.