Abstract

The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small-molecule metabolites found in the human body.

It is intended to be used for applications in metabolomics, clinical chemistry, biomarker discovery and general education.

HMDB is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community.

The database is designed to contain or link three (3) kinds of data:

1) Chemical data;

2) Clinical data; and

3) Molecular biology/biochemistry data.

The database (version 2.5) contains over 7,900 metabolite entries, including both water-soluble and lipid-soluble metabolites, as well as metabolites regarded as either abundant (>1 µM) or relatively rare (<1 nM).

Additionally, approximately 7,200 protein (and DNA) sequences are linked to these metabolite entries.

Each MetaboCard entry contains more than 110 data fields, with about two-thirds of the information devoted to chemical/clinical data and the remaining one-third devoted to enzymatic or biochemical data.

Many data fields are hyperlinked to other databases [KEGG, PubChem, MetaCyc, Chemical Entities of Biological Interest (ChEBI), Protein Data Bank (PDB), Swiss-Prot, and GenBank] and to a variety of structure- and pathway-viewing applets.

HMDB supports extensive text, sequence, chemical structure and relational query searches.

Four (4) additional databases, DrugBank, T3DB, SMPDB and FooDB, are also part of the HMDB suite of databases:

1) The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. DrugBank contains this information for ~1,500 drugs.

2) The Toxin and Toxin Target Database (T3DB) is a unique bioinformatics resource that combines detailed toxin data with comprehensive toxin target information.

T3DB currently houses over 2,900 toxins described by over 34,200 synonyms, including pollutants, pesticides, drugs, and food toxins, which are linked to over 1,300 corresponding toxin target records.

3) SMPDB (the Small Molecule Pathway Database) is an interactive, visual database containing more than 350 small molecule pathways found in humans. According to the developers of SMPDB, more than two-thirds of these pathways (>280) are not found in any other pathway database.

SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology.

It is able to do so, in part, by providing exquisitely detailed, fully searchable, hyperlinked diagrams of human metabolic pathways, metabolic disease pathways, metabolite signaling pathways and drug-action pathways.

All SMPDB pathways include information on the relevant organs, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures.

Each small molecule is hyperlinked to detailed descriptions contained in the HMDB or DrugBank and each protein or enzyme complex is hyperlinked to UniProt.

4) FooDB (the Food Component Database) contains comparable information on ~2,000 food components and food additives.

First described in 2007, the HMDB is currently one of the largest and most comprehensive organism-specific metabolomics databases assembled to date.

It contains spectroscopic, quantitative, analytic and molecular-scale information about human metabolites, their associated enzymes or transporters, their abundance and disease-related properties.

Since its initial release, the HMDB has been used in a wide range of metabolomics applications including the characterization and rationalization of biomarkers for multiple sclerosis, the identification of metabolites with anticancer properties and the network modeling of liver cancer.

User interface improvements since its initial release --

Both the front-end and selected components of the back-end of the HMDB have been substantially redesigned to accelerate searches, improve data visualization and allow greater flexibility in the number of ‘query tools’ and links that can be provided by the database.

The HMDB’s navigation bar (located at the top of each page of the HMDB website) has been simplified to six (6) pull-down menu tabs (‘Home’, ‘Browse’, ‘Search’, ‘About’, ‘Downloads’ and ‘Contact Us’).

The HMDB ‘Browse’ tab -

The ‘Browse’ tab allows users to select from six (6) browsing options (HMDB Browse, Disease Browse, PathBrowse, Biofluid Browse, HML Browse, and ClassBrowse).

1) HMDB Browse - generates a tabular synopsis of the HMDB’s content. This browse view allows users to casually scroll through the database or re-sort its contents. Clicking on a given MetaboCard button brings up the full data content for the corresponding metabolite.

2) Disease Browse - allows users to scroll and search through tables of diseases, which are co-listed with hyperlinked metabolite and enzyme/protein names. As with PathBrowse (see below), users may submit lists of compounds and then view hyperlinked tables of diseases or conditions that may be associated with the observed metabolic changes.

3) PathBrowse - allows users to browse through the custom-drawn HMDB pathway images. Each pathway is named and each image is zoomable and extensively hyperlinked.

Users may also search PathBrowse using lists of compounds (obtained from a metabolomic experiment) and view hyperlinked tables that display all of the pathways that are potentially affected.

4) Biofluid Browse - generates hyperlinked tables listing normal and abnormal concentrations of different metabolites for sixteen (16) different biofluids.

5) HML Browse - allows users to browse or search through the Human Metabolome Library (HML), a library of ~1,000 reference metabolites stored in -80°C freezers.

Small amounts of these compounds are freely available to designated HMDB collaborators. Other laboratories may obtain them, as needed, on a cost-recovery basis.

6) ClassBrowse - allows users to view compounds according to their chemical class designation. Each displayed compound name is hyperlinked to the HMDB MetaboCard. Users may search for compounds (via a text box) or show the full list of metabolites.
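Both Disease Browse and PathBrowse accept a list of compounds from a metabolomic experiment and return the pathways (or diseases) that involve them. HMDB's own implementation is not described here; the following is a minimal, hypothetical sketch of that kind of set-overlap lookup, assuming a small hand-made pathway table (not HMDB data) and ranking pathways by how many of the submitted compounds they contain. A disease-to-metabolite table would be queried the same way.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy pathway membership table; NOT taken from HMDB.
PATHWAYS: Dict[str, Set[str]] = {
    "Glycolysis": {"glucose", "glucose-6-phosphate", "fructose-6-phosphate", "pyruvate"},
    "TCA cycle": {"citrate", "isocitrate", "succinate", "fumarate", "malate"},
    "Urea cycle": {"ornithine", "citrulline", "arginine", "urea"},
}


def affected_pathways(
    observed: Iterable[str], pathways: Dict[str, Set[str]] = PATHWAYS
) -> List[Tuple[str, List[str]]]:
    """Return (pathway, matched compounds) for every pathway that shares at
    least one compound with the submitted list, most matches first."""
    # Normalise names so "Pyruvate" and "pyruvate" compare equal.
    observed_set = {name.strip().lower() for name in observed}
    results = []
    for pathway, members in pathways.items():
        hits = sorted(members & observed_set)
        if hits:
            results.append((pathway, hits))
    # Sort by number of hits (descending), then alphabetically by pathway name.
    results.sort(key=lambda item: (-len(item[1]), item[0]))
    return results


if __name__ == "__main__":
    for pathway, hits in affected_pathways(["Pyruvate", "Glucose", "Urea", "Citrate"]):
        print(pathway, hits)
```

Ranking by raw overlap keeps the sketch simple; a production tool would more likely report a statistical enrichment score that accounts for pathway size.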

HMDB’s additional menu options -

The HMDB’s ‘Search’ menu offers eight (8) different ‘querying tools’ including Chem Query, Text Query, Sequence Search, Data Extractor, MS Search, MS/MS Search, GC/MS Search and NMR Search.

While only the GC/MS and MS Search features are new, significant improvements in terms of speed, accuracy and robustness have been made to many of the other query tools.
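The MS Search tool matches experimentally observed masses against database compounds within a user-specified tolerance. The exact matching rules HMDB uses are not given here; the following is a minimal sketch of one common approach, assuming singly charged [M+H]+ or [M-H]- ions, a tolerance in parts per million (ppm), and a tiny hypothetical reference table whose monoisotopic masses were computed from the molecular formulas (not exported from HMDB).

```python
from typing import List, Sequence, Tuple

PROTON_MASS = 1.007276  # mass of a proton in Da

# Hypothetical reference table: (name, monoisotopic neutral mass in Da).
# Masses computed from molecular formulas; this is NOT an HMDB export.
REFERENCE: List[Tuple[str, float]] = [
    ("D-Glucose", 180.063388),    # C6H12O6
    ("D-Fructose", 180.063388),   # C6H12O6, an isomer of glucose
    ("Citric acid", 192.027003),  # C6H8O7
    ("Creatinine", 113.058912),   # C4H7N3O
]


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def ms_search(
    mz: float,
    tolerance_ppm: float = 10.0,
    mode: str = "[M+H]+",
    reference: Sequence[Tuple[str, float]] = REFERENCE,
) -> List[Tuple[str, float]]:
    """Return (name, ppm error) for reference compounds within tolerance."""
    # Convert the observed ion m/z to a neutral mass (singly charged ions only).
    if mode == "[M+H]+":
        neutral = mz - PROTON_MASS
    elif mode == "[M-H]-":
        neutral = mz + PROTON_MASS
    elif mode == "neutral":
        neutral = mz
    else:
        raise ValueError(f"unsupported mode: {mode}")
    hits = [
        (name, round(ppm_error(neutral, mass), 2))
        for name, mass in reference
        if abs(ppm_error(neutral, mass)) <= tolerance_ppm
    ]
    # Closest matches first.
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    print(ms_search(181.0707))
```

Searching m/z 181.0707 returns both glucose and fructose at roughly 0.2 ppm, which illustrates why a mass match alone cannot distinguish isomers and why complementary MS/MS, GC/MS or NMR searches are useful.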

Adjacent to the ‘Search’ menu, the ‘About’ pull-down menu provides information about the database through the entries About the HMDB, Release Notes, Citing the HMDB, What's New, Statistics, Data Sources, MetaboCard Explanation and links to Other useful metabolomic Databases.

Finally, the ‘Downloads’ menu contains downloadable data for all HMDB compounds (in SDF format), all Nuclear Magnetic Resonance (NMR) spectra [in BioMagResBank (BMRB) format and as Portable Network Graphics (PNG) images], all Gas Chromatography/Mass Spectrometry (GC/MS) spectra (in NIST format), all tandem Mass Spectrometry (MS/MS) spectra (as PNG images), all enzyme/protein sequences, as well as complete flat-file data sets of current and past HMDB releases.
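The compound downloads use the SDF (structure-data file) format, in which each record holds a molfile block ending in "M  END", followed by named data items ("> <FIELD>" lines with their values) and a "$$$$" record terminator. The following is a minimal, standard-library-only sketch for reading those data items in Python. The field names in the sample record (DATABASE_ID, GENERIC_NAME) and the placeholder ID are assumptions for illustration, not a description of the actual HMDB file layout.

```python
import io
import re
from typing import Dict, Iterator, List, Optional, TextIO

# Matches SD data-item header lines such as "> <FIELD_NAME>".
HEADER_RE = re.compile(r"^>.*?<([^>]+)>")


def iter_sdf_fields(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield the data items of each SDF record as a {field: value} dict.

    Only the data-item section (after "M  END") is parsed; the connection
    table itself is skipped. Multi-line values are joined with newlines.
    """
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None  # data item currently being read, if any
    in_data = False  # True once the molfile block ("M  END") has ended
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line == "$$$$":  # end of record
            yield {name: "\n".join(value) for name, value in fields.items()}
            fields, current, in_data = {}, None, False
        elif not in_data:
            in_data = line.startswith("M  END")
        elif (match := HEADER_RE.match(line)) is not None:
            current = match.group(1)
            fields[current] = []
        elif line == "":  # a blank line terminates a data item
            current = None
        elif current is not None:
            fields[current].append(line)


# Made-up sample record with hypothetical field names and a placeholder ID.
SAMPLE = "\n".join([
    "sample-record",
    "",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <DATABASE_ID>",
    "HMDB_PLACEHOLDER",
    "",
    "> <GENERIC_NAME>",
    "D-Glucose",
    "",
    "$$$$",
]) + "\n"

if __name__ == "__main__":
    for record in iter_sdf_fields(io.StringIO(SAMPLE)):
        print(record["DATABASE_ID"], record["GENERIC_NAME"])
```

For a downloaded file, pass an open text handle instead of the StringIO object. Because the parser yields one record at a time, it does not need to hold the whole compound file in memory.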

Additional improvements since its initial release --

Over and above these enhancements to the menu structure and database navigation scheme, improvements have also been made to the formatting and display of all of HMDB’s MetaboCards.

For instance, certain data fields have been reordered to bring logically similar data sets (such as structure files or pathway diagrams) closer together in each MetaboCard.

Other data fields (such as the NMR and MS spectral data fields) have had extra information added to the data cell, such as collection conditions and free induction decay (FID) data. In other cases, data fields have been reformatted to provide more information in a more structured manner.

For example, the information in the normal and abnormal biofluid concentration data cells has been reformatted to display much more data in a more readable tabular format. A similar change has been made to the associated disorders field.

Likewise, all PubMed IDs and abbreviated chemical synthesis references have been replaced with full reference information (authors, title, journal, volume, pages, year).

In a similar manner, the SNP (Single Nucleotide Polymorphism) data field (found in HMDB’s Enzyme section) has also been modified so that SNPs are displayed in hyperlinked summary tables containing information on their type (synonymous or non-synonymous), location, validation status and population distribution.

This change to the SNP data field has also made browsing MetaboCards much faster and less taxing on the HMDB servers.

