Chemical Entities of Biological Interest (ChEBI)

Category Metabolomics/Metabonomics>Knowledge Bases/Databases/Tools

Abstract --

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.

The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity.

The molecular entities in ChEBI are either products of nature (metabolites) or synthetic products used to intervene in the processes of living organisms (drugs or toxins).

ChEBI contains structure and nomenclature information along with hyperlinks to many well-regarded databases.

ChEBI uses a carefully developed ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are precisely specified (see below…).

ChEBI has more than 15,500 chemical entities in its database.

ChEBI uses nomenclature, symbolism and terminology endorsed by the following international scientific bodies --

1) International Union of Pure and Applied Chemistry (IUPAC).

2) Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB).

Molecules directly encoded by the genome (e.g. nucleic acids, proteins and peptides derived from proteins by cleavage) are not, as a rule, included in ChEBI.

Note: All data in the database is non-proprietary or is derived from a non-proprietary source. It is thus freely accessible and available to anyone. In addition, each data item is fully traceable and explicitly referenced to the original source.

ChEBI Sources --

In order to create ChEBI, data from a number of sources were incorporated and subjected to merging procedures to eliminate redundancy.

Four (4) of the main sources from which the data are drawn are:

1) IntEnz - the Integrated relational Enzyme database of the EBI. IntEnz is the master copy of the Enzyme Nomenclature, the recommendations of the NC-IUBMB on the Nomenclature and Classification of Enzyme-Catalyzed Reactions.

2) KEGG COMPOUND - One part of the Kyoto Encyclopedia of Genes and Genomes LIGAND database, COMPOUND is a collection of biochemical compound structures.

3) PDBeChem - This Chemical component dictionary service provides web access to the Chemical Component Dictionary of the wwPDB as this data is loaded into the PDBe database at the European Bioinformatics Institute (EBI).

4) ChEMBL - A database of approximately 500,000 bioactive compounds, their quantitative properties and bioactivities, abstracted from primary scientific literature. It is part of the ChEMBL resources at the European Bioinformatics Institute (EBI).

Other data sources are listed in the ChEBI User Manual, located on the manufacturer’s web-site.
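The merging step described above can be pictured with a minimal sketch. It assumes, purely for illustration, that records from different sources are matched on a shared structure key (here an InChI string); the source does not say how ChEBI actually performs the merge. The accessions and the `merge_records` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: (source name, source accession, structure key, name).
# The accessions and the choice of InChI as the merge key are assumptions
# made for illustration; they are not taken from ChEBI's actual pipeline.
RECORDS = [
    ("IntEnz", "src-a-001", "InChI=1S/H2O/h1H2", "H2O"),
    ("KEGG COMPOUND", "src-b-417", "InChI=1S/H2O/h1H2", "Water"),
    ("PDBeChem", "src-c-093", "InChI=1S/CH4/h1H4", "methane"),
]


def merge_records(records):
    """Group records that share a structure key into one merged entry."""
    grouped = defaultdict(lambda: {"names": [], "sources": []})
    for source, accession, key, name in records:
        entry = grouped[key]
        if name not in entry["names"]:
            entry["names"].append(name)
        # Keep every (source, accession) pair so each item stays traceable.
        entry["sources"].append((source, accession))
    return dict(grouped)


merged = merge_records(RECORDS)
for key, entry in merged.items():
    print(key, entry["names"], entry["sources"])
```

Keeping the list of contributing sources on each merged entry mirrors the note above that every data item remains traceable to its original source.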

ChEBI Data --

Each ChEBI entry contains the following data fields:

1) ChEBI Identifier - A unique and stable identifier for an entity, for example, CHEBI:16236. It has no chemical significance and may be cited by external users.

2) ChEBI Name - the name recommended for use in biological databases.

3) ChEBI ASCII Name - the ChEBI name with any special characters rendered in the ASCII format.

4) Star rating - A rating based on the level of manual annotation.

5) Structure - graphical representation(s) of a molecular structure and associated molfile(s), IUPAC International Chemical Identifier (InChI) and SMILES strings.

Molfile(s) - An MDL Molfile is a file format created by MDL (now Symyx), for holding information about the atoms, bonds, connectivity and coordinates of a molecule.

The molfile consists of some header information, the Connection Table (CT) containing atom information, then bond connections and types, followed by sections for more complex information.

InChI - InChI is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources thus enabling easier linking of diverse data compilations.

Developed by IUPAC, it expresses chemical structures in terms of atomic connectivity, tautomeric state, isotopes, stereochemistry and electronic charge in order to produce a sequence of machine-readable characters unique to the respective molecule.

SMILES - SMILES (Simplified Molecular Input Line Entry System) is a simple but comprehensive chemical line notation, created in 1986 by David Weininger and further extended by Daylight Chemical Information Systems, Inc. SMILES specifically represents a valence model of a molecule and is widely used as a data exchange format.
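The molfile layout described above (header, then atoms, then bonds) can be sketched as follows. The example assumes the common V2000-style fixed-width layout, with three header lines and a counts line whose first two three-character fields hold the atom and bond counts; the source does not state which molfile version ChEBI distributes. The sample molecule, its coordinates and both helper functions are illustrative only.

```python
def build_sample_molfile():
    """Return a small V2000-style molfile for water, built with fixed widths."""
    atoms = [("O", 0.0, 0.0, 0.0), ("H", 0.9572, 0.0, 0.0), ("H", -0.24, 0.9266, 0.0)]
    bonds = [(1, 2, 1), (1, 3, 1)]  # (first atom, second atom, bond type)
    lines = ["water", "  illustrative-sketch", ""]  # three header lines
    # Counts line: atom count and bond count, three characters each.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for symbol, x, y, z in atoms:
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0" + "  0" * 11)
    for first, second, bond_type in bonds:
        lines.append(f"{first:3d}{second:3d}{bond_type:3d}  0")
    lines.append("M  END")
    return "\n".join(lines)


def parse_molfile(text):
    """Read header, atom block and bond block from a V2000-style molfile."""
    lines = text.split("\n")
    header = lines[:3]
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Coordinates occupy three 10-character fields; the symbol follows.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return {"name": header[0], "atoms": atoms, "bonds": bonds}


mol = parse_molfile(build_sample_molfile())
print(mol["name"], [atom[0] for atom in mol["atoms"]], mol["bonds"])
```

Slicing by column position rather than splitting on whitespace matters because adjacent fixed-width fields can run together when the counts reach three digits.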

6) Formula - Molecular formula.

7) Charge - For ions the magnitude of the charge is given in Arabic numerals preceded by the sign of the charge. For neutral molecules the charge is indicated as a numerical zero.

8) Mass - Relative molecular, atomic and ionic masses are shown for molecular, atomic and ionic entities respectively. The relative masses are calculated from tables of relative atomic masses (atomic weights) published by IUPAC.

9) ChEBI Ontology.

ChEBI Ontology -

The ChEBI Ontology is a structured classification of the entities contained within ChEBI. Its structure is essentially that of a directed acyclic graph (DAG), which differs from a simple taxonomy in that a ‘child term’ can have many parent terms.

Additionally, a number of relationships are incorporated which are cyclic in nature.

It comprises four (4) separate sub-ontologies (Molecular Structure, Biological Role, Application, and Subatomic Particle), employs a number of different relationships and offers the user a choice of two (2) views: a ‘Parents and Children View’ (in which the types of relationship between an entry and its immediate parent or children are stated in words) and a ‘Tree View’ (a graphic which places the ChEBI entry into context within the overall ontology structure).

Further information about the sub-ontologies, relationships and views is given in the ChEBI User Manual on the manufacturer’s web-site.
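To make the difference between a DAG and a simple taxonomy concrete, here is a minimal sketch in which one term has several parents. The term names, relationship names and both helper functions are hypothetical placeholders, not actual ChEBI content or ChEBI's own data model.

```python
from collections import deque

# Hypothetical mini-ontology: each term maps to (relationship, parent) pairs.
# Term and relationship names are illustrative placeholders, not ChEBI content.
PARENTS = {
    "compound X": [("is a", "class A"), ("is a", "class B"), ("has role", "role R")],
    "class A": [("is a", "molecular entity")],
    "class B": [("is a", "molecular entity")],
    "role R": [("is a", "role")],
    "molecular entity": [],
    "role": [],
}


def immediate_parents(term):
    """Return the relationships to immediate parents, stated in words."""
    return [f"{term} {rel} {parent}" for rel, parent in PARENTS.get(term, [])]


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    seen, queue = [], deque([term])
    while queue:
        current = queue.popleft()
        for _rel, parent in PARENTS.get(current, []):
            if parent not in seen:  # a shared ancestor is recorded only once
                seen.append(parent)
                queue.append(parent)
    return seen


print(immediate_parents("compound X"))
print(ancestors("compound X"))
```

`immediate_parents` loosely corresponds to a ‘Parents and Children View’, while `ancestors` gathers the wider context that a ‘Tree View’ would display. Because each ancestor is recorded only once, the traversal also terminates if the graph contains a cycle.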

10) IUPAC Name - name(s) generated according to recommendations of IUPAC.

11) INN - International Nonproprietary Name, also known as generic name, assigned by the World Health Organization (WHO).

12) Synonyms - other names together with an indication of their source.

13) Brand Name - a trade or proprietary name.

14) Database Links - manually curated cross-references to other non-proprietary databases.

15) Registry Number - CAS Registry Number, Beilstein Registry Number, and Gmelin Registry Number (if available).

16) Citations - Publications which cite the entity along with hyperlinks to their entries.

In addition, a separate page called ‘Automatic Xrefs’ contains automatically generated cross-references to a number of biological and chemical databases.
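Some of the data fields listed above follow simple textual conventions that a short sketch can illustrate: the identifier, the charge notation and the mass calculation. The identifier pattern (the prefix `CHEBI:` followed by digits) is inferred from the single example CHEBI:16236 and is an assumption, as are the three helper functions. The atomic weights are abridged IUPAC values for four elements only.

```python
import re

# Abridged standard atomic weights for a few elements (IUPAC values,
# rounded); a real calculation would need the full table.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_chebi_id(text):
    """Return the numeric part of an identifier such as 'CHEBI:16236'."""
    match = re.fullmatch(r"CHEBI:(\d+)", text)
    if match is None:
        raise ValueError(f"not a ChEBI identifier: {text!r}")
    return int(match.group(1))


def format_charge(charge):
    """Sign followed by magnitude for ions, a plain zero for neutral entities."""
    return "0" if charge == 0 else f"{charge:+d}"


def relative_mass(formula):
    """Sum atomic weights for a simple formula like 'C2H6O' (no brackets)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


print(parse_chebi_id("CHEBI:16236"))
print(format_charge(2), format_charge(-1), format_charge(0))
print(round(relative_mass("H2O"), 3))
```

The formula parser handles only flat formulae made of element symbols and counts; bracketed groups, isotopes and charges would need a fuller grammar.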

