Abstract Diseasome is an integrated database of genes, genetic variation and diseases.

This database provides a disease thesaurus with a tree view that shows the number of genes associated with each disease, and a genome browser for conveniently looking up potentially deleterious Single Nucleotide Polymorphisms (SNPs) among the genes that are strongly associated with specific diseases and clinical phenotypes.

It also provides semi-automatic ways of deriving a list of candidate SNPs to be evaluated in epidemiological or molecular biological experiments for disease association studies.

Currently, it contains 14,674 records on genetic variation and 109,715 records on genes related to human diseases.

Diseasome integrated database-pipeline system --

The manufacturers developed an integrated database-pipeline system for studying SNPs and diseases.

To implement this pipeline system for the Diseasome integrated database, they first unified complicated and redundant disease terms and gene names, using the Unified Medical Language System (UMLS) for classification and noun modification of disease terms, and the HUGO Gene Nomenclature Committee (HGNC) and NCBI gene databases for gene names.
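The unification step can be pictured as mapping every synonym onto one canonical name. The following is a minimal sketch under stated assumptions, not the manufacturers' implementation: the synonym table, its sample entries, and the function names `normalize` and `unify_terms` are hypothetical, and a real pipeline would derive the mappings from UMLS and HGNC rather than hard-code them.

```python
# Hypothetical, illustrative synonym table; a real pipeline would build
# this from UMLS (disease terms) and HGNC / NCBI Gene (gene names).
DISEASE_SYNONYMS = {
    "parkinson disease": "Parkinson Disease",
    "parkinson's disease": "Parkinson Disease",
    "breast cancer": "Breast Cancer",
    "breast carcinoma": "Breast Cancer",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so lookups ignore trivial differences."""
    return " ".join(text.lower().split())


def unify_terms(raw_terms, synonyms):
    """Map raw terms to canonical names.

    Returns a sorted list of distinct canonical names and a list of the
    raw terms that had no entry in the synonym table.
    """
    unified, unknown = set(), []
    for term in raw_terms:
        canonical = synonyms.get(normalize(term))
        if canonical is None:
            unknown.append(term)  # left for manual curation
        else:
            unified.add(canonical)
    return sorted(unified), unknown
```

Returning the unmatched terms separately reflects the idea that redundant names are merged automatically while unrecognized ones still need curation.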

Next, they collected and integrated representative databases for three (3) categories of information.

For genes and proteins, they examined the NCBI mRNA, UniProt, UCSC Table Track and MitoDat databases.

For genetic variants, they used the dbSNP, JSNP, ALFRED, and HGVbase databases.

For diseases, they employed the OMIM, Genetic Association Database (GAD) (see G6G Abstract Number 20314), and HGMD databases.

The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease.

Diseasome main web interface --

The main web interface provides two (2) ways to explore integrated disease-related information: the disease terms tree view and query searching.

The 'disease terms tree view' consists of 23 top disease terms having an average of five or six sub-term classes.

Users can look over all the 'disease terms', displayed together in the tree view.
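A tree view that reports gene counts per disease term can be sketched as a small recursive structure. This is only an illustration: the class `DiseaseTerm`, the function `render`, and the choice to count genes of sub-terms toward their parent term are assumptions, not documented Diseasome behavior.

```python
from dataclasses import dataclass, field


@dataclass
class DiseaseTerm:
    """One node of a hypothetical disease-term tree."""
    name: str
    genes: set = field(default_factory=set)      # gene symbols linked to this term
    children: list = field(default_factory=list)  # sub-term nodes

    def all_genes(self) -> set:
        """Collect genes linked to this term or to any of its sub-terms."""
        collected = set(self.genes)
        for child in self.children:
            collected |= child.all_genes()
        return collected

    def gene_count(self) -> int:
        """Number of distinct genes under this term (assumed aggregation rule)."""
        return len(self.all_genes())


def render(term: DiseaseTerm, depth: int = 0) -> list:
    """Return indented 'name (count)' lines, one per term, parent first."""
    lines = [f"{'  ' * depth}{term.name} ({term.gene_count()})"]
    for child in term.children:
        lines.extend(render(child, depth + 1))
    return lines
```

Using a set for the genes means a gene linked to both a term and its sub-term is counted once at the parent level.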

Note: Diseases that have been of great research interest such as diabetes mellitus, breast cancer, and Parkinson disease are represented at the top right of the web interface.

1) When users click on a disease term, they get results consisting of targeted disease-related information (disease name, synonyms, and title), along with the gene and SNP information directly associated with that disease.

2) To get more detailed information about a gene or SNP, users can follow the gene symbol or SNP identifier (rs number) in the gene or SNP list of the results.

3) The web interface allows querying with three (3) kinds of terms: 1) SNP identifier (rs number from dbSNP), 2) Gene ID (symbol and description), or 3) Disease term.
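One way to picture how a single search box could route the three kinds of terms is a simple pattern-based classifier. This is a hedged sketch only: the function `classify_query` and both patterns are hypothetical heuristics, and the actual interface may instead look the term up in its database or ask the user to choose the query type.

```python
import re

# dbSNP reference identifiers: lower-case "rs" followed by digits.
# Lower-case only, so an upper-case gene symbol such as "RS1" is not
# mistaken for an rs number.
RS_PATTERN = re.compile(r"^rs\d+$")

# Assumed heuristic for gene symbols: a short upper-case alphanumeric
# token starting with a letter (hyphens allowed).
GENE_SYMBOL_PATTERN = re.compile(r"^[A-Z][A-Z0-9-]{0,14}$")


def classify_query(query: str) -> str:
    """Return 'snp', 'gene', or 'disease' for a free-text query."""
    q = query.strip()
    if not q:
        raise ValueError("empty query")
    if RS_PATTERN.match(q):
        return "snp"
    if GENE_SYMBOL_PATTERN.match(q):
        return "gene"
    # Everything else, including multi-word text, is treated as a disease term.
    return "disease"
```

For example, `classify_query("BRCA1")` yields `"gene"` and `classify_query("diabetes mellitus")` yields `"disease"`. A gene description typed as free text would fall through to the disease branch, which is one reason a production system would need a database lookup rather than patterns alone.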

1) SNP identifier -

When the user submits a SNP identifier, the system shows an overview of the gene, SNP, and disease information through a genome browser (GBrowse).

In addition, it provides more detailed SNP information and effects, such as the SNP position on the genome, the SNP alleles, SNP effects predicted by PolyPhen [PolyPhen (= Polymorphism Phenotyping) is an automatic tool for predicting the possible impact of an amino acid substitution on the structure and function of a human protein] and SIFT (SIFT predicts whether an amino acid substitution affects protein function, based on sequence homology and the physical properties of amino acids), information on the transcripts in which the SNPs are located, allele frequencies, and flanking sequences from SNP resources.
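The SNP details listed above suggest how a candidate list might be derived semi-automatically from the predictions. The sketch below is an assumption-laden illustration, not the Diseasome method: the record type `SnpRecord`, its field names, and the function `candidate_snps` are hypothetical, the 0.05 SIFT cutoff is the commonly used default rather than a value stated for this database, and the PolyPhen labels are the tool's usual "possibly damaging" and "probably damaging" categories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpRecord:
    """Hypothetical subset of the per-SNP details shown by the interface."""
    rs_id: str
    chromosome: str
    position: int
    alleles: str
    sift_score: Optional[float] = None         # lower scores suggest a damaging change
    polyphen_prediction: Optional[str] = None  # e.g. "benign", "probably damaging"


def candidate_snps(records, sift_cutoff: float = 0.05):
    """Return rs IDs flagged by SIFT (score <= cutoff) or by PolyPhen.

    The cutoff is a parameter because the threshold used by any given
    resource is an assumption here.
    """
    damaging = {"possibly damaging", "probably damaging"}
    selected = []
    for rec in records:
        sift_hit = rec.sift_score is not None and rec.sift_score <= sift_cutoff
        polyphen_hit = (rec.polyphen_prediction or "").lower() in damaging
        if sift_hit or polyphen_hit:
            selected.append(rec.rs_id)
    return selected
```

Records without any prediction are never selected, so missing annotations do not inflate the candidate list.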

2) Gene ID -

3) Disease term -

