Abstract --

The PhenomiR database provides information about differentially regulated miRNA expression in diseases and other biological processes.

The content of PhenomiR is generated entirely through manual curation by experienced annotators. Data were extracted from more than 365 scientific articles, resulting in more than 632 database entries as of 02/2011.

The design principle of PhenomiR is to use established ontologies and resources.

For annotation of diseases, the manufacturers use information from the OMIM Morbid Map; bioprocesses are described with terms from the Gene Ontology (GO); and tissue and cell culture information is annotated with the Brenda Tissue Ontology (BTO).

Rationale behind the PhenomiR database --

In order to provide a comprehensive overview of differentially regulated miRNA expression data in diseases and general biological processes, the manufacturers generated the PhenomiR database.

The manufacturers aim at high data quality through manual annotation by experienced biocurators.

PhenomiR provides an in-depth annotation of the studies, including not only information such as the mode of miRNA expression (up or down) and the miRNA detection method, but also data such as the quantitative fold-change of miRNA expression, the sample size and the origin of the samples analyzed (patients or cell culture), which are not currently available from other existing resources.

This comprehensive repository allows for a large-scale statistical analysis of aspects such as genomic localization of deregulated miRNAs or the influence of sample origin.

Using PhenomiR data from cell culture studies and patient studies, the manufacturers found that, depending on the disease type, independent information from cell culture studies is in conflict with conclusions drawn from patient studies.

Furthermore, a systematic analysis of 94 diseases shows that deregulated microRNA clusters are significantly overrepresented in the majority of investigated diseases (approximately 90%) compared to singular microRNA gene products.

PhenomiR database contents --

PhenomiR provides a repository that offers all the scattered information about miRNA expression in a structured and uniform format.

This allows users to perform individual queries for specific miRNAs and diseases as well as to use the complete dataset for large-scale statistical analyses.

All information in PhenomiR is extracted from published experiments and has been manually curated. The literature reference for each database entry is annotated as a PubMed identifier and is hyperlinked to PubMed in the web front-end.

Each individual entry of the database refers to an instance of a publication describing a specific disease or bioprocess. This dataset includes over 12,192 data points, each representing one deregulated miRNA in an experiment.

A design principle of PhenomiR is to use well-established ontologies and resources.

As miRBase is the primary resource for miRNA annotation and nomenclature, the manufacturers use the miRBase identifiers and nomenclature for annotation of miRNAs.

In order to enable convenient analysis of the dataset, miRNA designations from previous nomenclature releases were mapped to miRBase release 12.0.
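The mapping step can be pictured as a lookup from legacy designations to the names of the target release. The following is a minimal sketch under stated assumptions: the alias table, the set of current names, and the function `normalize_names` are all hypothetical, and the miRNA names are invented placeholders rather than real miRBase mappings. Names that cannot be resolved are returned separately so that a curator can review them by hand.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical alias table: legacy designation -> name in the target release.
# These pairs are invented placeholders, not actual miRBase mappings.
ALIASES: Dict[str, str] = {
    "hsa-mir-old1": "hsa-miR-new1",
    "hsa-mir-old2": "hsa-miR-new2",
}

# Hypothetical set of names that are valid in the target release.
CURRENT_NAMES: Set[str] = {"hsa-miR-new1", "hsa-miR-new2"}


def normalize_names(
    names: Iterable[str],
    aliases: Dict[str, str],
    current_names: Set[str],
) -> Tuple[List[str], List[str]]:
    """Map each name to the target release.

    Returns (normalized, unresolved). Unresolved names are neither current
    nor known aliases and are left for manual review.
    """
    normalized: List[str] = []
    unresolved: List[str] = []
    for name in names:
        key = name.strip()
        if key in current_names:
            normalized.append(key)           # already up to date
        elif key in aliases:
            normalized.append(aliases[key])  # legacy name, mapped forward
        else:
            unresolved.append(key)           # needs a curator's decision
    return normalized, unresolved
```

Keeping unresolved names in their own list, rather than guessing a mapping, fits a manually curated resource.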

For annotation of diseases, the manufacturers use information from the Online Mendelian Inheritance in Man (OMIM) Morbid Map. The OMIM Morbid Map is an alphabetical list of diseases described in OMIM, including their corresponding cytogenetic locations.

In contrast to disease vocabularies like Disease Ontology (DO) or MeSH (Medical Subject Heading) disease categories, the widely popular OMIM classification scheme contains additional information about the disease, such as clinical features, population genetics and genes that are experimentally shown to be involved in the respective disease.

If no appropriate OMIM disease term is available for the annotation of a disease (currently the case for 20.7% of the studies), the manufacturers introduce additional terms like ‘dermatomyositis’ and ‘thyroid carcinoma, medullary’.

In addition to the OMIM terms, PhenomiR annotates Morbid Map-associated higher-level disease classes, such as cancer or cardiovascular.

In this system, each annotated disease from the Morbid Map is associated with one of 22 disease classes.

miRNA expression analyses of biological processes are predominantly performed for developmental processes and responses to conditions like folate starvation.

For the annotation of biological processes the manufacturers assign terms from the Gene Ontology (GO). Cell lines or tissues that were used as samples in the analysis are annotated using the Brenda Tissue Ontology (BTO).

In addition to the sample information, PhenomiR provides the experimental methods used for miRNA expression analyses: to a large extent, expression studies of miRNAs have been performed with microarrays (29% of all miRNA phenotype correlations).

Other methods, such as RT-PCR (47%) and Northern blot (10%), are also used to reconfirm the results for selected miRNAs. Information about differential expression of miRNAs in PhenomiR is given as the qualitative attributes ‘miRNA over-expression’ or ‘miRNA down-regulation’.

In most articles (75%) authors also publish quantitative results. This information allows discrimination between marginally and significantly deregulated miRNAs. If such information is available, quantitative data (as fold-change) are additionally annotated in PhenomiR.

Data content from miRNA expression studies curated in PhenomiR shows a high heterogeneity in the amplitude of fold-change and the available measurements.

Note: The manufacturers do not set arbitrary thresholds for the numbers of deregulated miRNAs or the fold-changes but present the data as they are provided by the scientists, leaving possible filtering and thresholding or weighting to any later analysis.
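Because no cutoff is applied at curation time, any threshold is chosen downstream by the analyst. The sketch below shows one way such a filter might look. It assumes that each data point stores the direction separately and the fold-change as a positive magnitude, with `None` when an article reports no quantitative value. The `DataPoint` class, the function `filter_by_fold_change`, and the sample values are hypothetical and do not reflect PhenomiR's actual schema or export format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DataPoint:
    """One deregulated miRNA in one experiment (hypothetical layout)."""
    mirna: str
    direction: str                 # "up" or "down"
    fold_change: Optional[float]   # positive magnitude; None if not reported


def filter_by_fold_change(
    points: Iterable[DataPoint],
    min_fold: float,
    keep_unquantified: bool = False,
) -> List[DataPoint]:
    """Keep data points whose fold-change magnitude is at least `min_fold`.

    Data points without a quantitative value are dropped unless
    `keep_unquantified` is True.
    """
    kept: List[DataPoint] = []
    for point in points:
        if point.fold_change is None:
            if keep_unquantified:
                kept.append(point)
        elif point.fold_change >= min_fold:
            kept.append(point)
    return kept


# Invented sample values for illustration only.
SAMPLE = [
    DataPoint("miR-x", "up", 5.0),
    DataPoint("miR-y", "down", 1.2),
    DataPoint("miR-z", "up", None),
]
strongly_deregulated = filter_by_fold_change(SAMPLE, min_fold=2.0)
```

The `keep_unquantified` switch makes the handling of qualitative-only entries an explicit choice, since dropping them silently would bias an analysis toward studies that report fold-changes.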

Search options and predefined datasets --

In order to obtain an overview of the PhenomiR dataset, the web page (located on the manufacturer’s web-site) links to three (3) lists that display:

1) All entries;

2) All diseases; and

3) All annotated miRNAs.

In addition, statistical information about the number of database entries, most frequently annotated miRNAs, and so on, is provided on the front page of the manufacturer’s web-site.

For queries, PhenomiR offers two (2) search options, a ‘General search’ as well as a ‘Specific search’.

1) The ‘General search’ performs simultaneous queries across several attributes like ‘miRNA name’, ‘disease’ or ‘gene name’.

This is optimized for searches where comprehensiveness rather than specificity is required. The results can be displayed either as respective entries or associated miRNAs.

2) The ‘Specific search’ allows the selection of individual annotated attributes shown in a pull-down menu. Additionally, specific searches can be combined by using the logic operators AND, OR and NOT.

As in the ‘General search’, results can be displayed as a list of database entries. Another way to depict the results is to generate a list of all miRNAs found in any of the corresponding studies. Results of both search options are linked to their respective entries.
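The two search modes described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the PhenomiR implementation: the entry layout, the attribute names, the sample values, and every function name here are hypothetical. The general search matches one term against several attributes at once, while the specific search combines per-attribute conditions with AND, OR and NOT.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence

Entry = Dict[str, Any]
Predicate = Callable[[Entry], bool]

# Invented sample entries; attribute names and values are placeholders.
ENTRIES: List[Entry] = [
    {"id": 1, "disease": "disease A", "method": "microarray", "mirnas": ["miR-x", "miR-y"]},
    {"id": 2, "disease": "disease A", "method": "RT-PCR", "mirnas": ["miR-y"]},
    {"id": 3, "disease": "disease B", "method": "microarray", "mirnas": ["miR-z"]},
]


def general_search(
    entries: Iterable[Entry],
    term: str,
    attributes: Sequence[str] = ("disease", "method", "mirnas"),
) -> List[Entry]:
    """Match `term` as a case-insensitive substring across several attributes."""
    needle = term.lower()
    hits: List[Entry] = []
    for entry in entries:
        values: List[Any] = []
        for attribute in attributes:
            value = entry.get(attribute, "")
            values.extend(value if isinstance(value, list) else [value])
        if any(needle in str(v).lower() for v in values):
            hits.append(entry)
    return hits


def attr_equals(attribute: str, value: Any) -> Predicate:
    """Condition on a single annotated attribute."""
    return lambda entry: entry.get(attribute) == value


def both(a: Predicate, b: Predicate) -> Predicate:    # logical AND
    return lambda entry: a(entry) and b(entry)


def either(a: Predicate, b: Predicate) -> Predicate:  # logical OR
    return lambda entry: a(entry) or b(entry)


def negate(a: Predicate) -> Predicate:                # logical NOT
    return lambda entry: not a(entry)


def specific_search(entries: Iterable[Entry], predicate: Predicate) -> List[Entry]:
    """Return the entries that satisfy the combined condition."""
    return [entry for entry in entries if predicate(entry)]


def mirnas_in(entries: Iterable[Entry]) -> List[str]:
    """Alternative result view: all miRNAs found in any matching entry."""
    return sorted({m for entry in entries for m in entry["mirnas"]})


# "disease A" AND NOT "RT-PCR"
query = both(attr_equals("disease", "disease A"), negate(attr_equals("method", "RT-PCR")))
matching_entries = specific_search(ENTRIES, query)
matching_mirnas = mirnas_in(matching_entries)
```

Representing each condition as a function keeps the three operators composable, so nested queries need no extra machinery.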

To demonstrate the additional value of the comprehensive annotation in PhenomiR, the manufacturers also investigated the influence of differentially regulated genomic microRNAs on diseases from a large-scale statistical point of view.

