Abstract: deepBase is a database platform for annotating and discovering small and long ncRNAs (microRNAs, siRNAs, piRNAs...) from next-generation sequencing data.

deepBase supports the mapping, storage, retrieval, analysis, integration, annotation, mining and visualization of next-generation sequencing data from different technological platforms, tissues and cell lines of different organisms.

deepBase also provides an integrative, interactive and versatile web graphical interface to display multidimensional data, and facilitate transcriptomic research and the discovery of novel ncRNAs.

The current release of deepBase contains deep sequencing data from 185 small RNA libraries, covering diverse tissues and cell lines of seven (7) organisms: human, mouse, chicken, Ciona intestinalis, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana.

By analyzing ~14.6 million unique reads that perfectly mapped to more than 284 million genomic loci, the manufacturers annotated and identified ~380,000 unique ncRNA-associated small RNAs (nasRNAs), ~1.5 million unique promoter-associated small RNAs (pasRNAs), ~4.0 million unique exon-associated small RNAs (easRNAs), and ~6 million unique repeat-associated small RNAs (rasRNAs).
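The annotation step above can be pictured as an interval-overlap lookup. The following is a minimal sketch only, not the deepBase pipeline: the `Feature` type, the `annotate_locus` function, and the assumption that a locus receives every category whose annotation it overlaps are all hypothetical, since the text does not say how overlaps or priorities are handled.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A hypothetical annotated genomic interval (0-based, end exclusive)."""
    chrom: str
    start: int
    end: int
    kind: str  # one of "ncRNA", "promoter", "exon", "repeat"


# Map each annotation kind to the small RNA category named in the text.
CATEGORY = {
    "ncRNA": "nasRNA",
    "promoter": "pasRNA",
    "exon": "easRNA",
    "repeat": "rasRNA",
}


def annotate_locus(chrom, start, end, features):
    """Return the sorted categories whose features overlap the mapped locus.

    Assumption: any overlap of at least one base counts, and a locus may
    carry several categories at once.
    """
    labels = set()
    for f in features:
        if f.chrom == chrom and f.start < end and start < f.end:
            labels.add(CATEGORY[f.kind])
    return sorted(labels)


if __name__ == "__main__":
    feats = [
        Feature("chr1", 100, 200, "exon"),
        Feature("chr1", 150, 400, "repeat"),
    ]
    print(annotate_locus("chr1", 160, 182, feats))
```

A linear scan is used here for readability; at the scale of hundreds of millions of loci, an indexed structure such as an interval tree would be the more realistic choice.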

Furthermore, ~2,038 miRNA and ~1,889 snoRNA candidates were predicted by miRDeep and snoSeeker (see below…).

All of the mapped reads can be grouped into about 1.2 million RNA clusters.
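How mapped reads might be grouped into clusters can be sketched as follows. This is an illustration under stated assumptions, not the documented deepBase procedure: it assumes a cluster is a run of reads on the same chromosome and strand that overlap or lie within a hypothetical `max_gap` of each other, and the `Read` type and `cluster_reads` function are names introduced here.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """A hypothetical mapped read (0-based start, exclusive end)."""
    chrom: str
    strand: str
    start: int
    end: int


def cluster_reads(reads, max_gap=0):
    """Merge reads on the same chromosome and strand into clusters.

    max_gap is an assumed parameter: reads separated by at most this
    many bases are placed in the same cluster.
    Returns (chrom, strand, start, end, read_count) tuples.
    """
    by_key = defaultdict(list)
    for r in reads:
        by_key[(r.chrom, r.strand)].append(r)

    clusters = []
    for (chrom, strand), group in sorted(by_key.items()):
        group.sort(key=lambda r: r.start)
        cur_start, cur_end, count = group[0].start, group[0].end, 1
        for r in group[1:]:
            if r.start <= cur_end + max_gap:
                # Read touches the open cluster: extend it.
                cur_end = max(cur_end, r.end)
                count += 1
            else:
                # Gap too large: close the cluster and start a new one.
                clusters.append((chrom, strand, cur_start, cur_end, count))
                cur_start, cur_end, count = r.start, r.end, 1
        clusters.append((chrom, strand, cur_start, cur_end, count))
    return clusters
```

Keeping strands separate is a design choice made for this sketch, so that sense and antisense small RNAs from the same region are not merged.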

For comparative analysis, deepBase provides an integrative, interactive, and versatile display.

A convenient search option, related publications and other useful information are also provided for further investigation.

deepBase web interface --

deepBase provides a variety of interfaces and graphical visualizations to facilitate analysis of the massive and heterogeneous small RNA data sets from different tissues, cell lines, and technology platforms.

The manufacturers have also developed a new visualization tool, the deepView genome browser, to provide a quick overview of a particular region in the genome and to visually correlate various types of features.

The deepView browser in deepBase provides an integrated view of mapped reads, known and predicted ncRNAs, protein-coding genes and RNA clusters, and their expression peaks.

Clicking a prediction or gene of interest launches a multiple-alignment trace viewer that displays all traces of genes or links to external resources such as:

1) The National Center for Biotechnology Information (NCBI);

2) UCSC;

3) miRBase; and

4) The Arabidopsis Information Resource (TAIR), to obtain more comprehensive information.

The libView browser provides graphical comparisons of multiple libraries by the length distribution and the 5'-terminal nucleotide of their small RNAs.
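The two per-library summaries that libView compares, length distribution and 5'-terminal nucleotide composition, can be computed along these lines. This is a minimal sketch; the input format of `(sequence, count)` pairs and the `library_profile` function are assumptions made for illustration, not the libView implementation.

```python
from collections import Counter


def library_profile(reads):
    """Summarize one small RNA library.

    reads: iterable of (sequence, count) pairs, where count is the number
    of times the unique sequence was observed (an assumed input format).
    Returns two dicts of fractions: by read length and by 5' nucleotide.
    """
    length_counts = Counter()
    first_nt_counts = Counter()
    for seq, count in reads:
        if not seq:
            continue  # skip empty sequences
        seq = seq.upper().replace("T", "U")  # report in RNA alphabet
        length_counts[len(seq)] += count
        first_nt_counts[seq[0]] += count

    total = sum(length_counts.values())
    if total == 0:
        return {}, {}
    lengths = {k: v / total for k, v in sorted(length_counts.items())}
    first_nt = {k: v / total for k, v in sorted(first_nt_counts.items())}
    return lengths, first_nt
```

Returning fractions rather than raw counts makes libraries of different sequencing depth directly comparable, which is the point of a multi-library view.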

The manufacturers also provide the nasView graphical browser to facilitate the comparison of ncRNAs across multiple small RNA libraries, including miRNAs, snoRNAs, tRNAs, rRNAs, snRNAs, scRNAs, Mt_tRNAs and misc_RNAs.

The expression profiles for ncRNAs are also provided to test for a differential expression pattern among different tissues and cell lines.
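Comparing expression across libraries usually requires normalizing for sequencing depth first. The sketch below uses reads-per-million scaling and a log2 fold change with a pseudocount; both are common conventions chosen here for illustration, and the text does not state which normalization or statistical test deepBase applies. The function names and the `pseudocount` parameter are hypothetical.

```python
import math


def reads_per_million(counts, library_size):
    """Scale raw read counts per ncRNA to reads per million mapped reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {name: c * 1_000_000 / library_size for name, c in counts.items()}


def log2_fold_change(rpm_a, rpm_b, pseudocount=1.0):
    """Log2 ratio of library A over library B for every ncRNA seen in either.

    The pseudocount (an assumed value) avoids division by zero for
    ncRNAs absent from one library.
    """
    names = sorted(set(rpm_a) | set(rpm_b))
    return {
        n: math.log2(
            (rpm_a.get(n, 0.0) + pseudocount)
            / (rpm_b.get(n, 0.0) + pseudocount)
        )
        for n in names
    }
```

A fold change alone does not establish statistical significance; it only ranks candidates for closer inspection.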

For small RNAs derived from diverse RNAs, RNA clusters and predicted ncRNAs, the database provides the sequence, genomic location, RNA secondary structures, references, and annotations.

deepBase provides a variety of search functions, including a keyword function for searching small RNA, ncRNA, and RNA cluster information, and a BLAST function for performing searches against sets of small RNA sequences.

The search results are linked to the full database records.

deepBase future directions and current additional tools --

Next-generation sequencing technologies have played a vital role in improving one's understanding of functional genomics.

As new genome builds and genome-wide high-throughput deep sequencing data from different species, cell lines, tissues and conditions become available, the manufacturers will continuously maintain and update the deepBase database.

The Automatic Mapping, Annotating and Mining Tools (AutoMAMT) in deepBase are run on the manufacturers' high-performance computer servers.

The manufacturers have updated deepBase for the human genome (hg19 version) using AutoMAMT.

At present, deepBase has integrated an additional 52 small RNA libraries, which are mapped to and annotated against the latest human assembly version (hg19).

The manufacturers will continue to extend the storage capacity of the current system and improve the performance of their computer servers to accommodate new sequencing data.

The stand-alone graphical user interface (GUI) software tools, called “deeptools”, will be continuously released via deepBase.

Current deeptools are:

1) GalaxyView genome browser - facilitates data quality assessment, data analysis, visual examination and validation, hypothesis generation, and large-scale interactive analysis of millions of deep sequencing reads.

2) deepGenome - deepGenome is an easy-to-use, menu- and mouse-driven deep sequencing data analysis tool. It enables you to integrate raw and processed mixed-type deep sequencing reads from different technologies.

3) snoSeeker - snoSeeker is a computational package that includes two (2) novel snoRNA-searching programs, CDseeker and ACAseeker, which detect C/D snoRNAs and H/ACA snoRNAs, respectively.

Based on new algorithms, these programs can detect both guide and orphan snoRNA genes in a genome-wide analysis.

Bench biologists can use these stand-alone GUI software tools to manipulate and analyze their own data, or data downloaded from deepBase, locally on their personal computers.

The integration of transcriptome datasets from the deepBase database with other deep sequencing research, such as genomic mRNA-Seq, methylC-Seq and ChIP-Seq, will contribute to functional annotation of the genome and to a deeper understanding of genomic and cellular dynamics and features.

