Genomic HyperBrowser

Category Genomics>Genetic Data Analysis/Tools and Cross-Omics>Workflow Knowledge Bases/Systems/Tools

Abstract The Genomic HyperBrowser is an integrated, open-source system for genome analysis.

The manufacturers have developed a novel statistical methodology and a robust software system for comparative analysis of sequence-level genomic data, enabling integrative systems biology at the intersection of genomics, computational science, and statistics.

The manufacturers focus on inferential investigations, where two genomic annotations, or tracks, are compared in order to find significant deviation from null-model behavior.

Tracks may be defined by the researcher or extracted from the sizable library provided with the system. The system is open-ended, facilitating extensions by the user community.

Resolving complexity - system architecture --

The Genomic HyperBrowser is continually evolving, supporting 28 different analyses for significance testing, as well as 62 different descriptive statistics.

The system currently hosts 184,500 tracks (as of Sept. 2010). Most of these represent literature-based information, which is used mainly in network-based approaches.

As natural-language-based text mining allows for the identification of a wide variety of biological entities, the manufacturers have generated tracks representing genomic locations associated with terms for the complete Gene Ontology (GO) tree, all Medical Subject Heading (MeSH) terms, chemicals, and anatomy.

The system is implemented in Python, a high-level programming language that allows fast and robust software development. Interoperability with standard file formats in the field is provided by parallel storage of original file formats and preprocessed vector representations.
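As an illustration of what a preprocessed vector representation could look like, the following minimal sketch parses BED text (tab-separated chromosome, start, end) into one array of starts and one array of ends per chromosome. The function name bed_to_vectors and the choice of the standard-library array type are assumptions made for this example; the source does not describe the HyperBrowser's actual storage layout.

```python
from array import array
from collections import defaultdict


def bed_to_vectors(bed_text):
    """Parse BED text into per-chromosome arrays of starts and ends.

    Hypothetical helper for illustration only, not the HyperBrowser API.
    """
    starts = defaultdict(lambda: array("q"))  # 64-bit signed integers
    ends = defaultdict(lambda: array("q"))
    for line in bed_text.splitlines():
        # Skip blank lines, comments, and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        starts[chrom].append(int(start))
        ends[chrom].append(int(end))
    return dict(starts), dict(ends)


# The original text would be kept alongside the vectors derived from it.
example_bed = "chr1\t100\t200\nchr1\t500\t650\nchr2\t10\t40\n"
starts, ends = bed_to_vectors(example_bed)
```

Keeping the original file next to the vectors preserves interoperability, while analyses can read the compact numeric form.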

To reduce the memory footprint of analyses on genome-wide data, an iterative divide-and-conquer algorithm is automatically carried out when applicable.
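The divide-and-conquer idea can be sketched as follows: a statistic is computed bin by bin and the partial results are combined, so that only one bin needs to be held in memory at a time. This is a simplified sketch under stated assumptions, not the HyperBrowser's algorithm. Tracks are assumed to be sorted, non-overlapping half-open segments, and all function names are hypothetical. In a real system each bin would be loaded from disk; here the tracks are clipped in memory for brevity.

```python
def bins(genome_length, bin_size):
    """Yield consecutive half-open regions [start, end) covering the genome."""
    for start in range(0, genome_length, bin_size):
        yield start, min(start + bin_size, genome_length)


def clip(track, region):
    """Return the parts of the track's segments that fall inside the region."""
    lo, hi = region
    return [(max(s, lo), min(e, hi)) for s, e in track if s < hi and e > lo]


def overlap_bp(a, b):
    """Base pairs covered by both tracks (each sorted and non-overlapping)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        total += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        # Advance whichever segment ends first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return total


def overlap_by_bins(a, b, genome_length, bin_size):
    """Sum per-bin overlaps; the result equals the whole-genome overlap."""
    return sum(
        overlap_bp(clip(a, region), clip(b, region))
        for region in bins(genome_length, bin_size)
    )


track_a = [(10, 50), (120, 180)]
track_b = [(30, 130), (170, 200)]
whole = overlap_bp(track_a, track_b)
binned = overlap_by_bins(track_a, track_b, genome_length=250, bin_size=100)
```

This works for statistics whose per-bin results can be combined (sums, counts, and values derived from them), which is presumably what "when applicable" refers to.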

A further increase in speed is achieved by memoizing intermediate results to disk, automatically retrieving them when needed for the same or different analyses on the same track(s) at any subsequent time, by any user.
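A minimal sketch of disk-based memoization is shown below. It assumes results can be serialized with pickle and that a cache key can be derived from the function name and its arguments; the decorator name disk_memoize and the cache layout are hypothetical and not taken from the HyperBrowser code.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_memoize(cache_dir):
    """Decorator that stores each result on disk, keyed by name and arguments."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        def wrapper(*args):
            raw_key = repr((func.__name__, args)).encode()
            path = cache_dir / (hashlib.sha256(raw_key).hexdigest() + ".pkl")
            if path.exists():
                # Reuse a result computed earlier, possibly by another user.
                with path.open("rb") as handle:
                    return pickle.load(handle)
            result = func(*args)
            with path.open("wb") as handle:
                pickle.dump(result, handle)
            return result

        return wrapper

    return decorator


cache_dir = tempfile.mkdtemp()  # stand-in for a shared cache directory
calls = []


@disk_memoize(cache_dir)
def segment_count(track_name):
    calls.append(track_name)  # records how often the real work is done
    return {"track": track_name, "count": 3}


first = segment_count("genes")
second = segment_count("genes")  # served from disk, no recomputation
```

Because the cache lives on disk rather than in a single process, a result computed in one session remains available to later sessions.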

The system provides a web-based user interface with a low entry point. In order to simplify the task of making choices, a step-wise approach has been implemented, displaying only the relevant options at each stage.

This guided approach hides unnecessary complexities from the researcher, while confronting the user with important design choices as needed.

The manufacturers rely on a dynamic system to infer appropriate options, aiding maintenance. The list of selectable tracks is based on scans of available files on disk.

The list of relevant questions is based on short runs of all implemented analyses, using a minimal part of the actual data from the selected tracks.

For each analysis, a set of relevant options is defined. The dynamic design of the system also provides automatic removal of analyses that fail to run, enhancing system robustness.
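The trial-run idea can be sketched as follows: every registered analysis is run on a small sample of the selected tracks, and only those that complete without error are offered to the user. The analysis functions, the dictionary-based track representation, and the name runnable_analyses are assumptions made for this example.

```python
def segment_overlap_count(track_a, track_b):
    """Count pairs of overlapping segments between two tracks."""
    return sum(
        1
        for s1, e1 in track_a["segments"]
        for s2, e2 in track_b["segments"]
        if s1 < e2 and s2 < e1
    )


def mean_value_difference(track_a, track_b):
    """Difference of mean values; raises KeyError for tracks without values."""
    values_a, values_b = track_a["values"], track_b["values"]
    return sum(values_a) / len(values_a) - sum(values_b) / len(values_b)


ANALYSES = {
    "segment_overlap_count": segment_overlap_count,
    "mean_value_difference": mean_value_difference,
}


def runnable_analyses(analyses, sample_a, sample_b):
    """Keep only the analyses that run without error on the sample data."""
    available = {}
    for name, analysis in analyses.items():
        try:
            analysis(sample_a, sample_b)
        except Exception:
            continue  # not applicable to these tracks, or broken: hide it
        available[name] = analysis
    return available


# Minimal samples taken from two segment tracks that carry no values.
sample_a = {"segments": [(0, 10), (20, 30)]}
sample_b = {"segments": [(5, 25)]}
available = runnable_analyses(ANALYSES, sample_a, sample_b)
```

The same mechanism covers both cases mentioned in the text: analyses that do not fit the selected track types and analyses that fail for any other reason never reach the user interface.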

The complexities of the software solutions are hidden in the backbone of the system, simplifying coding of statistical modules. Each module declares the data types it supports and which results are needed from other modules.

The backbone automatically checks whether the selected tracks meet the requirements, and if so, makes sure the intermediate computations are carried out in the correct order.
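A minimal sketch of such a backbone is given below, assuming each module is described by the track types it supports and the names of the modules whose results it needs. The module names and the MODULES registry are hypothetical; the ordering uses the standard-library graphlib.TopologicalSorter (Python 3.9 or later) rather than any HyperBrowser code.

```python
from graphlib import TopologicalSorter

# Hypothetical module declarations: supported track types and dependencies.
MODULES = {
    "segment_count": {"supports": {"segments", "points"}, "needs": []},
    "coverage": {"supports": {"segments"}, "needs": []},
    "mean_segment_length": {
        "supports": {"segments"},
        "needs": ["segment_count", "coverage"],
    },
}


def execution_order(target, track_type, modules=MODULES):
    """Check requirements and return modules in a valid computation order."""
    graph = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in graph:
            continue
        spec = modules[name]
        if track_type not in spec["supports"]:
            raise ValueError(f"{name} does not support {track_type} tracks")
        graph[name] = set(spec["needs"])
        stack.extend(spec["needs"])
    # Dependencies come before the modules that need them.
    return list(TopologicalSorter(graph).static_order())


order = execution_order("mean_segment_length", "segments")
```

With declarations of this kind, the author of a statistical module only states what it needs, and the ordering logic stays in one place.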

Redundant computations are avoided through the use of a RAM-based memoization scheme.

The system also provides a component-based framework for Monte Carlo tests, where any test statistic can be combined with any relevant randomization algorithm, simplifying development.
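The component-based idea can be sketched as a single function that accepts any test statistic and any randomization algorithm as interchangeable parts. This is an illustrative sketch under stated assumptions, not the HyperBrowser's framework: all names are hypothetical, the first track is a list of point positions, the second a list of half-open segments, and the null model redistributes the points uniformly over the genome.

```python
import random


def points_inside_segments(points, segments):
    """Test statistic: number of points that fall inside any segment."""
    return sum(any(s <= p < e for s, e in segments) for p in points)


def uniform_positions(genome_length):
    """Build a randomization that places each point uniformly on the genome."""
    def randomize(points, rng):
        return sorted(rng.randrange(genome_length) for _ in points)
    return randomize


def monte_carlo_p_value(track_a, track_b, statistic, randomize,
                        n_samples=999, seed=0):
    """One-sided Monte Carlo p-value for an unusually high statistic."""
    rng = random.Random(seed)
    observed = statistic(track_a, track_b)
    at_least_as_extreme = sum(
        statistic(randomize(track_a, rng), track_b) >= observed
        for _ in range(n_samples)
    )
    # Count the observed value as one of the samples.
    return (at_least_as_extreme + 1) / (n_samples + 1)


segments = [(1000, 1100), (5000, 5100)]  # covers 2% of a 10,000 bp genome
points = [1005, 1010, 1020, 1050, 1090, 5001, 5010, 5040, 5070, 5099]
p_value = monte_carlo_p_value(
    points, segments,
    statistic=points_inside_segments,
    randomize=uniform_positions(10_000),
)
```

Swapping in a different statistic or a randomization that preserves other properties of the track, such as inter-point distances, requires no change to monte_carlo_p_value.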

In addition, a framework for writing unit and integration tests is included.

Step-by-step guide to HyperBrowser analysis --

One of the main goals of the Genomic HyperBrowser is to facilitate sophisticated statistical analyses.

A range of textual guides and screen-casts are available in the Help section at the manufacturer's web-site, demonstrating execution of various analyses, how to work with private data, and more.

What distinguishes the Genomic HyperBrowser from other available systems --

According to the manufacturer, the following aspects distinguish their Genomic HyperBrowser from currently available systems.

First, the manufacturer focuses on genomic information of a sequential nature, that is, information with specific base-pair locations on a genome, and is thus not restricted to genes.

Second, it focuses on the comparison of pairs of genomic tracks, possibly taking others into account through the concept of intensity tracks.

Third, all comparisons are performed using formal statistical testing.

Fourth, the manufacturers provide analyses on any scale, from genome-wide studies to miniature investigations on particular loci.

Fifth, the manufacturers offer flexible choices of null models for exploration and choice where relevant.

Finally, the manufacturers provide a user interface where the user describes the data and the null models, and the system chooses the appropriate statistical test based on that description.

Comparing this to the EpiGRAPH and Galaxy frameworks, which the manufacturers believe are the closest existing systems, the manufacturers find that both require substantial technical expertise when choosing the correct analysis and options.

EpiGRAPH is focused on a specific type of scenario that, according to the manufacturer's cataloguing, amounts to the comparison of unmarked points or segments versus categorically marked segments (with a mark being case or control).

Galaxy provides a simple user interface, is rich in tools for manipulating and analyzing datasets of diverse formats, but has little support for formal statistical testing.

Note: The Genomic HyperBrowser is built in tight connection with the Galaxy framework and can make use of all the tools provided within Galaxy.

The manufacturers provide tools for abstraction and cataloguing of what they believe are typical questions of broad interest.

The abstractions of genomic data, the proposing of prototype investigations, and the careful attention given to null models simplify statistical inference for a range of possible research topics.

The manufacturers' approach invites researchers to build relevant null models in a controlled manner, so that specific biological assumptions can be realistically represented by preservation, randomness, and intensity-based confounders.

In addition, time used for repetitive tasks like file parsing and calculation of descriptive statistics may be significantly reduced.

The manufacturers' system is highly extensible. The software is open source (as stated above), inviting the community to add new investigations and tools.

Attention has been given to component-based coding and simple interfaces, facilitating extensions of the system.

Use Case - Differential disease Regulome --

A prime example of the Genomic HyperBrowser in use is the Differential disease Regulome, which maps transcription factors against all human diseases.

A paper on the method has been accepted by BMC Genomics and is available from the journal's web site.

Geir K Sandve, Sveinung Gundersen, Halfdan Rydbeck, Ingrid K Glad, et al; "The differential disease Regulome"; BMC Genomics 2011, 12:353

The Differential disease Regulome can be viewed in its main version via the manufacturer's web-site.

Different versions of the disease Regulome are available for browsing, as well as several other similar regulatory maps.

According to the manufacturers, it is also easy to interactively generate a Regulome on a predefined sample dataset (which can easily be changed to other suitable data sources).

System Requirements

Contact manufacturer.

Manufacturer

Manufacturer Web Site Genomic HyperBrowser

Price Contact manufacturer.

G6G Abstract Number 20783

G6G Manufacturer Number 104359