Bioverse

Category Cross-Omics>Knowledge Bases/Databases/Tools

Abstract The Bioverse is an object-oriented framework for exploring the relationships among the molecular, genomic, proteomic, systems, and organismal worlds.

The manufacturer uses computational techniques to perform sophisticated analyses on genomic sequence data to annotate and understand relationships between the sequence, structure, and function of DNA, RNA, proteins and metabolites, at both the molecular and the genomic/systems levels.

A key paradigm in the Bioverse framework is to use sensitive 'homology detection' based on sequence, structure and function to transfer information across organisms.
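The idea of transferring information across organisms by homology can be sketched in a few lines. This is a minimal illustration, not the Bioverse method: it assumes a crude stand-in similarity score (Jaccard overlap of 3-mers) in place of the sensitive sequence, structure, and function homology detection described above. It copies a function label from the most similar annotated sequence only when the score clears a threshold. The function names, threshold, and toy sequences are hypothetical.

```python
"""Sketch: homology-style annotation transfer.

Assumption: 3-mer Jaccard similarity is a crude stand-in for sensitive
sequence/structure/function homology detection; names are hypothetical.
"""
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_annotation(
    query: str,
    annotated: Dict[str, Tuple[str, str]],  # id -> (sequence, function label)
    threshold: float = 0.5,
    k: int = 3,
) -> Optional[Tuple[str, str, float]]:
    """Return (source_id, label, score) for the best hit at or above threshold, else None."""
    q = kmers(query, k)
    best: Optional[Tuple[str, str, float]] = None
    for seq_id, (seq, label) in annotated.items():
        score = jaccard(q, kmers(seq, k))
        if score >= threshold and (best is None or score > best[2]):
            best = (seq_id, label, score)
    return best


if __name__ == "__main__":
    db = {
        "protA": ("MKTAYIAKQRQISFVKSHFSRQ", "kinase"),
        "protB": ("GGSWWGGSWWGGSWW", "structural"),
    }
    # The query differs from protA only at its last residue, so protA's label is transferred.
    print(transfer_annotation("MKTAYIAKQRQISFVKSHFSRE", db))
```

The threshold is the design lever: raising it trades coverage for reliability, which is why a real system would pair each transferred annotation with a confidence value.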

This will result in a highly complex 'weighted graph' connecting disparate (in the long term, all) sequence, structure, and functional data. The graph will be searchable for biologically relevant subgraphs/networks using sophisticated queries (for example, "which proteins in a particular organism are co-expressed in every known experiment and have similar three-dimensional structures?").
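A typed, weighted graph and the example query above can be sketched in Python. This is a hypothetical schema, not the Bioverse data model: nodes are protein IDs tagged with an organism, each edge carries a relation type and a weight in [0, 1], and co-expression is assumed to be stored as one edge per experiment. The query returns protein pairs in one organism that are co-expressed above a cutoff in every listed experiment and are structurally similar above a second cutoff.

```python
"""Sketch: a weighted, typed graph plus one subgraph query.

Assumptions (hypothetical schema): relation names "coexpression" and
"structure_similarity", one co-expression edge per experiment, weights in [0, 1].
"""
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

EdgeKey = Tuple[str, str]  # (relation type, context such as an experiment ID; "" if none)


class WeightedGraph:
    def __init__(self) -> None:
        self.organism: Dict[str, str] = {}
        # unordered node pair -> {(relation, context): weight}
        self.edges: Dict[FrozenSet[str], Dict[EdgeKey, float]] = defaultdict(dict)

    def add_node(self, node: str, organism: str) -> None:
        self.organism[node] = organism

    def add_edge(self, a: str, b: str, relation: str, weight: float, context: str = "") -> None:
        self.edges[frozenset((a, b))][(relation, context)] = weight

    def coexpressed_and_similar(
        self, organism: str, experiments: Set[str], min_coexp: float, min_struct: float
    ) -> List[Tuple[str, str]]:
        """Pairs in `organism` co-expressed (>= min_coexp) in every listed
        experiment and structurally similar (>= min_struct)."""
        nodes = sorted(n for n, org in self.organism.items() if org == organism)
        hits = []
        for a, b in combinations(nodes, 2):
            rels = self.edges.get(frozenset((a, b)), {})  # .get avoids creating empty entries
            coexp_ok = all(rels.get(("coexpression", e), 0.0) >= min_coexp for e in experiments)
            struct_ok = rels.get(("structure_similarity", ""), 0.0) >= min_struct
            if coexp_ok and struct_ok:
                hits.append((a, b))
        return hits


if __name__ == "__main__":
    g = WeightedGraph()
    for p in ("p1", "p2", "p3"):
        g.add_node(p, "yeast")
    g.add_edge("p1", "p2", "coexpression", 0.9, "exp1")
    g.add_edge("p1", "p2", "coexpression", 0.8, "exp2")
    g.add_edge("p1", "p2", "structure_similarity", 0.7)
    g.add_edge("p1", "p3", "coexpression", 0.9, "exp1")  # no exp2 edge, no structure edge
    print(g.coexpressed_and_similar("yeast", {"exp1", "exp2"}, 0.75, 0.6))
```

The pairwise scan is quadratic in the number of proteins; it is meant to show the query semantics, and a production system would rely on indexed graph storage instead.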

Time-dependent evolution of these networks will also be possible.

The scientific community, including the manufacturer, is in a unique position to do this, given that the entire genomes of several organisms have been sequenced.

The manufacturer's top level categorization of the data therefore begins with a particular organism's genome.

The data for functional elements encoded by a given organism's genome are sorted into three categories, representing different kinds of objects observed in biological systems: 1) Sequence; 2) Structure; and 3) Function.

For each object category, different sorts of information can be obtained. These include:

1) The form or shape of the object - This is represented by some type of summary for the form: for sequence, it is the linear (primary) sequence; for structure, it is both the two-dimensional (secondary) and three-dimensional (tertiary) structure; for function, it is simply a list of words, which is highly context-dependent.

2) The evidence for the object - This includes both experimental results, such as the X-ray crystal structure of a protein or an expression array experiment identifying genes co-expressed under certain conditions, and theoretical predictions, such as secondary structure and gene-finding predictions.

A list of types and sources of data will be made available.

3) The properties of the object - This includes some observations and calculations about the object.

Examples include sequence length, amino acid composition, secondary structure composition, and volume. An especially important property of the object is:

3a) The confidence value for the object - If applicable, an object will be assigned a confidence value based on the evidence available for the object.

An object that relies on other objects will be assigned a confidence value derived from those objects' confidence values (a "meta-confidence").

A list of methods to assign confidence values will be made available.

3b) The similarity and contextual relationships between the object and other objects, within the organism and with the rest of the Bioverse - The former kind includes objects that are similar within and between genomes; the latter kind includes objects that work in the context of (i.e., interact with) the object of interest.

This includes sequence, structure, and functional relationships, which can be used, for example, to identify paralogs and orthologs, protein complexes, and pathways, and to deduce other evolutionary and systems relationships.

A list of methods to deduce relationships will be made available.

3c) The properties of the relationships between the object and other objects, within the organism and with the rest of the Bioverse.

3d) The confidence value for the relationship between two objects - This is based on the degree/strength of the relationship.
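The object categories and attributes listed above (form, evidence, properties, confidence, relationships) can be sketched as a small Python data model. The field names are hypothetical, and the meta-confidence rule shown (multiply an object's own confidence by the meta-confidences of the objects it relies on, as if they were independent) is an illustrative assumption rather than the manufacturer's method. It also assumes the dependency chain contains no cycles.

```python
"""Sketch of a per-object data model with a meta-confidence.

Assumptions: hypothetical field names; product-of-confidences rule;
dependencies (relies_on) form no cycles.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Relationship:
    target: "BioObject"
    kind: str  # "similarity" (e.g. paralog/ortholog) or "context" (e.g. interaction)
    properties: Dict[str, float] = field(default_factory=dict)  # 3c
    confidence: float = 1.0  # 3d: degree/strength of the relationship


@dataclass
class BioObject:
    name: str
    category: str  # "sequence", "structure" or "function"
    form: str  # 1) primary sequence, structure summary, or function keywords
    evidence: List[str] = field(default_factory=list)  # 2) experiments and predictions
    properties: Dict[str, float] = field(default_factory=dict)  # 3) e.g. length
    confidence: Optional[float] = None  # 3a) None when not applicable
    relies_on: List["BioObject"] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)  # 3b-3d

    def meta_confidence(self) -> float:
        """Own confidence (1.0 if not applicable) times the meta-confidences of dependencies."""
        value = 1.0 if self.confidence is None else self.confidence
        for dep in self.relies_on:
            value *= dep.meta_confidence()
        return value


if __name__ == "__main__":
    seq = BioObject("seqX", "sequence", "MKTAYIAK", properties={"length": 8.0}, confidence=0.9)
    struct = BioObject("structX", "structure", "helix-loop-helix",
                       evidence=["secondary structure prediction"], confidence=0.8, relies_on=[seq])
    func = BioObject("funcX", "function", "kinase; ATP binding",
                     evidence=["structure comparison"], confidence=0.5, relies_on=[struct])
    print(round(func.meta_confidence(), 4))
```

A product is only one possible combination rule; it has the intuitive effect that a function predicted from a predicted structure can be no more confident than the weakest link in its chain of evidence.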

Data are presented from the most abstract (closest to the biology) to the least abstract (closest to the information theory/technology).

For example, an object's function is presented first and further exploration will lead to different methods used to determine/predict that function (sequence motif comparison, structure comparison, a particular microarray experiment, a literature source, etc.).

Users can browse by sequence as well as search the database. They can also submit whole-genome sequences to be annotated using the manufacturer's methods and included as part of the Bioverse.

The current status of the databases (number of sequences, organisms covered, etc.) is also available.

Because the Bioverse is not complete (i.e., the manufacturer does not have all sequences encoded by all organismal genomes), the manufacturer has also created special databases of sequences with particular properties (for example, sequences with known three-dimensional structures or sequences in SWISS-PROT) to facilitate internal cross-linking.

Special collections of particular biological interest will also be provided.

Application Programming Interface -- Bioverse data is available through an Application Programming Interface (API) for programming custom applications that utilize its data.

The API can be accessed directly over the web with a browser, or with an XML-RPC or JSON-RPC library in a programming environment such as Python, Perl, Java, C/C++, or JavaScript.
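Calling an XML-RPC or JSON-RPC API from Python can be sketched with the standard library. Everything service-specific here is a hypothetical placeholder: the endpoint URL and the method name `get_sequence` are not documented Bioverse calls, and the JSON-RPC 2.0 message format is an assumption (the listing does not state a version). The payload builders run offline, so the request bodies can be inspected before a live call is attempted.

```python
"""Sketch: building XML-RPC and JSON-RPC requests in Python.

Assumptions: API_URL and the method name "get_sequence" are hypothetical
placeholders; JSON-RPC 2.0 framing is assumed.
"""
import json
import xmlrpc.client
from typing import Any

API_URL = "https://example.org/bioverse/api"  # hypothetical endpoint


def xmlrpc_payload(method: str, *params: Any) -> str:
    """XML body that an XML-RPC client would POST for this call."""
    return xmlrpc.client.dumps(params, methodname=method)


def jsonrpc_payload(method: str, *params: Any, request_id: int = 1) -> str:
    """JSON-RPC 2.0 request body for the same call."""
    return json.dumps({"jsonrpc": "2.0", "method": method,
                       "params": list(params), "id": request_id})


def call_xmlrpc(method: str, *params: Any) -> Any:
    """Perform a live XML-RPC call (requires network access and a real endpoint)."""
    proxy = xmlrpc.client.ServerProxy(API_URL)
    return getattr(proxy, method)(*params)


if __name__ == "__main__":
    print(xmlrpc_payload("get_sequence", "example_id"))
    print(jsonrpc_payload("get_sequence", "example_id"))
```

`ServerProxy` handles the marshalling and HTTP transport for XML-RPC; for JSON-RPC, the JSON body would be POSTed with any HTTP client. Consult the manufacturer's API documentation for the actual endpoint and method names.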

System Requirements

Web based.

Manufacturer

Manufacturer Web Site Bioverse

Price Contact manufacturer.

G6G Abstract Number 20324

G6G Manufacturer Number 102871