BioExtract Server

Category Cross-Omics>Workflow Knowledge Bases/Systems/Tools

Abstract The BioExtract Server is a Web-based data integration application designed to consolidate, analyze, and serve data from heterogeneous biomolecular databases in the form of a mash-up.

It offers a central distribution point for uniformly formatted data from various data sources.

The basic operations of the BioExtract Server allow researchers, via their Web browsers, to: specify data sources; select cleaning and analytic tools; flexibly query the sources with a full range of relational operators; determine download formats for their resulting extracts; save workflows; and name and keep query results persistent for reuse.

As a researcher works with the system, their “steps” are saved in the background. At any time these steps can be preserved long-term as a workflow simply by providing a workflow name and description.

Database System --

The BioExtract Server lets researchers select the data sources to be queried from a list. The available sources fall into two classes: original data sources and previously defined datasets.

A previously defined dataset is a ‘group of records’ resulting from a previously executed query. All functionality in the BioExtract Server that can be applied to data sources may also be applied to these datasets (e.g., query, export, analyze).

By having the ability to save the results of a query to datasets, researchers may share and subsequently search persistent subsets of data.
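The idea that a saved result set behaves like any other source can be sketched as follows. This is a minimal illustration in Python, not BioExtract Server code: the `Source` class, its `query` and `save_as_dataset` methods, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Source:
    """A queryable collection of records (hypothetical stand-in for a data source)."""
    name: str
    records: List[Record] = field(default_factory=list)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Return every record that satisfies the predicate.
        return [r for r in self.records if predicate(r)]

    def save_as_dataset(self, name: str,
                        predicate: Callable[[Record], bool]) -> "Source":
        # A saved dataset is just another Source, so it can be queried again.
        return Source(name, self.query(predicate))


# Invented sample records, for illustration only.
genes = Source("example_source", [
    {"organism": "Zea mays", "length": 1200},
    {"organism": "Zea mays", "length": 300},
    {"organism": "Oryza sativa", "length": 900},
])

# Save a subset, then query the subset as if it were a source.
maize = genes.save_as_dataset("maize_only", lambda r: r["organism"] == "Zea mays")
long_maize = maize.query(lambda r: r["length"] > 500)
```

Because the saved dataset has the same interface as the source it came from, every operation written for sources applies to it unchanged.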

The BioExtract Server’s data sources are distributed and implemented as relational databases, proprietary ‘field stream’ data warehouses, or data sources hosted by Web servers.

Each of these implementations offers researchers particular advantages, so it is important that the BioExtract Server include all of them.

Some of the advantages of the field stream database system include:

1) Faster data retrieval because data is stored in “streams” instead of multiple tables, thus eliminating table joins.

2) ‘Stem tree’ indexing system, which provides GREP capabilities (wildcards, missing characters, etc.) on text fields. (GREP is a command-line text search utility originally written for UNIX.)

3) Dynamic creation of streams allowing the researcher to add fields to existing field stream databases.

4) Template based meta-data definitions allowing the researcher to specify the database schema.
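The field stream format is proprietary and its internals are not described here, so the following is only a toy model of per-field storage, written in Python. It assumes that each field is kept as its own list ("stream") aligned by record position. It shows how a record can be reassembled without a join and how a field can be added without rewriting the existing streams. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional


class FieldStreamStore:
    """Toy per-field store: one list ("stream") per field, aligned by record index."""

    def __init__(self) -> None:
        self.streams: Dict[str, List[Optional[str]]] = {}
        self.count = 0

    def add_field(self, name: str) -> None:
        # Adding a field creates a new stream padded for the records already
        # stored; the existing streams are left untouched.
        self.streams.setdefault(name, [None] * self.count)

    def add_record(self, record: Dict[str, str]) -> int:
        # Create streams for fields not seen before, then append one value
        # (or None) to every stream so all streams stay the same length.
        for name in record:
            self.add_field(name)
        for name, stream in self.streams.items():
            stream.append(record.get(name))
        self.count += 1
        return self.count - 1

    def get_record(self, index: int) -> Dict[str, Optional[str]]:
        # Reassemble a record by reading the same position in every stream.
        return {name: stream[index] for name, stream in self.streams.items()}


store = FieldStreamStore()
store.add_record({"accession": "X1", "organism": "Zea mays"})
store.add_field("note")  # new field added after data already exists
store.add_record({"accession": "X2", "note": "curated"})
```

In this model a lookup reads one position from each stream, where a normalized relational layout would join several tables on a key.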

BioExtract Server Architecture --

The BioExtract Server has been implemented using a multi-tiered J2EE architecture. Sun’s Java 2 Enterprise Edition (J2EE) Platform provides the ability to develop, deploy, and execute applications in a distributed environment.

The tiers making up the BioExtract Server are the client tier, the middle tier, and the backend database server tier.

1) Client tier - The researcher interacts with the BioExtract Server via a Web browser. The system administrator enters data sources and researcher groups into the system. To access the system, researchers are automatically logged into the server with the “guest” id.

2) Middle tier - The middle tier is implemented through the development of a J2EE application and deployed to an application server. It handles the communication between the backend database servers and client processes. All client requests are processed through the middle tier.

The lists of available databases, database groups, data sets, researchers, researcher groups, and researcher workflows are all managed at this level.

3) Backend database server tier - Through Remote Method Invocation (RMI), the middle tier application accesses data stored in the field stream databases and stand-alone analytic tools.

These databases and analytic tools may reside on the same machine or may be distributed across an intranet or the Internet.

BioExtract Server Functionality --

The BioExtract Server provides the researcher with the ability to select from a list of distributed databases or data sets, query selected databases or data sets, apply cleaning and analytic tools to query results, view results, export or save results, and save researcher workflows.

1) Pick Sources - After logging onto the server, the researcher has access to multiple databases and previously saved data subsets.

The list of available databases and data sets varies based on the researcher’s identity. Multiple distributed databases or data sets may be searched using a single query.

2) Query - The query capabilities within the BioExtract Server are in the form of FIELD → OPERATOR → CONDITION.

Fields represent the features, qualifiers, annotations, and other text fields within the records.

Operators contextually depend on whether the field selected is numeric or text. For numeric fields, the operators include the relational operators. For text, the operator is always EQ (=), but the condition that can be specified supports GREP options (wildcards, missing characters, etc.).

By saving the results of a query to a data set, the researcher is able to subsequently query that data set. The set of query fields available for searching is the union of the available search fields for each selected data source.
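The query model described above can be sketched in Python as a (field, operator, condition) triple evaluated against each record. This is an illustration under stated assumptions, not the server's implementation. The operator names other than EQ are assumptions. Python's shell-style `fnmatchcase` (`*` for any run of characters, `?` for a single missing character) stands in for the GREP options. The source names and records are invented.

```python
import operator
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

Record = Dict[str, object]
Condition = Tuple[str, str, object]  # (field, operator, condition)

# Relational operators for numeric fields (names other than EQ are assumptions).
NUMERIC_OPS = {
    "EQ": operator.eq, "NE": operator.ne,
    "LT": operator.lt, "LE": operator.le,
    "GT": operator.gt, "GE": operator.ge,
}


def matches(record: Record, cond: Condition) -> bool:
    field, op, value = cond
    if field not in record:
        return False
    actual = record[field]
    if isinstance(actual, (int, float)):
        return NUMERIC_OPS[op](actual, value)
    # Text fields: the operator is always EQ; the condition may hold wildcards.
    if op != "EQ":
        raise ValueError("text fields support only EQ")
    return fnmatchcase(str(actual), str(value))


def run_query(sources: Dict[str, List[Record]], cond: Condition) -> List[Record]:
    # A single query is applied across every selected source.
    return [r for records in sources.values() for r in records if matches(r, cond)]


def searchable_fields(sources: Dict[str, List[Record]]) -> List[str]:
    # Union of the fields available in each selected source.
    return sorted({f for records in sources.values() for r in records for f in r})


sources = {
    "source_a": [{"organism": "Zea mays", "length": 1200}],
    "source_b": [{"organism": "Zea diploperennis", "length": 450, "gene": "adh1"}],
}
by_text = run_query(sources, ("organism", "EQ", "Zea*"))
by_number = run_query(sources, ("length", "GT", 1000))
fields = searchable_fields(sources)
```

Here `fields` contains `gene` even though only one of the two sources defines it, which mirrors the union rule. A record that lacks the queried field simply does not match.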

3) Analyze Data - The researcher is provided with a list of analytic tools that may be applied to a result set. Depending on the tool selected, the data may be run through a number of algorithms that automatically identify, correct, and annotate many of the most common problems found in sequence misalignments or putative sequence identifications.

Depending on the analytic tool selected, the input into the tool may be entered directly, may be based on the current query result set, or may be the output from a previously executed tool.
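The three possible input routes for a tool (direct entry, the current query result set, or the output of an earlier tool) can be sketched as follows. This is a hypothetical Python illustration: `Session`, `run`, and the two toy tools are invented for the example and are not tools offered by the BioExtract Server.

```python
from typing import Callable, Dict, List, Optional

Tool = Callable[[List[str]], List[str]]


# Hypothetical stand-ins for analytic tools.
def to_upper(seqs: List[str]) -> List[str]:
    return [s.upper() for s in seqs]


def reverse_complement(seqs: List[str]) -> List[str]:
    pairs = str.maketrans("ACGT", "TGCA")
    return [s.translate(pairs)[::-1] for s in seqs]


class Session:
    def __init__(self, query_results: List[str]) -> None:
        self.query_results = query_results
        self.tool_outputs: Dict[str, List[str]] = {}

    def run(self, name: str, tool: Tool,
            direct: Optional[List[str]] = None,
            from_tool: Optional[str] = None) -> List[str]:
        # Choose the input: entered directly, the output of a previously
        # executed tool, or (by default) the current query result set.
        if direct is not None:
            data = direct
        elif from_tool is not None:
            data = self.tool_outputs[from_tool]
        else:
            data = self.query_results
        self.tool_outputs[name] = tool(data)
        return self.tool_outputs[name]


session = Session(["acgt", "ttga"])
upper = session.run("upper", to_upper)                               # from query results
chained = session.run("revcomp", reverse_complement, from_tool="upper")  # from earlier tool
typed = session.run("typed", reverse_complement, direct=["AAC"])     # entered directly
```

Keeping each tool's output under a name is what allows a later tool to pick it up as input.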

4) View Results - After the researcher has executed a query, the results may be viewed in the detail screen of the BioExtract Server or by linking to the original data source.

5) Export - The results of a query may be exported locally by the researcher. Presently, the exports are in American Standard Code for Information Interchange (ASCII) with the format specified by the researcher based on a list of available formats.
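Choosing an export format from a list and writing the result as ASCII can be sketched as below. The entry does not name the available formats, so FASTA and tab-delimited text are used here purely as assumed examples. The record keys (`id`, `description`, `sequence`) and the function names are likewise hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List

Record = Dict[str, str]


def to_fasta(records: List[Record], width: int = 60) -> str:
    # One header line per record, with the sequence wrapped at a fixed width.
    lines: List[str] = []
    for r in records:
        lines.append(f">{r['id']} {r['description']}")
        seq = r["sequence"]
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def to_tab_delimited(records: List[Record]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "description", "sequence"],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# The list of formats offered to the researcher (assumed for illustration).
EXPORTERS: Dict[str, Callable[[List[Record]], str]] = {
    "fasta": to_fasta,
    "tab": to_tab_delimited,
}


def export(records: List[Record], fmt: str) -> bytes:
    # Encoding as ASCII raises UnicodeEncodeError if a record holds non-ASCII text.
    return EXPORTERS[fmt](records).encode("ascii")


records = [{"id": "X1", "description": "example", "sequence": "ACGTACGT"}]
fasta_bytes = export(records, "fasta")
tab_bytes = export(records, "tab")
```

A dictionary of exporters keeps the format list in one place, so adding a format means adding one function and one entry.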

6) Workflows - As the researcher works with the BioExtract Server, “steps” are saved in the form of a workflow. Examples of steps might include executing a query, saving a result set, or running an analytic tool.

The researcher has the option of saving workflows, modifying workflows, executing workflows, or executing a single step contained within a workflow.
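The background recording of steps, and the later choice between running a whole workflow or one step of it, can be sketched in Python as follows. This is a minimal illustration with hypothetical names (`Recorder`, `Workflow`, `Step`); it does not reflect how the server stores workflows.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    label: str
    action: Callable[[], object]


@dataclass
class Workflow:
    name: str
    description: str
    steps: List[Step] = field(default_factory=list)

    def run_all(self) -> List[object]:
        # Execute every step in the order it was recorded.
        return [step.action() for step in self.steps]

    def run_step(self, index: int) -> object:
        # Execute a single step contained within the workflow.
        return self.steps[index].action()


class Recorder:
    """Remembers each action as it is performed."""

    def __init__(self) -> None:
        self._steps: List[Step] = []

    def do(self, label: str, action: Callable[[], object]) -> object:
        # Record the action, then run it immediately.
        self._steps.append(Step(label, action))
        return action()

    def save(self, name: str, description: str) -> Workflow:
        # Preserve a copy of the recorded steps under a name and description.
        return Workflow(name, description, list(self._steps))


recorder = Recorder()
recorder.do("execute query", lambda: ["r1", "r2"])
recorder.do("count results", lambda: 2)
workflow = recorder.save("demo", "query then count")
all_results = workflow.run_all()
second_only = workflow.run_step(1)
```

Saving copies the step list, so steps recorded after the save do not alter the stored workflow.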

BioExtract Server Tools and Documentation --

BioExtract Server provides an extensive list of Bioinformatic Tools: Alignment Tools, Edit Tools, Information Tools, Nucleic Tools, Phylogeny Tools, Protein Tools, and Similarity Search Tools.

BioExtract Server provides an extensive Help section that covers and explains: Query, Extracts, Tools, Workflows, Groups, Tutorials and More, and FAQs. BioExtract Server also provides a Demo Workflow.

System Requirements

Web-based.

Manufacturer

Manufacturer Web Site BioExtract Server

Price Contact manufacturer.

G6G Abstract Number 20524

G6G Manufacturer Number 104141