Abstract The Distributed Annotation System (DAS) defines a communication protocol used to exchange annotations (see ‘Genome annotation’ below...) on genomic or protein sequences.

It is motivated by the idea that such annotations should Not be provided by single centralized databases, but should instead be spread over multiple sites.

Data distribution, performed by DAS servers, is separated from visualization, which is done by DAS clients.

This system has three advantages: data providers retain control over their data; data is freed from the constraints of specific organizations; and the usual problems of release cycles, Application Programming Interface (API) updates and data duplication are avoided.

DAS is a client-server system in which a single client integrates information from multiple servers.

It allows a single machine to gather up sequence annotation information from multiple distant web sites, collate the information, and display it to the user in a single view. Little coordination is needed among the various information providers.
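The collation step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference client: the two XML documents are invented sample responses from hypothetical servers, their layout (SEGMENT, FEATURE, TYPE, START, END elements) is an assumption based on the DAS 1.53 features response format, and the helper names parse_features and collate are made up for this example. No network access is performed.

```python
import xml.etree.ElementTree as ET

# Invented sample responses from two hypothetical DAS servers.
# The element layout is assumed to follow the DAS 1.53 "features" response.
SERVER_A = """<DASGFF><GFF version="1.0" href="http://a.example.org/das/genes/features">
<SEGMENT id="1" start="1000" stop="2000">
<FEATURE id="geneA" label="Gene A"><TYPE id="gene">gene</TYPE><START>1500</START><END>1800</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""

SERVER_B = """<DASGFF><GFF version="1.0" href="http://b.example.org/das/repeats/features">
<SEGMENT id="1" start="1000" stop="2000">
<FEATURE id="rep1" label="Repeat 1"><TYPE id="repeat">repeat</TYPE><START>1200</START><END>1300</END></FEATURE>
<FEATURE id="rep2" label="Repeat 2"><TYPE id="repeat">repeat</TYPE><START>1900</START><END>1950</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""


def parse_features(xml_text, source):
    """Extract the features of one server response as plain dictionaries."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "source": source,  # which server supplied the annotation
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


def collate(responses):
    """Merge features from several servers into one view, ordered by position."""
    merged = []
    for source, xml_text in responses.items():
        merged.extend(parse_features(xml_text, source))
    return sorted(merged, key=lambda f: (f["segment"], f["start"], f["end"]))


if __name__ == "__main__":
    for f in collate({"server_a": SERVER_A, "server_b": SERVER_B}):
        print(f["segment"], f["start"], f["end"], f["type"], f["id"], f["source"])
```

Each server is parsed independently and the results are combined only on the client, which is why the providers need so little coordination with one another: they only have to agree on the response format and on the coordinate system of the segment.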

DAS is heavily used in the genome bioinformatics community. In recent years it has also seen growing acceptance in the protein sequence and structure communities.

Genome annotation --

Genome annotation is the process of attaching biological information to sequences. It consists of two (2) main steps:

1) Identifying elements on the genome, a process called 'Gene Finding'; and

2) Attaching 'biological information' to these elements.

Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation (a.k.a. curation) which involves human expertise.

Ideally, these approaches co-exist and complement each other in the same annotation pipeline.

The most basic level of annotation uses the Basic Local Alignment Search Tool (BLAST) to find similar sequences, and then annotates genomes based on those similarities.

However, annotation platforms now incorporate a growing amount of additional information. This information allows manual annotators to resolve discrepancies between genes that have been given the same annotation.

For example, the SEED database uses genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their Subsystems approach.

The Ensembl database relies on both curated data sources and a range of different software tools in its automated genome annotation pipeline.

What DAS can do --

A DAS-enabled website or application can aggregate complex and high-volume data from external providers in an efficient manner.

For the biologist, this means the ability to "plug in" the latest data, possibly including a user's own data.

For the application developer, this means protection from 'data format' changes and the ability to add new data with minimal development cost.

Here are some examples of DAS-enabled applications or websites for end users:

1) Ensembl - uses DAS to pull in genomic, gene and protein annotations. It also provides data via DAS.

2) GBrowse - is a generic genome browser, and is both a consumer and provider of DAS (see G6G Abstract Number 20310).

3) Integrated Genome Browser (IGB) - is a desktop application for viewing genomic data.

4) SPICE - is an application for projecting protein annotations onto 3D structures.

5) Dasty2 - is a web-based viewer for protein annotations.

6) Jalview - is a multiple alignment editor.

7) PeppeR - is a graphical viewer for 3D electron microscopy data.

8) DASMI - is an integration portal for protein interaction data (see G6G Abstract Number 20487).

9) DASher - is a Java-based viewer for protein annotations.

10) EpiC - presents structure-function summaries for antibody design.

11) STRAP - is a STRucture-based sequence Alignment Program.

Note: Hundreds of DAS servers are currently running worldwide, including those provided by the European Bioinformatics Institute, Ensembl, the Sanger Institute, UCSC, WormBase, FlyBase, TIGR, and UniProt.

For a listing of all available DAS sources please visit the 'DAS Registry' (see below...).

Versions of DAS --

The original DAS specification was written by Lincoln Stein, Sean Eddy, and Robin Dowell. It is widely adopted and well supported, particularly throughout Europe, and is the basis for a large number of existing clients and servers.

The protocol has been developed incrementally since its inception and thus there are several successive versions of the specification, each expanding on the last while retaining a focus on backwards compatibility.

The current specification of DAS is version 1.53 and the upcoming version is 1.6.

Though mature, the protocol continues to be extended to cater for the needs of the DAS community via extensions to the specification. Together, these extensions form an "extended specification".

The current version of the extended specification is 1.53E and the upcoming version is 1.6E.

Note: The DAS/2 project is an entirely separate specification which, although based on the DAS architecture, is Not backwards compatible with existing servers and clients.

DAS Registry - Publishing and Discovery of DAS Servers --

The DAS registration server provides a repository where people can publish and share their DAS data sources (i.e., server locations and available data) with the community.

The DAS/1 and DAS/2 specifications do Not define how to publish or discover DAS data sources (servers).

Because DAS/1 has been so successful and its many sources are spread around the world, it is Not easy to keep track of them.

The DAS registry provides a Web interface so users can search for available DAS/1 or DAS/2 data sources.

For DAS-clients, an XML interface is available that allows a programmatic way to retrieve data source listings, and several DAS clients now make use of this.
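As a rough sketch of how a client might use such a listing, the Python below filters a data source listing for sources that offer a given capability. The XML is an invented sample, and the SOURCES / SOURCE / VERSION / CAPABILITY layout with its attribute names is an assumption about the listing format rather than something stated in this abstract. The function name sources_with_capability and the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample listing; the element and attribute names are assumptions
# about the format of a DAS data source listing.
LISTING = """<SOURCES>
<SOURCE uri="example_genes" title="Example genes" description="Gene models">
<VERSION uri="example_genes" created="2009-01-01">
<CAPABILITY type="das1:features" query_uri="http://a.example.org/das/example_genes/features"/>
<CAPABILITY type="das1:types" query_uri="http://a.example.org/das/example_genes/types"/>
</VERSION>
</SOURCE>
<SOURCE uri="example_seq" title="Example sequence" description="Reference sequence">
<VERSION uri="example_seq" created="2009-01-01">
<CAPABILITY type="das1:sequence" query_uri="http://b.example.org/das/example_seq/sequence"/>
</VERSION>
</SOURCE>
</SOURCES>"""


def sources_with_capability(xml_text, capability):
    """Return (title, query_uri) pairs for sources offering the capability."""
    root = ET.fromstring(xml_text)
    matches = []
    for source in root.iter("SOURCE"):
        for cap in source.iter("CAPABILITY"):
            if cap.get("type") == capability:
                matches.append((source.get("title"), cap.get("query_uri")))
    return matches


if __name__ == "__main__":
    for title, uri in sources_with_capability(LISTING, "das1:features"):
        print(title, uri)
```

A client built this way would discover servers at run time instead of carrying a hard-coded server list.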

The BioDAS web site is hosted by the Open Bioinformatics Foundation.

Thanks to The Bioteam for providing and maintaining the bio servers.

