Abstract --

ENDEAVOUR is a software application for the computational prioritization of 'candidate genes' through genomic data fusion, based on a set of training genes.

This web server allows users to prioritize candidate genes with respect to their biological processes or diseases of interest.

It is provided with an intuitive four-step wizard (see below) and an online manual. It is currently available for five organisms (H. sapiens, M. musculus, R. norvegicus, C. elegans, and D. melanogaster).

ENDEAVOUR relies on the similarity between the candidates and the models built with the training genes.
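
To make the idea of scoring candidates against a model concrete, here is a minimal sketch for a single annotation-style data source. It is an illustration only, not ENDEAVOUR's actual scoring: the function names, the term-frequency model and the averaging score are all assumptions made for this example.

```python
from collections import Counter


def build_model(training_annotations):
    """Hypothetical model: fraction of training genes carrying each term."""
    counts = Counter(
        term for terms in training_annotations.values() for term in set(terms)
    )
    n = len(training_annotations)
    return {term: count / n for term, count in counts.items()}


def score_candidate(model, terms):
    """Average model weight of the candidate's terms (0.0 if it has none)."""
    terms = set(terms)
    if not terms:
        return 0.0
    return sum(model.get(term, 0.0) for term in terms) / len(terms)


def rank_candidates(model, candidate_annotations):
    """Return candidate names, most similar to the model first."""
    scores = {
        gene: score_candidate(model, terms)
        for gene, terms in candidate_annotations.items()
    }
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))
```

Under these assumptions, a candidate sharing many annotation terms with the training genes ranks above one sharing few or none.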

This approach has been validated experimentally, by extensive leave-one-out cross-validations, and by analysis of recently reported cases from the literature.

Additionally, several independent laboratories have used ENDEAVOUR to propose novel disease genes or to optimize the analysis of medium-throughput experiments.

Importantly, the cross-validation revealed the added value of combining several complementary data sources.
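
As a rough illustration of what combining per-source rankings into one global ranking can look like, the sketch below averages each gene's rank ratio (position divided by list length) over the data sources in which it appears. This is a deliberately simple stand-in and should not be read as the fusion method ENDEAVOUR itself uses; the function name and input layout are assumptions.

```python
def fuse_rankings(rankings):
    """Fuse per-source rankings into a single ordering.

    rankings: dict mapping a data source name to a list of genes,
    best first. Genes missing from a source are simply not counted
    for that source (an assumption of this sketch).
    """
    ratios = {}
    for ordered in rankings.values():
        n = len(ordered)
        for position, gene in enumerate(ordered, start=1):
            ratios.setdefault(gene, []).append(position / n)
    mean_ratio = {gene: sum(r) / len(r) for gene, r in ratios.items()}
    # Lower mean rank ratio is better; ties are broken alphabetically.
    return sorted(mean_ratio, key=lambda gene: (mean_ratio[gene], gene))
```

A gene that ranks well in several complementary sources ends up near the top even if no single source places it first.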

With 26 distinct data sources (51 in total) covering most aspects of the knowledge available on genes and gene products (functional annotations, protein interactions, expression profiles, regulatory information, sequence-based data and literature mining), ENDEAVOUR exploits the most comprehensive collection of publicly available knowledge.

ENDEAVOUR four-step wizard --

The four-step wizard guides the user through the preparation of the prioritization.

The first step is to choose one of the available organisms.

The second step is to specify the training set. The user can input a mixture of chromosomal bands, chromosomal intervals, gene symbols, EnsEMBL gene identifiers, KEGG identifiers, Gene Ontology identifiers or OMIM disease names.

Each input has to be prefixed according to its type. The genes corresponding to the input are retrieved and loaded into the application.
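
The sketch below shows one way such type-prefixed input could be parsed, one entry per line. The prefix names (`symbol`, `go`, `omim` and so on) and the `prefix:value` syntax are hypothetical placeholders chosen for this example; the prefixes actually accepted by ENDEAVOUR are documented in its online manual.

```python
# Hypothetical prefixes, one per input type listed in the text.
KNOWN_PREFIXES = {"band", "interval", "symbol", "ensembl", "kegg", "go", "omim"}


def parse_inputs(text):
    """Split multi-line input into (prefix, value) pairs.

    Returns (parsed, errors), where errors lists the entries that
    lack a recognized prefix or have an empty value.
    """
    parsed, errors = [], []
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue
        # Split on the first colon only, so values such as
        # "GO:0006915" keep their own internal colon.
        prefix, sep, value = entry.partition(":")
        prefix = prefix.strip().lower()
        value = value.strip()
        if not sep or prefix not in KNOWN_PREFIXES or not value:
            errors.append(entry)
        else:
            parsed.append((prefix, value))
    return parsed, errors
```

Entries that cannot be parsed are collected rather than raised, which mirrors the way unrecognized identifiers are reported to the user as warnings.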

The third step is to select the data sources to be used to build the models.

The data sources available depend on the organism chosen in the first step. Some of these are species specific (e.g. gene expression data sets) while others are more generic (e.g. Gene Ontology annotations).

The last (4th) step lets the user specify the 'candidate genes' applying the same rules as in the second step.

The user launches the prioritization by using a dedicated button. The computation time depends on the number of data sources used, the number of candidates and the load on the manufacturer's servers.

The application can handle the prioritization of hundreds of genes (e.g. the average computation time for 400 candidates using 10 data sources is 19.14 seconds over 100 repeats). Warnings and errors, such as unrecognized gene identifiers, are displayed in the console located in the middle of the main window.

The results are displayed at the bottom of the main page in three (3) panels --

The first panel contains the 'sprint plot', a graphical representation of the rankings with one column per data source plus an additional one for the global ranking.

The genes are represented as boxes and the top ranking boxes are colored for better interpretation of the results.

The second panel contains the raw scores and ranks for each gene in each data source. The user can sort the columns according to the global ranking or to any ranking per data source.
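
Sorting by the global ranking or by any single data source amounts to choosing a different sort key. A minimal sketch, assuming each table row is a dictionary with a `gene` name and a `ranks` mapping (both names are hypothetical):

```python
def sort_rows(rows, column="global"):
    """Return rows ordered by the rank in the given column, best first.

    column is either "global" or the name of a data source.
    Ties are broken by gene name to keep the order stable.
    """
    return sorted(rows, key=lambda row: (row["ranks"][column], row["gene"]))
```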

The third panel allows one to export the results as a tab-separated values (TSV) spreadsheet or as an XML file. The user can also save the sprint plot using several picture formats (i.e. PNG, JPG and GIF).
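
For readers who want to post-process such results, the following sketch writes a ranking table as tab-separated values using Python's standard `csv` module. The column layout (`gene`, `global_rank`, one `<source>_rank` column per data source) is an assumption for illustration and need not match the file ENDEAVOUR actually exports.

```python
import csv
import io


def results_to_tsv(rows, sources):
    """Serialize ranking rows to a TSV string.

    rows: list of dicts with a "gene" name and a "ranks" mapping that
    contains "global" plus one entry per data source (hypothetical layout).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "global_rank"] + [f"{s}_rank" for s in sources])
    for row in rows:
        ranks = row["ranks"]
        writer.writerow([row["gene"], ranks["global"]] + [ranks[s] for s in sources])
    return buffer.getvalue()
```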

ENDEAVOUR additional info --

ENDEAVOUR is designed as a generic prioritization tool and is as useful for prioritizing candidate disease genes as for prioritizing candidate members of 'biological pathways' and processes.

The manufacturer has designed the web server so that the organism-specific versions use the same method for each generic data source (e.g. Gene Ontology annotations).

Note: The key strength of ENDEAVOUR is that many data sources are available and the user can select the ones that best correspond to the biological question under study.

As an alternative to the web-based application (web server), one can use the manufacturer's Java Web Start client.

This application includes a few additional features, such as a full description of the models created, a full genome screening service in which the whole genome of the given organism can be prioritized and the possibility for users to make use of their own microarray data sets.

A SOAP service is also available to allow integration in workflows.

ENDEAVOUR comes with an online manual. A subsection describes the concept of gene prioritization through genomic data fusion.

Another subsection contains the answers to frequently asked questions and gives more details on how to perform a prioritization and how to interpret the results.

Finally, a step-by-step example is given together with the corresponding screenshots.

