Abstract --

WhichGenes is a web-based tool for gathering, building, storing and exporting gene sets for application in gene set enrichment analysis (GSEA).

WhichGenes offers a simple interface for extracting up-to-date gene lists from multiple databases and unstructured biological data sources.

The user specifies new gene sets of interest by following a simple four-step wizard, and the tool can run several queries in parallel.

Every time a new set is generated, it is automatically added to the private gene-set cart and the user is notified by an e-mail containing a direct link to the new set stored on the server.

WhichGenes provides functionality to edit, delete and rename existing sets, as well as the capability of generating new ones by combining existing sets (intersection, union and difference operators).

The user can export their sets, configuring the output format and selecting among multiple gene identifier types.

In addition to the user-friendly environment, WhichGenes allows programmers to access its functionality programmatically through a Representational State Transfer (REST) web service.

It also allows researchers to elaborate on custom hypotheses in the form of lists of genes in order to further use them as input to existing Gene Set Analysis (GSA) tools.

Gene set data sources --

WhichGenes retrieves and integrates different data for constructing target gene sets by accessing diverse sources of biological knowledge. Each gene set is created by running a given query over the user-specified data source.

There are two types of queries, depending on the source of information:

1) a free-text query data source and 2) a catalog-based query data source.

In the first case, the user specifies a query using a text box, e.g. writing ‘leukemia’ in order to retrieve those genes related to this disease from ‘GeneCards’.

In the second case, catalog-based queries force the user to select one or more terms from a fixed catalog displayed in a list or a tree.

Catalog-based queries are used, for example, to retrieve genes annotated with a particular GO term, genes involved in a desired pathway, or genes found in a specific chromosome location.

Currently, WhichGenes can extract genes from:

For Homo sapiens --

1) GeneCards Disease Genes - Uses the GeneCards disease genes query to retrieve a gene list (see G6G Abstract Number 20170).

2) Gene Ontology - Genes annotated with a Gene Ontology (GO) term (via AmiGO, the official web-based set of tools for searching and browsing the Gene Ontology database).

3) KEGG Pathways - Genes involved in KEGG Pathways.

4) Biocarta Pathways - Genes involved in Biocarta Pathways (see G6G Abstract Number 20264).

5) Reactome Pathways - Genes involved in Reactome Pathways (see G6G Abstract Number 20267).

6) MSigDB Positional GeneSets - Gene sets of MSigDB's C1 collection (GSEA). The C1 collection holds positional gene sets, that is, gene sets corresponding to each human chromosome and each cytogenetic band that has at least one gene.

7) TargetScan microRNA targets - Uses TargetScan to retrieve genes which are microRNA targets.

8) Ensembl genes in bands - Uses Ensembl to provide genes in given locations or ranges.

9) miRBase Targets - Uses miRBase Targets version 5 to retrieve genes as targets of microRNAs.

10) Chemical CTD - Chemical-interacting genes.

11) Diseases CTD - Genes related to diseases via the Comparative Toxicogenomics Database (CTD).

12) CancerGenes - Genes in Large-scale Sequencing Cancer Genome Projects via CancerGenes.

13) IntAct interacting genes - Uses IntAct to get genes whose proteins interact with the proteins of one or more given genes.

14) Sanger Decipher - Genes related to syndromes in Sanger's DECIPHER database.

For Mus musculus --

1) KEGG Pathways - Genes involved in KEGG Pathways.

2) Biocarta Pathways - Genes involved in Biocarta Pathways.

3) Reactome Pathways - Genes involved in Reactome Pathways.

4) Ensembl genes in bands - Uses Ensembl to provide genes in given locations or ranges.

5) miRBase Targets - Uses miRBase Targets version 5 to retrieve genes as targets of microRNAs.

6) Chemical CTD - Chemical-interacting genes.

7) Diseases CTD - Genes related to diseases via CTD.

WhichGenes provides a set of features that are not currently available together in other tools: an intuitive interface, up-to-date access to the supported databases, many sources of information, the option of downloading several gene sets in a single file, support for the standard GSEA output file formats (.gmt, .gmx), the option of exploring sub-trees of certain sources (e.g. GO) in order to make more specific queries, etc.
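To illustrate the single-file .gmt export mentioned above, here is a minimal Python sketch that writes several gene sets to one file. It assumes the tab-delimited GMT layout used by GSEA (one set per line: set name, description, then the gene symbols). The names `gmt_lines` and `write_gmt`, the "na" placeholder description and the example sets are hypothetical and are not part of WhichGenes.

```python
import io


def gmt_lines(gene_sets, descriptions=None):
    """Yield one tab-delimited GMT line per gene set.

    gene_sets maps a set name to an iterable of gene symbols;
    descriptions optionally maps a set name to a short description.
    """
    descriptions = descriptions or {}
    for name, genes in gene_sets.items():
        description = descriptions.get(name, "na")  # assumed placeholder text
        # Sort the symbols so that the output is reproducible.
        yield "\t".join([name, description, *sorted(genes)])


def write_gmt(gene_sets, handle, descriptions=None):
    """Write every gene set to an open text handle, one set per line."""
    for line in gmt_lines(gene_sets, descriptions):
        handle.write(line + "\n")


# Made-up example sets; these are not real query results.
example_sets = {"SET_A": {"TP53", "BRCA1"}, "SET_B": {"MYC"}}
buffer = io.StringIO()
write_gmt(example_sets, buffer)
gmt_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; passing a handle from `open("sets.gmt", "w")` instead would produce a file on disk.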

The ‘database-free’ nature of WhichGenes means that external data sources are accessed just in time, retrieving up-to-date information without relying on potentially outdated local mirrors of existing data sources.

The user can not only retrieve and integrate ‘interesting’ gene lists from multiple repositories automatically, but also combine these sets (with the ‘and’, ‘or’ and ‘difference’ operators) to build more complex hypotheses.
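The three combination operators map directly onto set algebra. The following Python sketch shows that correspondence under the assumption that ‘and’ means intersection and ‘or’ means union; the function `combine_gene_sets` and the example symbols are hypothetical and do not reflect how WhichGenes is implemented.

```python
# Assumed mapping from the operator names to standard set operations.
OPERATORS = {
    "and": frozenset.intersection,
    "or": frozenset.union,
    "difference": frozenset.difference,
}


def combine_gene_sets(left, right, operator_name):
    """Combine two collections of gene symbols with the named operator."""
    if operator_name not in OPERATORS:
        raise ValueError(f"unknown operator: {operator_name!r}")
    combined = OPERATORS[operator_name](frozenset(left), frozenset(right))
    return set(combined)


# Arbitrary symbols chosen for illustration; not real query results.
first_set = {"TP53", "KIT", "FLT3"}
second_set = {"TP53", "BCL2"}

shared = combine_gene_sets(first_set, second_set, "and")
either = combine_gene_sets(first_set, second_set, "or")
only_first = combine_gene_sets(first_set, second_set, "difference")
```

Note that ‘difference’ is not symmetric: swapping the two arguments yields the genes found only in the second set.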

As different namespaces are commonly used, WhichGenes automatically standardizes all the generated sets to HGNC and MGI symbols in order to correctly perform merging operations.
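The reason standardization has to come before merging can be seen in a small Python sketch: two sets that name the same gene with different identifiers share nothing until both are mapped to one namespace. The lookup table below is a tiny hand-written stand-in for a real identifier mapping, and the function `normalize` is hypothetical; neither reflects how WhichGenes performs the conversion.

```python
# Toy lookup table (assumption): maps a few identifiers to HGNC symbols.
TOY_LOOKUP = {
    "ENSG00000141510": "TP53",  # Ensembl gene ID
    "7157": "TP53",             # Entrez Gene ID
    "672": "BRCA1",             # Entrez Gene ID
    "TP53": "TP53",
    "BRCA1": "BRCA1",
    "MYC": "MYC",
}


def normalize(identifiers, lookup):
    """Map identifiers to standard symbols; return (symbols, unmapped)."""
    symbols, unmapped = set(), set()
    for identifier in identifiers:
        key = identifier.strip().upper()
        if key in lookup:
            symbols.add(lookup[key])
        else:
            unmapped.add(identifier)  # kept so the caller can report them
    return symbols, unmapped


from_source_a = ["ENSG00000141510", "672"]
from_source_b = ["tp53", "MYC", "not-a-gene"]

raw_overlap = set(from_source_a) & set(from_source_b)
symbols_a, unmapped_a = normalize(from_source_a, TOY_LOOKUP)
symbols_b, unmapped_b = normalize(from_source_b, TOY_LOOKUP)
normalized_overlap = symbols_a & symbols_b
```

Here `raw_overlap` is empty even though both sources contain the same gene, while `normalized_overlap` recovers it. Returning the unmapped identifiers separately is a design choice of this sketch, so that nothing is dropped silently.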

However, WhichGenes can export gene sets in commonly used text-based formats where the final gene namespace can be changed to a more appropriate one, including multiple popular database gene identifiers and several microarray probe set IDs.

For additional product information on Gene Set Enrichment Analysis (GSEA) software, see G6G Abstract Number 20266.

