Abstract: The ExPlain Analysis System supports the biological interpretation of high-throughput experiments such as microarray, proteomic, and ChIP-chip studies. Its intuitive workflow allows the systematic creation of experimentally testable hypotheses for both gene transcription regulation and signaling networks. ExPlain is a novel tool for expression data analysis that investigates the regulation of genes of interest and uses the results to construct a regulatory network model.

ExPlain relies on BIOBASE databases and software technology. It is designed to be installed on a central server that provides access to remote users. An ExPlain analysis is initiated by loading an expression data table or by selecting data from previous projects. During upload, the contents of table columns can be assigned to various categories so that the researcher is able to choose the most suitable column order.

Once the appropriate dataset has been compiled, ExPlain provides a gateway to a series of bioinformatics experiments, from promoter analysis to regulatory network analysis. ExPlain automatically retrieves the promoter sequences, TRANSFAC matrices, sites, or pathway information required for experiments on the corresponding dataset.

For maximal control and efficiency of workflow, ExPlain displays an interactive tree structure that allows the researcher to recover all previous results and to continue from any node with different parameters. In addition, newly imported data, complete projects, or project subsets can be freely recombined using different sets of operations.

ExPlain facilitates the following types of analysis:

1) Functional classification according to Gene Ontology (GO) terms, disease terms, tissue/organ expression, and signaling pathways.

2) Mapping of putative transcription factor (TF) binding sites on promoters in focus.

3) Construction of promoter modules as combinations of individual binding sites and composite regulatory elements, thus suggesting TFs providing common regulation of differentially expressed genes.

4) Identification of key molecules upstream of TFs (kinases, adaptor proteins, receptors) that might be responsible for the coordinated regulation of the suggested TFs.

Affymetrix GeneChip Compatibility --

The ExPlain analysis system applies a new knowledge-driven approach to the analysis of whole complexes of co-expressed genes. The internal Composite Module Analyst (CMA) is a genetic algorithm for the analysis and prediction of relevant promoters in a given set of genes obtained from sources such as Affymetrix GeneChip Arrays. This combinatorial analysis significantly reduces false positive rates and enables scientists to find potential causes for specific cellular events.

The predictive power of ExPlain is driven by TRANSFAC, a high-quality knowledge base manually curated by experts from the published scientific literature. TRANSFAC presents data on transcription factors, their experimentally proven binding sites, and regulated genes.

The Combined Power of BIOBASE Data --

ExPlain utilizes the wealth of high-quality functional and structural protein data contained in the manufacturer's TRANSFAC, TRANSPATH and HumanPSD products to expedite the creation of advanced in silico experiments.

TRANSFAC -- is a unique collection of data on transcription factors, their experimentally verified binding sites, and regulated genes. Positional weight matrices are derived from the compiled binding sites. Matrices and sequences can then be used by MATCH and PATCH, respectively, for matrix-based or pattern-based searches of binding sites within regulatory sequences, thereby allowing predictions to be made for hitherto uncharacterized gene promoters. TRANSFAC Professional also contains information on the structure, function and expression patterns of the transcription factors.
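
The principle of deriving a positional weight matrix from compiled binding sites and using it to search a sequence can be sketched in a few lines of Python. This is a minimal illustration only and not the MATCH algorithm, whose scoring scheme is not described here. It assumes equal-length aligned sites, a uniform background of 0.25 per base, a pseudocount of 1, and a simple log-odds sum as the score; the function names and the example sites are hypothetical.

```python
import math

def build_pwm(sites, pseudocount=1.0):
    """Build a log-odds weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        # Log-odds against an assumed uniform background (0.25 per base).
        pwm.append({b: math.log2((c / total) / 0.25) for b, c in counts.items()})
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, score) for each window whose score reaches the threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical aligned sites, used only to exercise the functions.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
example_pwm = build_pwm(example_sites)
example_hits = scan("CCCTGACTCAGGG", example_pwm, threshold=5.0)
```

In practice the threshold controls the trade-off between missed sites and false positives, which is why combining several predicted sites into modules can help to reduce spurious matches.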

TRANSPATH -- provides data about molecules that participate in signal transduction pathways and the reactions in which they are involved, yielding a complex network of interconnected signaling components. The focus is on signaling cascades that change the activities of transcription factors, thereby altering gene expression profiles.

TRANSPATH is the repository of choice for disclosing the upstream regulators and downstream targets of regulatory molecules. Using PathwayBuilder, any part of the overall network can be retrieved and visualized starting with any molecule or reaction. Array Analyzer identifies common key regulators in the signaling network.
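
The notion of a common key regulator can be illustrated on a toy network. The Python sketch below is not the Array Analyzer algorithm, which is not described here. It assumes that the signaling network is given as a directed graph (a dictionary mapping each molecule to the molecules it acts on) and ranks upstream molecules by how many of the transcription factors of interest they can reach within a fixed number of steps. All function and molecule names are hypothetical.

```python
from collections import deque

def upstream_within(network, target, max_steps):
    """Return the molecules that reach `target` in at most max_steps directed steps."""
    # Reverse the edges so the search can walk from the target back to its regulators.
    reverse = {}
    for source, targets in network.items():
        for t in targets:
            reverse.setdefault(t, set()).add(source)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for parent in reverse.get(node, ()):
            if parent not in distance:
                distance[parent] = distance[node] + 1
                queue.append(parent)
    del distance[target]
    return set(distance)

def rank_key_regulators(network, tfs, max_steps=3):
    """Rank upstream molecules by the number of given TFs they can reach."""
    counts = {}
    for tf in tfs:
        for molecule in upstream_within(network, tf, max_steps):
            counts[molecule] = counts.get(molecule, 0) + 1
    # Most shared regulators first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical toy network: receptor -> adaptor -> kinases -> transcription factors.
toy_network = {
    "Receptor": ["Adaptor"],
    "Adaptor": ["KinaseA", "KinaseB"],
    "KinaseA": ["TF1"],
    "KinaseB": ["TF2"],
    "KinaseC": ["TF2"],
}
toy_ranking = rank_key_regulators(toy_network, ["TF1", "TF2"], max_steps=3)
```

In this toy example the adaptor and the receptor are the only molecules upstream of both transcription factors, so they rank ahead of the individual kinases.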

HumanPSD -- is an up-to-date collection of information on characterized and uncharacterized proteins from human, mouse and rat. Annotation focuses on molecular function, biological role, and expression patterns across cells, tissues and organs, consequences of mutation, relationships to disease, and interactions between proteins and genes. Data quality is assured since key experimental results are reported together with the citation to the original publication. The power of this mammal-centred database is enhanced by extensive interconnections to model organism databases.

TRANSPRO -- is a collection of human, mouse and rat promoter sequences within the TRANSFAC Professional Suite. It contains the upstream (5’) regulatory sequences of these genes, together with extensive annotation. For each promoter, TRANSPRO contains 10,000 nucleotides (nt) upstream and 1,000 nt downstream relative to the ‘virtual transcription start site (TSS)’. Any portion of a selected promoter sequence may be retrieved by indicating the desired relative positions with BIOBASE search tools (TRANSPLORER, MATCH, PATCH, and CATCH).
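
Retrieving a portion of a promoter by relative position amounts to converting TSS-relative coordinates into offsets within the stored sequence. The Python sketch below shows one way to do this and is not part of any BIOBASE tool. It assumes that the stored sequence is exactly 11,000 nt long, that upstream positions are numbered -10000 to -1 and downstream positions +1 to +1000, and that there is no position 0; the actual TRANSPRO coordinate convention may differ. The function names are hypothetical.

```python
UPSTREAM = 10_000   # nt stored upstream of the virtual TSS
DOWNSTREAM = 1_000  # nt stored downstream, starting at the TSS itself (+1)

def rel_to_index(pos):
    """Convert a TSS-relative position (no position 0) to a 0-based string index."""
    if pos == 0 or pos < -UPSTREAM or pos > DOWNSTREAM:
        raise ValueError(f"position {pos} is outside -{UPSTREAM}..-1 and +1..+{DOWNSTREAM}")
    # -10000 maps to index 0, -1 to index 9999, +1 to index 10000.
    return pos + UPSTREAM if pos < 0 else pos + UPSTREAM - 1

def promoter_window(stored, start, end):
    """Return the subsequence from `start` to `end` inclusive, both TSS-relative."""
    if len(stored) != UPSTREAM + DOWNSTREAM:
        raise ValueError("stored sequence must be 11,000 nt long")
    if start > end:
        raise ValueError("start must not be greater than end")
    return stored[rel_to_index(start):rel_to_index(end) + 1]

# Synthetic sequence: upstream is all A, the TSS base is G, the rest is C.
synthetic = "A" * UPSTREAM + "G" + "C" * (DOWNSTREAM - 1)
around_tss = promoter_window(synthetic, -2, 2)
proximal = promoter_window(synthetic, -500, 100)
```

A window such as -500 to +100 is a typical choice for a proximal promoter search, although the most suitable range depends on the analysis.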

