Abstract A-MADMAN (Annotation-based MicroArray Data Meta-ANalysis tool) is an open source web application for the meta-analysis of Affymetrix data contained in the Gene Expression Omnibus (GEO) - (see G6G Abstract Number 20013).

A-MADMAN allows retrieving, annotating, organizing and analyzing gene expression datasets.

In particular, A-MADMAN addresses several previously stated open issues in the meta-analysis of gene expression data: it allows the integrative analysis of data obtained from different Affymetrix platforms through custom Chip Definition Files (CDFs) and meta-normalization, and it supports the sharing of analysis flows and results.

A-MADMAN creation motivation --

Conducting meta-analyses of publicly available data can be a daunting task for many reasons:

1) Incomplete annotation of datasets.

2) It can be hard to track the relationship between patients, biological samples and cell intensity (CEL) files (the most granular data object in repositories).

3) Managing a large number of data files and related metadata can be very tedious and error-prone, and can limit the reproducibility of analyses.

4) Software to analyze microarray data is widely available, and very good open source implementations can be obtained for free, but they can be cumbersome to use and require some administrative overhead.

A-MADMAN Architecture --

A-MADMAN is a web application that allows retrieving gene expression datasets from GEO, annotating and locally organizing the downloaded samples, and generating an R object (ExpressionSet) which contains the integrated expression levels and all available metadata and sample characteristics.

The gene expression data are obtained through a meta-analysis approach which includes signal generation, probe re-annotation into gene-centered identifiers, merging of expression levels from different experiments and a normalization step.
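The merging and normalization steps described above can be illustrated with a minimal Python sketch. It is not A-MADMAN's implementation (the tool delegates analysis to R and BioConductor). The function names, the sample names and the choice of quantile normalization are all assumptions made for illustration, and the sketch starts from expression values that are already keyed by gene-centered identifiers.

```python
from statistics import mean


def merge_on_common_genes(experiments):
    """Keep only gene identifiers present in every sample.

    experiments: dict mapping sample name -> {gene_id: expression value}.
    Returns (sorted gene ids, {sample: values in that gene order}).
    """
    common = set.intersection(*(set(values) for values in experiments.values()))
    genes = sorted(common)
    merged = {s: [values[g] for g in genes] for s, values in experiments.items()}
    return genes, merged


def quantile_normalize(columns):
    """Give every sample the same value distribution (quantile normalization).

    Assumption: this is one common normalization choice, used here only as
    an example. Ties are broken by position rather than averaged.
    """
    names = list(columns)
    n = len(columns[names[0]])
    sorted_cols = [sorted(columns[s]) for s in names]
    # Mean of the i-th smallest value across all samples.
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n)]
    normalized = {}
    for s in names:
        order = sorted(range(n), key=lambda i: columns[s][i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized[s] = out
    return normalized


# Hypothetical samples from two platforms, already mapped to gene-centered ids.
experiments = {
    "sample_platform_a": {"1017_at": 5.0, "1019_at": 2.0, "1021_at": 3.0},
    "sample_platform_b": {"1017_at": 4.0, "1019_at": 1.0, "1021_at": 6.0, "999_at": 7.0},
}
genes, merged = merge_on_common_genes(experiments)
normalized = quantile_normalize(merged)
```

Restricting the merge to identifiers shared by all samples is the simplest way to combine platforms with different gene coverage; other strategies (for example, keeping missing values) are equally possible.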

A-MADMAN generates an ExpressionSet object in which the meta-expression levels from multiple experiments are complemented by GEO-derived and user-defined metadata.

The final ExpressionSet contains all the information necessary to perform, directly in R, any higher-level analysis [e.g., SAM (see G6G Abstract Number 20066) or LIMMA] of all downloaded and integrated data.

A-MADMAN comprises a console, a job server and a web application. The console is needed for the first phases of data retrieval, import and database filling.

It performs the automatic download and organization, in a proper and transparent file system hierarchy, of raw data and annotations from the Gene Expression Omnibus, starting from a configuration file listing the accession numbers of GEO series and/or samples to download.

Metadata of GEO records are automatically imported into a local relational database to assist subsequent manual annotation and selection of samples from the web application.
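As a rough illustration of the console's role, the sketch below reads a list of GEO accession numbers and stores sample metadata in a relational database. The configuration format (one accession per line, `#` comments), the table layout and the function names are hypothetical; the source does not specify A-MADMAN's actual file format or schema. The only fact relied on is that GEO series accessions start with `GSE` and sample accessions with `GSM`. The accession numbers and titles shown are placeholders, and no download is performed.

```python
import re
import sqlite3

# GEO series are named GSE<number>, samples GSM<number>.
ACCESSION = re.compile(r"^(GSE|GSM)\d+$")


def parse_accessions(text):
    """Split a hypothetical configuration text into (series, samples)."""
    series, samples = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = ACCESSION.match(line)
        if match is None:
            raise ValueError(f"not a GEO series/sample accession: {line!r}")
        (series if match.group(1) == "GSE" else samples).append(line)
    return series, samples


def store_metadata(conn, records):
    """Insert (accession, series, title) rows into a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sample "
        "(accession TEXT PRIMARY KEY, series TEXT, title TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO sample VALUES (?, ?, ?)", records)
    conn.commit()


# Placeholder configuration and metadata, for illustration only.
config_text = """
# series to download
GSE1000
GSM2000   # a single sample
"""
series, samples = parse_accessions(config_text)

conn = sqlite3.connect(":memory:")
store_metadata(conn, [("GSM2000", None, "placeholder title")])
rows = conn.execute("SELECT accession, title FROM sample").fetchall()
```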

The job server is in charge of the asynchronous execution of jobs which, depending on data size and algorithm, can be computationally intensive and take longer than a Hypertext Transfer Protocol (HTTP) request-response cycle allows.
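The pattern behind such a job server can be sketched in a few lines of Python: the web request only submits the job and receives an identifier, and a later request polls for the outcome. The `JobServer` class and its method names are hypothetical and say nothing about how A-MADMAN implements its own job server; this version uses a thread pool from the standard library and keeps job state in memory.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobServer:
    """Minimal in-memory job queue (illustrative only)."""

    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}

    def submit(self, func, *args):
        """Schedule func(*args) and return a job id immediately."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id):
        """Return 'pending', 'done' or 'failed' without blocking."""
        future = self._jobs[job_id]
        if not future.done():
            return "pending"
        return "failed" if future.exception() is not None else "done"

    def result(self, job_id, timeout=None):
        """Block until the job finishes and return (or re-raise) its outcome."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

A request handler would call `submit` and return the job id at once; the browser can then poll `status` until the job is `done` and fetch the result in a separate request.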

The core of the framework is the web application, whose user-friendly front-end facilitates data organization, annotation and analysis.

A-MADMAN features/capabilities --

A-MADMAN aims to lower the bar for starting a meta-analysis study by offering these features:

1) Automatic download of GEO raw data and annotations, and their organization in a proper and transparent file system hierarchy, given a simple configuration file.

2) Automatic import of metadata from GEO records into a local relational database to assist in subsequent manual annotation and selection of samples.

3) A flexible annotation system based on tags (a la web 2.0).

4) A user-friendly 'Assignment interface' to assist the user in matching samples to individuals (patients or equivalent statistical units).

5) Samples to analyze together are selected with an arbitrarily complex logical query based on tags, for example: 'young and dystrophy and Not (Becker or limb-girdle)', and placed in a named object called ‘basket’.

6) Analyses are conducted by the R back-end, powered by packages of the 'BioConductor project' (downloaded and installed automatically the first time they are required) - (BioConductor is an open source and open development software project for the analysis and comprehension of genomic data).
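The tag-based selection in feature 5 can be illustrated with a small Python sketch that evaluates a logical query against each sample's tags. The grammar (case-insensitive `and`, `or`, `not` and parentheses, with `not` binding tightest), the `matches` function and the sample data are assumptions made for illustration; the source does not describe A-MADMAN's actual query syntax beyond the example given.

```python
import re

# A token is a parenthesis or any run of characters without spaces/parentheses,
# so hyphenated tags such as "limb-girdle" stay in one piece.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def matches(query, tags):
    """Return True if the set of tags satisfies the logical query."""
    tokens = TOKEN.findall(query)
    tagset = {t.lower() for t in tags}
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        token = peek()
        if token is None or token in (")", "and", "or"):
            raise ValueError("tag or '(' expected")
        if token == "not":
            take()
            return not parse_not()
        if token == "(":
            take()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        return take().lower() in tagset

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


# Hypothetical tagged samples; the 'basket' is the list of matching samples.
samples = {
    "s1": {"young", "dystrophy", "Duchenne"},
    "s2": {"young", "dystrophy", "Becker"},
    "s3": {"old", "dystrophy"},
    "s4": {"young", "dystrophy", "limb-girdle"},
}
query = "young and dystrophy and Not (Becker or limb-girdle)"
basket = sorted(name for name, tags in samples.items() if matches(query, tags))
```

Tags are compared case-insensitively here so that 'Becker' in the query matches a sample tagged 'becker'; whether the real tool does the same is not stated in the source.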

A basic workflow (named 'vanilla') is provided that comprises the following steps:

System Requirements --

A-MADMAN supports a collaborative working style for local or geographically dispersed teams through LAN or Internet deployment options, but can also be used by a single researcher on one's Windows® personal computer by installing an all-in-one package that bundles all required dependencies except R.

The software, written in Python, is based on the popular Django web framework and uses GNU R as a back-end.


