Bayesian Analysis of Gene Expression Levels (BAGEL)

Category Genomics>Gene Expression Analysis/Profiling/Tools

Abstract Bayesian Analysis of Gene Expression Levels (BAGEL) is a program that allows statistical inferences to be made regarding differential gene expression between two or more samples measured on spotted (two-channel) microarrays.

BAGEL makes these inferences from 'normalized ratio data' on a gene-by-gene basis. Its advantages include ease of use, straightforward interpretation of results, statistical robustness, flexibility in accepting different experimental designs, and free availability.

Statistical model --

A number of factors can influence the 'signal intensity' of labeled DNA hybridizing to a microarray spot, such as hybridization efficiency or concentration of target sequences in the spot.

Any such factors that are shared by the two samples hybridizing to the same spot are eliminated by considering the ratio of the two signal intensities.

BAGEL explicitly takes this into account by using ratio measurements, not single-channel signals, coming from two-dye competitive hybridizations.
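
As a minimal illustration of why the ratio removes a shared factor, the sketch below scales both channels of a spot by the same multiplicative factor. The quantities and the helper name ratio_on_spot are hypothetical and are not part of BAGEL; only a multiplicative spot effect is shown.

```python
# Hypothetical true quantities of labeled mRNA for two samples (assumed values).
true_a, true_b = 800.0, 200.0

def ratio_on_spot(spot_factor):
    # Both channels are scaled by the same spot-specific factor,
    # e.g. the amount of target DNA deposited in the spot.
    signal_a = spot_factor * true_a
    signal_b = spot_factor * true_b
    return signal_a / signal_b

good_spot = ratio_on_spot(1.0)
weak_spot = ratio_on_spot(0.25)   # a spot with less target DNA
print(good_spot, weak_spot)       # the two ratios are identical
```

The single-channel signals differ fourfold between the two spots, while the ratio is unchanged.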

BAGEL makes transitive comparisons across ratios, for example inferring the ratio of sample A to sample C across a set of hybridizations that directly compare sample A to sample B and sample B to sample C.
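
The arithmetic behind a transitive comparison can be sketched as follows. The ratio values are hypothetical, and the sketch ignores measurement error, which BAGEL's statistical model accounts for.

```python
import math

# Hypothetical normalized ratios from two hybridizations.
ratio_a_over_b = 2.0   # sample A vs. sample B
ratio_b_over_c = 3.0   # sample B vs. sample C

# Ratios multiply along a path of direct comparisons ...
ratio_a_over_c = ratio_a_over_b * ratio_b_over_c

# ... or, equivalently, log ratios add.
log_a_over_c = math.log2(ratio_a_over_b) + math.log2(ratio_b_over_c)

print(ratio_a_over_c, 2 ** log_a_over_c)
```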

Data that are appropriate for analysis by BAGEL must therefore have the following properties:

1) The data should be collected in such a way that pairs of samples share sources of variation that are non-trivial and that are not of interest to the researcher, and the relative magnitude of some metric between the two samples is the measurement of interest.

The originally envisioned use for BAGEL, two-channel microarray data, is an obvious example of data with this structure, but in principle BAGEL could be used to analyze any other kind of data that fit these criteria.

Conversely, because this requirement depends on paired measurements, tiled microarray platforms such as Affymetrix do not lend themselves easily to BAGEL analysis.

2) All genotypes, tissues, treatments etc. (‘expression nodes’) to be analyzed must be connected to all other nodes through direct or indirect comparisons.

For example, given four nodes, a set of microarray experiments that competitively hybridized node 1 vs. node 2, node 2 vs. node 3, and node 3 vs. node 4 would permit the estimation of relative expression levels of genes across all four nodes.

On the other hand, a set of experiments that competitively hybridized node 1 vs. node 2 and node 3 vs. node 4 would permit the estimation of relative gene expression levels between nodes 1 and 2 and between nodes 3 and 4, but no estimates could be made for comparisons across the two pairs, such as node 1 vs. node 3.

Any experimental design incorporating a reference sample will necessarily fulfill this criterion, as all nodes are connected through their direct comparisons with the reference.
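
The connectivity requirement can be checked before any hybridization is run. The sketch below treats nodes as graph vertices and hybridizations as edges, and tests whether the graph is connected. The function name is_connected and the design lists are illustrative assumptions, not part of BAGEL.

```python
from collections import defaultdict

def is_connected(nodes, hybridizations):
    """Return True if every node is linked to every other node,
    directly or indirectly, through the listed hybridizations."""
    neighbours = defaultdict(set)
    for a, b in hybridizations:
        neighbours[a].add(b)
        neighbours[b].add(a)
    nodes = list(nodes)
    if not nodes:
        return True
    # Depth-first search starting from the first node.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for other in neighbours[current]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen >= set(nodes)

chain = [(1, 2), (2, 3), (3, 4)]                   # the first example above
split = [(1, 2), (3, 4)]                           # the second example above
reference = [("ref", 1), ("ref", 2), ("ref", 3)]   # a reference design

print(is_connected([1, 2, 3, 4], chain))
print(is_connected([1, 2, 3, 4], split))
print(is_connected(["ref", 1, 2, 3], reference))
```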

3) There must be a sufficient number of measurements (replicate experiments) to estimate the parameters. In principle, this means as many measurements as nodes, or half as many hybridizations as nodes, since each hybridization yields two measurements (when estimating a single variance parameter).

In practice, requirement 2 will usually necessitate more hybridizations than this, and of course, the greater the replication, the more precise the estimates of gene expression.

Experience suggests that an experimental design providing at least three measurements for each node is a good target number for providing reasonable statistical power.
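
A quick way to compare a planned design against this rule of thumb is to count how many measurements each node receives, with each hybridization contributing one measurement to each of its two nodes. The design below (a three-node loop with dye swaps) and the helper name measurements_per_node are hypothetical.

```python
from collections import Counter

def measurements_per_node(hybridizations):
    # Each hybridization measures both of the nodes it compares.
    counts = Counter()
    for a, b in hybridizations:
        counts[a] += 1
        counts[b] += 1
    return counts

# Hypothetical loop design with dye swaps: (Cy3 node, Cy5 node).
design = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 1), (1, 3)]

counts = measurements_per_node(design)
under_replicated = [node for node, n in counts.items() if n < 3]
print(dict(counts), under_replicated)
```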

Formally, the 'statistical model' employed by BAGEL assumes that the measured 'fluorescence intensity' for one channel is a function of (1) the true quantity of the labeled mRNA species;

(2) some number of multiplicatively and/or additively confounding factors that are specific to the spot in question but shared by the measured intensity from the other channel; and

(3) some number of unbiased, randomly distributed error terms (for example, variation in reverse transcription or labeling efficiency).

BAGEL explores the 'likelihood function' derived from either of two ratio formulas (models, not shown here) for all nodes using a Markov chain Monte Carlo (MCMC) approach in a Bayesian framework.

This method starts with a random vector of parameters and then changes two of the parameters by small, random steps.

At each step the likelihood of the data, given the model and the new parameter values, is calculated. If the new parameters give a better fit to the data, then the new values are accepted.

If the new parameters give a worse fit to the data, then the new values are accepted with a probability proportional to their likelihood.

In this way the Markov chain searches the parameter space, finding combinations of relative 'gene expression levels' that produce the greatest likelihood, and samples from the chain are used to construct the Bayesian posterior probability of the parameters given the data.

BAGEL infers relative expression levels and statistical significance from the parameter values it samples from the chain.
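
The general idea of the sampling procedure can be sketched with a generic Metropolis sampler. This is not BAGEL's actual model or code: the Gaussian error on log2 ratios, the fixed error scale sigma, the flat prior, the pinning of node 0 as the baseline, the perturbation of one parameter per step, and the acceptance of worse fits with probability equal to the likelihood ratio (the standard Metropolis rule) are all assumptions made for illustration, as are the data values.

```python
import math
import random

def log_likelihood(theta, data, sigma):
    # Assumed error model: Gaussian noise on each observed log2 ratio.
    total = 0.0
    for a, b, log_ratio in data:
        residual = log_ratio - (theta[a] - theta[b])
        total += -0.5 * (residual / sigma) ** 2
    return total

def metropolis(data, n_nodes, n_steps=20000, burn_in=5000,
               step=0.2, sigma=0.3, seed=1):
    rng = random.Random(seed)
    # Node 0 is pinned at 0, so other levels are expressed relative to it.
    theta = [0.0] + [rng.uniform(-1, 1) for _ in range(n_nodes - 1)]
    current = log_likelihood(theta, data, sigma)
    samples = []
    for i in range(n_steps):
        # Propose a small random change to one non-baseline parameter.
        proposal = list(theta)
        k = rng.randrange(1, n_nodes)
        proposal[k] += rng.gauss(0.0, step)
        candidate = log_likelihood(proposal, data, sigma)
        # Better fits are always accepted; worse fits are accepted with
        # probability equal to the likelihood ratio.
        if candidate >= current or rng.random() < math.exp(candidate - current):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(list(theta))
    return samples

# Hypothetical data: (node_a, node_b, observed log2(a/b)) for a 0-1-2 chain.
data = [(0, 1, 1.0), (0, 1, 1.1), (1, 2, 0.5), (1, 2, 0.4)]
samples = metropolis(data, n_nodes=3)

# Posterior summaries read off the sampled chain.
mean_node2 = sum(s[2] for s in samples) / len(samples)
p_0_gt_2 = sum(s[2] < 0 for s in samples) / len(samples)
print(round(mean_node2, 2), round(p_0_gt_2, 3))
```

The last two lines show how samples from the chain yield both an estimate (the mean log2 level of node 2 relative to node 0) and a measure of significance (the fraction of samples in which node 0 exceeds node 2).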

Implementation --

BAGEL does not perform normalization of raw microarray data (for example, to account for systematic differences in signal intensity between the two fluorophores), so an appropriate normalization method should be applied prior to BAGEL analysis.

Following normalization, the data must be formatted in a way that BAGEL can use.

The appropriate format is a tab-delimited text file that contains (normalized) ratio data for all the relevant genes and hybridizations, as well as some header rows and columns.
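
Before running an analysis, it can be useful to confirm that a tab-delimited ratio file parses cleanly. The sketch below assumes a purely hypothetical layout (one header row of hybridization labels, one leading column of gene identifiers); the header rows and columns that BAGEL actually expects are not specified here and may differ.

```python
import csv
import io

# Hypothetical file content; BAGEL's real header layout may differ.
text = "gene\thyb1\thyb2\ngeneA\t1.8\t2.1\ngeneB\t0.5\t0.6\n"

def read_ratios(handle):
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    hyb_names = header[1:]            # assumed: first column holds gene IDs
    ratios = {}
    for row in reader:
        if not row:
            continue                  # skip blank lines
        ratios[row[0]] = [float(value) for value in row[1:]]
    return hyb_names, ratios

hyb_names, ratios = read_ratios(io.StringIO(text))
print(hyb_names, ratios)
```

A non-numeric entry raises a ValueError in float(), which flags formatting problems early.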

System Requirements

BAGEL is available for Windows, Mac OS 9, Mac OS X, and Linux.

Manufacturer

Manufacturer Web Site BAGEL

Price Contact manufacturer.

G6G Abstract Number 20402

G6G Manufacturer Number 104033