ARACNE
Category Cross-Omics>Pathway Analysis/Gene Regulatory Networks/Tools
Abstract ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) is a novel algorithm that uses microarray expression profiles. It is specifically designed to scale up to the complexity of regulatory networks in mammalian cells, yet is general enough to address a wider range of network deconvolution problems.
On synthetic datasets ARACNE achieves extremely low error rates and significantly outperforms established methods, such as Relevance Networks and ‘Bayesian Networks’.
Application to the deconvolution of the ‘genetic networks’ in human B cells demonstrates ARACNE’s ability to infer validated transcriptional targets of the c-MYC proto-oncogene.
The ARACNE algorithm is an information-theoretic method for identifying transcriptional interactions between gene products using ‘microarray expression’ profile data.
Similar to other algorithms, ARACNE predicts potential functional associations among genes, or novel functions for uncharacterized genes, by identifying statistical dependencies between gene products.
However, based on biochemical validation, literature searches and DNA binding site enrichment analysis, ARACNE has also proven effective in identifying bona fide transcriptional targets, even in complex mammalian networks.
Thus the manufacturer envisions that predictions made by ARACNE, especially when supplemented with prior knowledge or additional data sources, can provide appropriate hypotheses for the further investigation of ‘cellular networks’.
Besides ‘gene expression’ profile data, the algorithm’s theoretical basis readily extends to a variety of other high-throughput measurements, such as pathway-specific or genome-wide proteomics, microRNA and metabolomics data.
As these data become readily available, the manufacturer expects that ARACNE might prove increasingly useful in elucidating the underlying interaction models.
For a microarray data set containing ~10,000 probes, reconstructing the network around a single probe takes several minutes on a desktop computer with a Pentium 4-class processor.
Reconstructing a genome-wide network generally requires a computational cluster, especially if the recommended bootstrapping procedure is used.
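The gap between the single-probe and genome-wide cases follows from simple counting. The sketch below tallies only the pairwise MI estimates (it ignores the cost of the DPI step). The function name and the choice of 100 bootstrap resamples are illustrative assumptions, not values taken from ARACNE's documentation.

```python
def mi_estimates_needed(n_probes: int, n_bootstraps: int = 1, single_probe: bool = False) -> int:
    """Count pairwise MI estimates for a reconstruction (arithmetic sketch only)."""
    # Around a single probe: one MI estimate against every other probe.
    # Genome-wide: every unordered pair of probes, N * (N - 1) / 2.
    per_run = n_probes - 1 if single_probe else n_probes * (n_probes - 1) // 2
    # Each bootstrap resample repeats the full computation.
    return per_run * n_bootstraps


if __name__ == "__main__":
    n = 10_000
    print(mi_estimates_needed(n, single_probe=True))  # 9999
    print(mi_estimates_needed(n))                     # 49995000
    print(mi_estimates_needed(n, n_bootstraps=100))   # 4999500000 (hypothetical 100 resamples)
```

For ~10,000 probes, a genome-wide run needs 5,000 times as many MI estimates as a single-probe run, before any bootstrapping multiplies the total again.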
ARACNE overcomes many limitations --
ARACNE overcomes many limitations of existing algorithms: it has a low polynomial computational complexity; it uses the full dynamic range of the data instead of relying on (arbitrary) discretizations; and it does not make assumptions about the underlying network topology.
These properties have enabled ARACNE to be successfully applied to a system-wide reconstruction of complex transcriptional networks in human B cells.
In contrast to many methods that have not been biochemically validated, ARACNE's predictions have been validated for the MYC proto-oncogene by chromatin immunoprecipitation assays (ChIPs), which showed that MYC binds in vivo to the regulatory region of 11 out of 12 genes selected from those inferred by the algorithm.
When further combined with literature analysis, over 50% of the MYC targets inferred by the algorithm were validated. More recently, similar results were achieved for other transcription factors (TFs), including BCL6 and NOTCH1.
ARACNE’s performance has also been studied on the reconstruction of synthetic biochemical networks and it has been shown to significantly outperform other algorithms in this setting.
Finally, the algorithm's theoretical properties have been characterized: under certain assumptions, ARACNE has been shown to reconstruct networks exactly in the asymptotic limit of large sample sizes.
In particular, ARACNE has been shown to have a low false-positive rate, which makes its predictions attractive candidates for further biochemical validation.
ARACNE's limitations --
Given the extreme complexity of cellular networks, the manufacturer does not expect these results to generalize to all cases.
For example, due to the focus on reducing false positives, ARACNE might miss a significant number of targets of a TF that is involved in a large number of feedback or feed-forward loops.
Additionally, ARACNE is sensitive to the ranking of the mutual information (MI) estimates. Thus inhomogeneous noise sources that change the rankings might lead to reconstruction errors.
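Both failure modes can be seen on a single gene triplet. The sketch below uses a hypothetical helper, `weakest_edge`, that applies a simplified zero-tolerance pruning rule (drop the triplet's lowest-MI edge). The MI values are made up for illustration.

```python
def weakest_edge(mi_ab: float, mi_bc: float, mi_ac: float) -> str:
    """Return the edge a zero-tolerance, DPI-style triplet rule would drop (illustrative)."""
    edges = {"a-b": mi_ab, "b-c": mi_bc, "a-c": mi_ac}
    return min(edges, key=edges.get)


if __name__ == "__main__":
    # Feed-forward loop a->b, b->c, a->c: all three edges are real, yet one is dropped.
    print(weakest_edge(0.62, 0.55, 0.50))  # a-c
    # Noise that shifts two close MI estimates changes which edge is dropped.
    print(weakest_edge(0.62, 0.49, 0.50))  # b-c
```

In the first call a true direct target (a-c) is lost. In the second, a small perturbation of one estimate flips the decision, which is why noise that reorders MI values matters more than noise that merely rescales them.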
Furthermore, ARACNE is not designed to directly reconstruct complex combinatorial regulation patterns involving multiple independent TFs, although it might identify such interactions one TF at a time.
For instance, interactions of a gene with multiple TFs are frequent in the B-cell network, suggesting a cooperative regulation mechanism.
An additional limitation, germane to all ‘microarray expression’ profile analysis methods, is that ARACNE assumes that the mRNA level of a TF is correlated with the mRNA levels of its targets.
This assumption might be violated for the many TFs that are post-transcriptionally regulated, or if the cells under investigation have not reached equilibrium.
Furthermore, as microarray expression profiles only monitor a subset of the interacting species in a biochemical network, many transcriptional interactions might be undetectable.
Due to these limitations, as with any biological assay, predictions made by ARACNE should be used in conjunction with prior knowledge and additional data (such as promoter region sequence information, ChIP-on-Chip and existing interactomes). Used this way, ARACNE can be a useful tool for biologists attempting to dissect specific transcriptional pathways.
ARACNE's process --
For a set of gene expression measurements that characterize a specific cellular system across diverse phenotypic conditions, the method (process) described below can be used to infer candidate direct regulatory relationships between gene products, as well as to predict broader functional relationships.
ARACNE generates a putative transcriptional network in two (2) computational steps.
First, gene pairs that exhibit correlated transcriptional responses are identified by measuring the mutual information (MI) between their mRNA expression profiles.
MI is arguably the best measure of statistical dependence in a non-linear setting. The key elements of this step are determining the parameters for computing the MI (i.e., the kernel width of the estimator) and determining the MI threshold for statistical independence.
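The first step can be sketched with NumPy. This sketch assumes a copula (rank) transform followed by a Gaussian kernel MI estimator, where the width `h` plays the role of the kernel width mentioned above. The default `h = 0.15`, the function names, and the shuffle-based threshold are illustrative assumptions. They are not ARACNE's actual code or its p-value procedure.

```python
import numpy as np


def copula_transform(x):
    """Map values to ranks scaled into (0, 1]; MI is unchanged by this monotone map."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 1) / len(x)


def gaussian_kernel_mi(x, y, h=0.15):
    """Estimate MI in nats: (1/M) * sum_i log[ f(x_i, y_i) / (f(x_i) * f(y_i)) ]."""
    u = copula_transform(np.asarray(x))
    v = copula_transform(np.asarray(y))
    du = u[:, None] - u[None, :]  # pairwise differences, shape (M, M)
    dv = v[:, None] - v[None, :]
    ku = np.exp(-du**2 / (2 * h**2))
    kv = np.exp(-dv**2 / (2 * h**2))
    # Kernel normalising constants cancel in the ratio because all kernels share h.
    f_u = ku.mean(axis=1)
    f_v = kv.mean(axis=1)
    f_uv = (ku * kv).mean(axis=1)  # product Gaussian kernel for the joint density
    return float(np.mean(np.log(f_uv / (f_u * f_v))))


def permutation_threshold(x, y, n_perm=100, alpha=0.01, h=0.15, seed=0):
    """MI value exceeded by only a fraction alpha of shuffled (independent) pairs."""
    rng = np.random.default_rng(seed)
    null = [gaussian_kernel_mi(x, rng.permutation(y), h) for _ in range(n_perm)]
    return float(np.quantile(null, 1 - alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.normal(size=200)
    y_dep = np.sin(2 * x) + 0.3 * rng.normal(size=200)  # non-linear dependence
    y_ind = rng.normal(size=200)                        # no dependence
    t = permutation_threshold(x, y_ind)
    print(gaussian_kernel_mi(x, y_dep), gaussian_kernel_mi(x, y_ind), t)
```

Because the estimator works on ranks, it uses the full range of the data without discretizing it. A non-monotonic dependence such as `sin(2x)` still yields a high MI, where a linear correlation coefficient could be close to zero.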
In the second step, ARACNE eliminates those statistical dependencies that might be of an indirect nature, such as between two genes that are separated by intermediate steps in a transcriptional cascade. Such genes will likely have correlated expression profiles, resulting in high MI, and might otherwise be selected as candidate interacting genes.
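Indirect interactions are eliminated by applying a well-known property of MI called the Data Processing Inequality (DPI): if genes g1 and g3 interact only through a third gene g2, then I(g1,g3) ≤ min[I(g1,g2), I(g2,g3)]. ARACNE therefore examines each gene triplet in which all three pairs have significant MI and removes the edge with the smallest MI, optionally with a tolerance to account for MI estimation error. Given a TF, application of the DPI, under appropriate assumptions, thus generates predictions about which other genes might be its direct transcriptional targets or its upstream transcriptional regulators.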
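The triplet pruning described above can be sketched as follows. The sketch assumes that MI values below the independence threshold have already been set to zero. It also assumes a relative tolerance, so an edge is dropped only if its MI falls below the triplet's other two MI values by more than that fraction. The exact form of ARACNE's tolerance is an assumption here, and `apply_dpi` is a hypothetical name.

```python
import itertools

import numpy as np


def apply_dpi(mi, tolerance=0.0):
    """Prune likely indirect edges from a symmetric MI matrix using the DPI (sketch).

    For every triplet (i, j, k) whose three edges are all present, the lowest-MI edge
    is marked for removal if it is below the other two by more than `tolerance`
    (a fraction). Removals are applied only after all triplets have been examined,
    so the result does not depend on the order in which triplets are visited.
    """
    mi = np.asarray(mi, dtype=float)
    n = mi.shape[0]
    present = mi > 0  # assumes sub-threshold MI values were already zeroed
    np.fill_diagonal(present, False)
    remove = np.zeros_like(present)
    for i, j, k in itertools.combinations(range(n), 3):
        edges = [(i, j), (j, k), (i, k)]
        if not all(present[a, b] for a, b in edges):
            continue
        values = [mi[a, b] for a, b in edges]
        w = int(np.argmin(values))
        others = [v for idx, v in enumerate(values) if idx != w]
        if values[w] < min(others) * (1 - tolerance):
            a, b = edges[w]
            remove[a, b] = remove[b, a] = True
    return present & ~remove


if __name__ == "__main__":
    # Cascade g1 -> g2 -> g3: g1 and g3 share MI only through g2.
    mi = [[0.0, 0.8, 0.4],
          [0.8, 0.0, 0.7],
          [0.4, 0.7, 0.0]]
    print(apply_dpi(mi).astype(int))
```

With zero tolerance the indirect g1-g3 edge is removed. A larger tolerance (here, 0.5) keeps it, trading a possible false positive for fewer missed interactions.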
After this step, some additional filtering and post-processing procedures might be applied. The final result is a matrix of candidate interactions, also called an adjacency matrix, which can be used for further network visualization and analysis.
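For visualization or downstream analysis, the adjacency matrix is often converted to an edge list. The sketch below shows one way to do this, together with a per-gene count of candidate interactions. The function names and the gene labels `g1`–`g3` are hypothetical, and no particular ARACNE output file format is implied.

```python
import numpy as np


def adjacency_to_edges(adj, mi, names):
    """List undirected edges (gene_a, gene_b, mi) from a boolean adjacency matrix, strongest first."""
    adj = np.asarray(adj, dtype=bool)
    rows, cols = np.nonzero(np.triu(adj, k=1))  # upper triangle: each undirected edge once
    edges = [(names[i], names[j], float(mi[i][j])) for i, j in zip(rows, cols)]
    return sorted(edges, key=lambda e: e[2], reverse=True)


def degree(adj, names):
    """Number of candidate interactions per gene, e.g. to spot hub-like TFs."""
    adj = np.asarray(adj, dtype=bool)
    return dict(zip(names, adj.sum(axis=1).tolist()))


if __name__ == "__main__":
    names = ["g1", "g2", "g3"]
    mi = [[0.0, 0.8, 0.4], [0.8, 0.0, 0.7], [0.4, 0.7, 0.0]]
    adj = [[False, True, False], [True, False, True], [False, True, False]]
    print(adjacency_to_edges(adj, mi, names))  # [('g1', 'g2', 0.8), ('g2', 'g3', 0.7)]
    print(degree(adj, names))                  # {'g1': 1, 'g2': 2, 'g3': 1}
```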
System Requirements
Contact manufacturer.
Manufacturer
- Department of Biomedical Informatics and Joint Centers for Systems Biology
- Columbia University
- New York, New York 10032
- USA
Manufacturer Web Site ARACNE
Price Contact manufacturer.
G6G Abstract Number 20588
G6G Manufacturer Number 104191