bnlearn

Category Intelligent Software>Bayesian Network Systems/Tools

Abstract bnlearn is an R software package that provides a free implementation of some of the Bayesian network structure learning algorithms that have appeared in literature, enhanced with algorithmic optimizations and support for parallel computing.

Many score functions and conditional independence tests are provided for both independent use and the learning algorithms themselves.

bnlearn is designed to provide the versatility needed to handle experimental data analysis.

It handles both discrete and continuous data, and it supports any combination of the implemented learning algorithms and either network scores (for score-based algorithms) or conditional independence tests (for constraints-based algorithms).

Furthermore, it simplifies the analysis of the learned networks by providing a single object class (bn) for all the algorithms, and a set of utility functions to perform descriptive statistics and basic inference procedures.

bnlearn implements some algorithms for learning the structure of Bayesian networks --

1) Constraint-based algorithms, also known as conditional independence learners, are all optimized derivatives of the Inductive Causation algorithm.

These algorithms use conditional independence tests to detect the Markov Blankets (MBs) of the variables, which in turn are used to compute the structure of the Bayesian network.

2) Score-based learning algorithms are general purpose heuristic optimization algorithms which rank network structures with respect to a goodness-of-fit score.

3) Hybrid algorithms combine aspects of both constraint-based and score-based algorithms, as they use conditional independence tests (usually to reduce the search space) and network scores (to find the optimal network in the reduced space) at the same time.

Several functions for parameter estimation, parametric inference, bootstrap, cross-validation and stochastic simulation are also available.

Furthermore, advanced plotting capabilities are implemented on top of the Rgraphviz and lattice packages.

Available constraint-based learning algorithms --

1) Grow-Shrink (GS) - based on the Grow-Shrink Markov Blanket, the first (and simplest) Markov Blanket detection algorithm used in a structure learning algorithm.

2) Incremental Association (IAMB) - based on the Markov Blanket (MB) detection algorithm of the same name, which is based on a two-phase selection scheme (a forward selection followed by an attempt to remove false positives).

3) Fast Incremental Association (Fast IAMB) - a variant of IAMB which uses speculative stepwise forward selection to reduce the number of conditional independence tests.

4) Interleaved Incremental Association (Inter IAMB) - another variant of IAMB which uses forward stepwise selection to avoid false positives in the Markov Blanket detection phase.

The bnlearn package includes three (3) implementations of each algorithm:

1) An optimized implementation (used when the optimized parameter is set to TRUE), which uses backtracking to roughly halve the number of independence tests.

2) An unoptimized implementation (used when the optimized parameter is set to FALSE) which is better at uncovering possible erratic behavior of the statistical tests.

3) A cluster-aware implementation, which requires a running cluster set-up with the makeCluster function from the Simple Network Of Workstations (SNOW) package.

The computational complexity of these algorithms is polynomial in the number of tests, usually O(N^2) and [O(N^4) in the worst case scenario], where N is the number of variables. Execution time scales linearly with the size of the data set.

Available score-based learning algorithms --

1) Hill-Climbing (HC) - a hill climbing greedy search on the space of the directed graphs. The optimized implementation uses score caching, score decomposability and score equivalence to reduce the number of duplicated tests.

2) Tabu Search (TABU) - a modified hill climbing able to escape local optima by selecting a network that minimally decreases the score function.

Note: Random restart with a configurable number of perturbing operations is implemented for both algorithms.

Available hybrid learning algorithms --

1) Max-Min Hill-Climbing (MMHC) - a hybrid algorithm which combines the Max-Min Parents and Children algorithm (to restrict the search space) and the Hill-Climbing algorithm (to find the optimal network structure in the restricted space).

2) Restricted Maximization (RSMAX2) - a more general implementation of Max-Min Hill-Climbing, which can use any combination of constraint-based and score-based algorithms.

Other (constraint-based) local discovery algorithms --

These types of algorithms learn the structure of the undirected graph underlying the Bayesian network, which is known as the skeleton of the network or the correlation graph.

Therefore, all the arcs are undirected, and No attempt is made to detect their orientation. They are often used in hybrid learning algorithms.

1) Max-Min Parents and Children (MMPC) - a forward selection technique for neighborhood detection based on the maximization of the minimum association measure observed with any subset of the nodes selected in the previous iterations.

All these types of algorithms have three (3) implementations (unoptimized, optimized and cluster-aware) like the other constraint-based algorithms.

Both discrete (multinomial) and continuous (multivariate normal) data sets are supported.

Each constraint-based algorithm can be used with several conditional independence tests:

1) Discrete data (multinomial distribution) --

a) Mutual information (both parametric and permutation tests).
b) Shrinkage-estimator for the mutual information.
c) Pearson's X^2 (both parametric and permutation tests).
d) Akaike Information Criterion test.

2) Continuous data (multivariate normal distribution) --

a) Linear correlation (both parametric and permutation tests).
b) Fisher’s Z (both parametric and permutation tests).
c) Mutual information (both parametric and permutation tests).

And each score-based algorithm can be used with several score functions:

1) Discrete data (multinomial distribution) --

a) The multinomial log-likelihood;
b) The Akaike Information Criterion (AIC);
c) The Bayesian Information Criterion (BIC);
d) A score equivalent Dirichlet posterior density (BDe); and
e) The logarithm of the K2 score.

2) Continuous data (multivariate normal distribution) --

a) The multivariate Gaussian log-likelihood;
b) The corresponding Akaike Information Criterion (AIC);
c) The corresponding Bayesian Information Criterion (BIC); and
d) A score equivalent Gaussian posterior density (BGe).

System Requirements

Contact manufacturer.

Manufacturer

Marco Scutari
Ph.D. School in Statistical Sciences
Department of Statistical Sciences
University of Padova, Italy

Manufacturer Web Site bnlearn

Price Contact manufacturer.

G6G Abstract Number 20699

G6G Manufacturer Number 104271

The G6G Directory of Omics and Intelligent Software

bnlearn