Abraham Mathew
2012-Sep-26 17:06 UTC
[R] Specifying a response variable in a Bayesian network
I'm trying to teach myself about Bayesian networks and am working with the following data and the bnlearn package. I understand the conceptual aspects of BNs, but I'm not sure how to specify the response variable in R when constructing a DAG plot. I've checked ?hc and done numerous Google searches without luck. Can anyone help?

library("bnlearn")
library("Rgraphviz")

dat = data.frame(won = c(1,0,0,1,0,0),
                 sold = c(0,0,0,1,0,0),
                 insured = c(0,0,1,0,0,1),
                 credit = c("POOR","FAIR","GOOD","FAIR","FAIR","GOOD"))

# bnlearn needs discrete variables coded as factors
dat$won = factor(dat$won)
dat$sold = factor(dat$sold)
dat$insured = factor(dat$insured)
dat$credit = factor(dat$credit)

highlight.opts <- list(nodes = c("won","sold","insured","credit"),
                       col = "red", fill = "grey")

# learn the structure by hill-climbing and plot the resulting DAG
bn.hc <- hc(dat, score = "aic")
graphviz.plot(bn.hc, highlight = highlight.opts)

Thanks,
Abraham

--
Abraham Mathew
Statistical Analyst
www.amathew.com
720-648-0108
@abmathewks
Marco Scutari
2012-Sep-27 09:46 UTC
[R] Specifying a response variable in a Bayesian network
Hi Abraham,

On Wed, Sep 26, 2012 at 6:06 PM, Abraham Mathew <abmathewks at gmail.com> wrote:
> I'm trying to teach myself about Bayesian Networks and am working with the
> following data and the bnlearn package.
> I understand the conceptual aspects of BNs, but I'm not sure how to specify
> the response variables in R when constructing
> a dag plot. I've checked ?hc and done numerous google searches without luck.

The idea of a response variable is a bit alien to BNs, which treat all variables in the same way and just explore their dependence structure. That's why there is no built-in way to single out a response variable in any of the structure learning algorithms. You can:

1) let hc() learn the structure of the BN and make inference on the posterior distribution of the variable you are interested in. Since the BN in your example is discrete, you can use either cpquery()/cpdist() from bnlearn or the functions in the gRain package to do that;

2) blacklist any outgoing arc from the response, so that you get the distribution of the response conditional on the explanatory variables (those that have a significant effect) as part of the network;

3) use one of the BN classifiers in bnlearn, naive.bayes()/tree.bayes(), which handle the concept of a response variable more naturally than general-purpose BNs.

Hope it helps,
Marco

--
Marco Scutari, Ph.D.
Research Associate, Genetics Institute (UGI)
University College London (UCL), United Kingdom
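
For illustration, the three options in the reply above could be sketched roughly as follows, reusing the data from the original post. This is not code from either poster. Several choices are assumptions made only for the example: treating "won" as the response, conditioning on credit == "FAIR", fitting parameters with method = "bayes" (so that no conditional probability is undefined on such a small sample), and the argument names of naive.bayes(), which assume a recent bnlearn release.

```r
library(bnlearn)

# Data from the original post, with every column coded as a factor.
dat <- data.frame(
  won     = factor(c(1, 0, 0, 1, 0, 0)),
  sold    = factor(c(0, 0, 0, 1, 0, 0)),
  insured = factor(c(0, 0, 1, 0, 0, 1)),
  credit  = factor(c("POOR", "FAIR", "GOOD", "FAIR", "FAIR", "GOOD"))
)

response   <- "won"                        # assumption: variable of interest
predictors <- setdiff(names(dat), response)

# Option 1: learn the structure freely, fit the parameters, then query the
# conditional probability of the response given some evidence.
dag1 <- hc(dat, score = "aic")
fit1 <- bn.fit(dag1, dat, method = "bayes")
set.seed(1)                                # cpquery() is simulation-based
p1 <- cpquery(fit1, event = (won == "1"), evidence = (credit == "FAIR"))

# Option 2: blacklist every arc leaving the response, so the response can
# only appear as a child of the explanatory variables.
bl <- data.frame(from = response, to = predictors, stringsAsFactors = FALSE)
dag2 <- hc(dat, score = "aic", blacklist = bl)
parents(dag2, response)                    # explanatory variables kept, if any

# Option 3: a naive Bayes classifier with the response as the class node.
nb     <- naive.bayes(x = dat, training = response)
nb.fit <- bn.fit(nb, dat, method = "bayes")
pred   <- predict(nb.fit, dat)
table(predicted = pred, observed = dat[[response]])
```

With only six observations, none of these results will be stable; the sketch only shows where the response variable enters each call.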