Leonore Wigger
2013-Apr-10 16:51 UTC
[R] bnlearn: how to compute boot strength with mmhc and a blacklist
Dear R-help list:

I have two related questions regarding the functions boot.strength and custom.strength in bnlearn.

1) I am using the following commands (on a set of continuous data; the example here was run on fake data):

> myblacklist <- data.frame(from=c("x1", "x1", "x1", "x2", "x2", "x2"), to=c("x2", "n3", "n4", "x1", "n3", "n4"))
> result <- boot.strength(data, R=200, algorithm="mmhc", algorithm.args=list(blacklist=myblacklist))
> result
   from to strength direction
1    n3 n4    1.000       0.5
2    n3 x1    0.340       0.5
3    n3 x2    0.115       0.5
4    n4 n1    1.000       0.5
5    n4 x1    0.080       0.5
6    n4 x2    1.000       0.5
7    x1 n3    0.340       0.5
8    x1 n4    0.080       0.5
9    x1 x2    0.000       0.0
10   x2 n3    0.115       0.5
11   x2 n4    1.000       0.5
12   x2 x1    0.000       0.0

Question: I have specified a blacklist. I would have expected this to completely disallow the arcs on the blacklist. But the result shows that some of the blacklisted arcs have a strength > 0 (rows 7, 8, 10, 11). It seems that only the arc that was blacklisted in both directions was actually banned (x1-x2, in rows 9 and 12). What is the reason for this? Is there a way to completely disallow all blacklisted arcs, such that their strength is 0.0? Or is there a compelling reason why that should not be done?

2) In the documentation of custom.strength, the following code example is given:

    start = random.graph(nodes = names(learning.test), num = 50)
    netlist = lapply(start, function(net) {
      hc(learning.test, score = "bde", iss = 10, start = net) })
    arcs = custom.strength(netlist, nodes = names(learning.test), cpdag = FALSE)

This code makes 50 different networks from the same data, then uses them as input for custom.strength. The networks are constructed using the algorithm "hc". A different network is produced every time "hc" is invoked because a random starting network is supplied to the parameter "start". I would like to do the same thing, but use "mmhc" instead of "hc". However, in my hands, the networks that are constructed by "mmhc" are all identical, and I am not sure how to introduce a random element into the construction.

Question: Which parameters do I need to give to "mmhc" in order to obtain a different network every time it is run on the same data set?

Any help is greatly appreciated!

--
Leonore Wigger
University of Lausanne
Marco Scutari
2013-Apr-12 09:08 UTC
[R] bnlearn: how to compute boot strength with mmhc and a blacklist
Dear Leonore,

On Wed, Apr 10, 2013 at 5:51 PM, Leonore Wigger <leonore.wigger at unil.ch> wrote:
> Question: I have specified a blacklist. I would have expected this to
> completely disallow the arcs on the blacklist. But the result shows that
> some of the blacklisted arcs have a strength > 0 (rows 7,8,10,11). It
> seems that only the arc that was blacklisted in both directions was
> actually banned (x1-x2, in rows 9 and 12). What is the reason for this?
> Is there a way to completely disallow all blacklisted arcs, such that
> their strength is 0.0? Or is there a compelling reason why that should
> not be done?

Because by default boot.strength() runs with "cpdag = TRUE". This means that reversible arcs can have positive strength in both directions. You should set "cpdag = FALSE" to get the result you are expecting. In that case the probabilities of the arc directions should be taken with a grain of salt, as they can be influenced by many things (optimized = TRUE/FALSE, order of the variables in the data set) unless you are doing causal modelling.

> This code makes 50 different networks from the same data, then uses them
> as input for custom.strength. The networks are constructed using the
> algorithm "hc". A different network is produced every time "hc" is
> invoked because a random starting network is supplied to the parameter
> "start". I would like to do the same thing, but use "mmhc" instead of
> "hc". However, in my hands, the networks that are constructed by "mmhc"
> are all identical, and I am not sure how to introduce a random element
> into the construction. Question: Which parameters do I need to give to
> "mmhc" in order to obtain a different network every time it is run on
> the same data set?

This is not surprising, because mmhc() does not have a "start" argument, so it's starting from the same network over and over. There is no way to provide a random seed to mmhc(), so the only way to perturb it is through bootstrap.

Marco

--
Marco Scutari, Ph.D.
Research Associate, Genetics Institute (UGI)
University College London (UCL), United Kingdom
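
For readers who want to try both suggestions from this thread, here is a minimal sketch. It is not code from either poster. It assumes the gaussian.test data set that ships with bnlearn as a stand-in for the original (unshown) data, and the blacklist on nodes A, B and C is made up for illustration. Part 1 passes cpdag = FALSE to boot.strength(). Part 2 perturbs mmhc() by resampling the rows by hand and feeds the resulting networks to custom.strength(). Note that, as the bnlearn documentation describes it, the "strength" column measures the arc regardless of its direction, so it stays identical in both rows of a pair; it is the "direction" column that should drop to 0 for a blacklisted orientation.

```r
# Sketch only: gaussian.test and the A/B/C blacklist are illustrative
# assumptions, not the data or blacklist used in the thread.
library(bnlearn)

set.seed(1)            # makes the resampling below reproducible
dat <- gaussian.test   # continuous example data shipped with bnlearn
R <- 20                # small number of replicates to keep the run short

# Forbid A -> B and A -> C; the reverse orientations remain allowed.
bl <- data.frame(from = c("A", "A"), to = c("B", "C"),
                 stringsAsFactors = FALSE)

# 1) Built-in bootstrap, keeping each learned DAG as it is
#    (no conversion to its equivalence class).
boot <- boot.strength(dat, R = R, algorithm = "mmhc",
                      algorithm.args = list(blacklist = bl),
                      cpdag = FALSE)

# 2) Hand-rolled bootstrap: mmhc() has no "start" argument, so the
#    variation comes from resampling the rows with replacement.
n <- nrow(dat)
netlist <- lapply(seq_len(R), function(i) {
  resampled <- dat[sample(n, n, replace = TRUE), ]
  mmhc(resampled, blacklist = bl)
})
manual <- custom.strength(netlist, nodes = names(dat), cpdag = FALSE)

# Helper: flag the rows whose (from, to) pair is on the blacklist.
is_bl <- function(s) paste(s$from, s$to) %in% paste(bl$from, bl$to)

# Direction probabilities of the blacklisted orientations.
print(boot$direction[is_bl(boot)])
print(manual$direction[is_bl(manual)])
```

Both approaches resample the data; the second only makes the resampling explicit, which can be handy when the individual networks need to be inspected or stored before their arc strengths are computed.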