Leonore Wigger
2013-Apr-10 16:51 UTC
[R] bnlearn: how to compute boot strength with mmhc and a blacklist
Dear R-help list:
I have two related questions regarding the functions boot.strength and
custom.strength in bnlearn.
1) I am using the following commands on a set of continuous data (the
example here was run on fake data):
> myblacklist <- data.frame(from = c("x1", "x1", "x1", "x2", "x2", "x2"),
+                           to = c("x2", "n3", "n4", "x1", "n3", "n4"))
> result <- boot.strength(data, R = 200, algorithm = "mmhc",
+                         algorithm.args = list(blacklist = myblacklist))
> result
   from to strength direction
1    n3 n4    1.000       0.5
2    n3 x1    0.340       0.5
3    n3 x2    0.115       0.5
4    n4 n1    1.000       0.5
5    n4 x1    0.080       0.5
6    n4 x2    1.000       0.5
7    x1 n3    0.340       0.5
8    x1 n4    0.080       0.5
9    x1 x2    0.000       0.0
10   x2 n3    0.115       0.5
11   x2 n4    1.000       0.5
12   x2 x1    0.000       0.0
Question: I have specified a blacklist. I would have expected this to
completely disallow the arcs on the blacklist. But the result shows that
some of the blacklisted arcs have a strength > 0 (rows 7,8,10,11). It
seems that only the arc that was blacklisted in both directions was
actually banned (x1-x2, in rows 9 and 12). What is the reason for this?
Is there a way to completely disallow all blacklisted arcs, such that
their strength is 0.0? Or is there a compelling reason why that should
not be done?
2) In the documentation of custom.strength, the following code example
is given:
start = random.graph(nodes = names(learning.test), num = 50)
netlist = lapply(start, function(net) {
  hc(learning.test, score = "bde", iss = 10, start = net) })
arcs = custom.strength(netlist, nodes = names(learning.test), cpdag = FALSE)
This code makes 50 different networks from the same data, then uses them
as input for custom.strength. The networks are constructed using the
algorithm "hc". A different network is produced every time "hc" is
invoked because a random starting network is supplied to the parameter
"start". I would like to do the same thing, but use "mmhc" instead of
"hc". However, in my hands, the networks that are constructed by "mmhc"
are all identical, and I am not sure how to introduce a random element
into the construction. Question: Which parameters do I need to give to
"mmhc" in order to obtain a different network every time it is run on
the same data set?
Any help is greatly appreciated!
--
Leonore Wigger
University of Lausanne
Marco Scutari
2013-Apr-12 09:08 UTC
[R] bnlearn: how to compute boot strength with mmhc and a blacklist
Dear Leonore,

On Wed, Apr 10, 2013 at 5:51 PM, Leonore Wigger <leonore.wigger at unil.ch> wrote:
> Question: I have specified a blacklist. I would have expected this to
> completely disallow the arcs on the blacklist. But the result shows that
> some of the blacklisted arcs have a strength > 0 (rows 7,8,10,11). It
> seems that only the arc that was blacklisted in both directions was
> actually banned (x1-x2, in rows 9 and 12). What is the reason for this?
> Is there a way to completely disallow all blacklisted arcs, such that
> their strength is 0.0? Or is there a compelling reason why that should
> not be done?

Because by default boot.strength() runs with "cpdag = TRUE". This means
that reversible arcs can have positive strength in both directions. You
should set "cpdag = FALSE" to get the result you are expecting. In that
case the probabilities of the arc directions should be taken with a grain
of salt, as they can be influenced by many things (optimized = TRUE/FALSE,
order of the variables in the data set) unless you are doing causal
modelling.

> This code makes 50 different networks from the same data, then uses them
> as input for custom.strength. The networks are constructed using the
> algorithm "hc". A different network is produced every time "hc" is
> invoked because a random starting network is supplied to the parameter
> "start". I would like to do the same thing, but use "mmhc" instead of
> "hc". However, in my hands, the networks that are constructed by "mmhc"
> are all identical, and I am not sure how to introduce a random element
> into the construction. Question: Which parameters do I need to give to
> "mmhc" in order to obtain a different network every time it is run on
> the same data set?

This is not surprising, because mmhc() does not have a "start" argument,
so it's starting from the same network over and over. There is no way to
provide a random seed to mmhc(), so the only way to perturb it is through
bootstrap.

Marco

--
Marco Scutari, Ph.D.
Research Associate, Genetics Institute (UGI)
University College London (UCL), United Kingdom
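
The two suggestions in the reply above could be tried out roughly as follows. This is a minimal sketch, not code from the thread: the simulated data frame "dat", its dependency structure, the seed, the choice of 50 resamples and the helper is_blacklisted() are all assumptions made for illustration. The first part reruns boot.strength() with cpdag = FALSE; the second perturbs mmhc() by resampling the rows by hand and passes the resulting networks to custom.strength(). Since bnlearn documents "strength" as the frequency of an arc regardless of its direction, the effect of the blacklist would be expected to show up in the "direction" column rather than in "strength".

```r
library(bnlearn)

# Hypothetical stand-in for the poster's continuous data (not from the thread).
set.seed(1)
n  <- 500
n3 <- rnorm(n)
n4 <- 0.8 * n3 + rnorm(n)
x1 <- 0.6 * n3 + rnorm(n)
x2 <- 0.9 * n4 + rnorm(n)
dat <- data.frame(x1, x2, n3, n4)

# Blacklist as described in the original post (from/to pairs).
myblacklist <- data.frame(
  from = c("x1", "x1", "x1", "x2", "x2", "x2"),
  to   = c("x2", "n3", "n4", "x1", "n3", "n4"),
  stringsAsFactors = FALSE
)

# 1) Bootstrap arc strength computed on the learned DAGs (cpdag = FALSE)
#    instead of on their CPDAGs (the default, cpdag = TRUE).
res <- boot.strength(dat, R = 200, algorithm = "mmhc",
                     algorithm.args = list(blacklist = myblacklist),
                     cpdag = FALSE)

# 2) Manual bootstrap: mmhc() has no "start" argument, so each network is
#    learned from a different resample of the rows (assumed: 50 resamples).
nets <- lapply(seq_len(50), function(i) {
  idx <- sample(nrow(dat), replace = TRUE)
  mmhc(dat[idx, ], blacklist = myblacklist)
})
arcs <- custom.strength(nets, nodes = names(dat), cpdag = FALSE)

# Hypothetical helper: flags the rows that correspond to blacklisted arcs.
is_blacklisted <- function(s) {
  paste(s$from, s$to) %in% paste(myblacklist$from, myblacklist$to)
}

# Inspect the blacklisted arcs in both results.
res[is_blacklisted(res), ]
arcs[is_blacklisted(arcs), ]
```

In both results the rows to look at are the blacklisted ones: their "direction" should be 0 because no learned DAG can contain them, while "strength" can still be positive when the same arc was learned in the allowed direction.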