Hi Marco Thanks for your prompt reply. First, I have been using the parse(eval()) convention because I saw it used in some example code for running cpquery, but am happy to drop this practice. I have tried running the cpquery in the debug mode, and found that it typically returns the following for instances where the conditional probability is returned as 0: > event matches 0 samples out of 0 (p = 0) Am I right in understanding that the Monte Carlo sampling has been unable to create any cases that match the query? If so, why would this be if the evidence used is very typical of an average case in the data used to train the network? Also, I have run predict on this network and get very good correlations between the predicted and actual observations (r-squared 0.8 - 0.9). Why would it be that the network can return near perfect predictions can be so good for a test set while the conditional probabilities remain at zero for when exploring the same data set? I think that I must be missing something in my deployment of the package or the interpretation of the output. Many thanks for your help. Ross On Fri, July 29, 2016 7:34 pm, Marco Scutari wrote:> Hi Ross, > > > first, I have a side question: is there a particular reason why you are > using parse(eval()) in your queries? I know sometimes there is no other > solution if you only use exported functions, but you should try not to. It > makes for brittle code that breaks easily depending on how variables are > scoped. > > On 29 July 2016 at 07:37, <ross.chapman at ecogeonomix.com> wrote: > >> However, if I replace EST=='x' with EST=='z' or EST=='y' I get 0 >> probability of obtaining a value for ABW that is either greater or less >> than the threshold. >> >> For example: >> >> >>> cpquery(fitted,event=(ABW>=11), evidence=eval(parse(text="(EST=='y' & >>> TR>9 >>> >> & BU>15819 & RF>2989)")),n=10^6) >> >> >> [1] 0 >> >> >> and >> >>> cpquery(fitted,event=(ABW<=11), evidence=eval(parse(text="(EST=='y' & >>> TR>9 >>> >> & BU>15819 & RF>2989)")),n=10^6) >> >> >> [1] 0 >> > > From this output, my guess is that the evidence has probability > (exactly or close to) zero. You can check by running cpdist(): if the > evidence has probability zero, no random samples will be returned. Turning > on the debugging output in cpquery() should also highlight what the > problem is. > > If that turns out to be the case, switch from the default method > "ls" to method = "lw" in cpdist(). (Note that the syntax changes > slightly, check the documentation for examples.) > >> My own knowledge from the data is that these classes should both >> typically return a value for ABW that is very much higher than the >> threshold value. > > That may be, but much depends on the specific sample the model was > fitted from. How does the fitted network look like? > > Cheers, > Marco > > > -- > Marco Scutari, Ph.D. > Lecturer in Statistics, Department of Statistics > University of Oxford, United Kingdom > >
Hi Ross, On 31 July 2016 at 09:11, Ross Chapman <ross.chapman at ecogeonomix.com> wrote:> I have tried running the cpquery in the debug mode, and found that it > typically returns the following for instances where the conditional > probability is returned as 0: > > > event matches 0 samples out of 0 (p = 0) > > Am I right in understanding that the Monte Carlo sampling has been unable > to create any cases that match the query? If so, why would this be if the > evidence used is very typical of an average case in the data used to train > the network?Yes, that is what is happening. As to the reason why, I guess that the dependencies in the data may not be adequately represented in the fitted Bayesian network for some reason. What is apparent is that (EST=='y' & TR>9 & BU>15819 & RF>2989) has an associated probability low enough that you do not observe any such sample in rejection sampling. Now, that being the case, you have two options: 1) use a much larger "n" with a smaller "batch = 10^6" to generate a lot more particles; 2) switch to likelihood weighting, i.e. cpquery(fitted,event=(ABW<=11), evidence=list(EST ='y', TR = c(9, max(data$TR)), BU = c(15819, max(data$BU)), RF = c(2989, max(data$RF)), n=10^6, method = "lw") 3) look at the parameters in your fitted network and diagnose why this is happening. Cheers, Marco -- Marco Scutari, Ph.D. Lecturer in Statistics, Department of Statistics University of Oxford, United Kingdom
Hi Marco Thank you very much for your helpful advice. I have tried you suggestion of using method = 'lw' with cpquery and can now obtain conditional probabilities. However, I am still puzzled over the outputs from the predict() and cpquery functions. The network that I am working on has the following coefficients for the node that I am interested in (ABW): Parameters of node ABW (conditional Gaussian distribution) Conditional density: ABW | EST + TR + FFB + RF Coefficients: 0 1 2 (Intercept) -0.480612729 -5.834617332 0.809011487 TR 1.857271045 1.584331230 1.964198638 FFB 0.182533645 0.066891147 0.028620951 RF -0.002822838 0.002155205 -0.001608243 Standard deviation of the residuals: 0 1 2 1.5140402 1.1764351 0.9675918 Discrete parents' configurations: EST 0 K1 1 M1 2 M2 If I run predict() using this fitted network I get ABW results very close to those expected. For example, for test case 1, I get a predicted ABW of 15.022, which is very close to the actual ABW value for this case (14.871). However, running cpquery() using the values for this test case returns a conditional probability of 0 for all levels of ABW observed in the training data. For example, the conditional probability returned for a event where ABW<15 is 0; similarly the conditional probability for an event where ABW is between the minimum and maximum ABW values observed in the data is again zero; while the conditional probability of an ABW event>24 (which is in excess of all observed values) is 1.Why does cpquery not return a high conditional probability for an event which is predicted from the same coefficients? Many thanks for your assistance with these queries. Regards Ross On Mon, August 1, 2016 7:35 pm, Marco Scutari wrote:> Hi Ross, > > > On 31 July 2016 at 09:11, Ross Chapman <ross.chapman at ecogeonomix.com> > wrote: > >> I have tried running the cpquery in the debug mode, and found that it >> typically returns the following for instances where the conditional >> probability is returned as 0: >> >>> event matches 0 samples out of 0 (p = 0) >> >> Am I right in understanding that the Monte Carlo sampling has been >> unable to create any cases that match the query? If so, why would this >> be if the evidence used is very typical of an average case in the data >> used to train the network? > > Yes, that is what is happening. As to the reason why, I guess that the > dependencies in the data may not be adequately represented in the fitted > Bayesian network for some reason. What is apparent is that > (EST=='y' & TR>9 & BU>15819 & RF>2989) has an associated probability > low enough that you do not observe any such sample in rejection sampling. > Now, that being the case, you have two options: > > > 1) use a much larger "n" with a smaller "batch = 10^6" to generate a > lot more particles; 2) switch to likelihood weighting, i.e. > > > cpquery(fitted,event=(ABW<=11), evidence=list(EST ='y', TR = c(9, > max(data$TR)), BU = c(15819, max(data$BU)), RF = c(2989, max(data$RF)), > n=10^6, method = "lw") > > 3) look at the parameters in your fitted network and diagnose why this > is happening. > > Cheers, > Marco > > > -- > Marco Scutari, Ph.D. > Lecturer in Statistics, Department of Statistics > University of Oxford, United Kingdom > >