Alexsandro Cândido de Oliveira Silva
2014-Jun-09 01:27 UTC
[R] NA/NaN values in bnlearn package R
Hello,

I am using the bnlearn package in R to handle large amounts of data in Bayesian networks. The variables are discrete and have more than 3 million observations. With the bn.fit function I could easily get the conditional probability distribution. However, some variables have unobserved values (i.e., NA or NaN). In some variables, almost 1 million values are unobserved. That is too many to simply delete.

In tests, I got this:

> nw.fit <- bn.fit(nw, date, method = 'bayes')
Error in check.data (date): the data set contains null / NaN / NA values.

So, how could I deal with the data and get the conditional probability distribution? Could someone help me?

Regards.
Alexsandro Cândido de Oliveira Silva
On Jun 8, 2014, at 6:27 PM, Alexsandro Cândido de Oliveira Silva wrote:

> [...]
> So, how could I deal with the data and get the conditional probability distribution?
> Could someone help me?

You are requested not to crosspost to multiple R mailing lists (and, by extension of the reasoning behind that request, crossposting to R-help and StackOverflow is also considered discourteous). Both R-help and SO expect you to provide enough information to comment intelligently, but you have not lived up to that expectation.

--
David Winsemius
Alameda, CA, USA
Dear Alexsandro,

On 9 June 2014 02:27, Alexsandro Cândido de Oliveira Silva <acos at dpi.inpe.br> wrote:

> So, how could I deal with the data and get the conditional probability
> distribution?

As of the current release (3.5), all functions in bnlearn require complete data, so that error message is expected. However, you can estimate the CPTs from incomplete data using table() and prop.table() and assemble them in a fitted BN with custom.fit(). On the other hand, maybe it would be better to write an EM wrapper around bn.fit() to make the best of the dependence structure of the data?

Cheers,
Marco

--
Marco Scutari, Ph.D.
Research Associate, Genetics Institute (UGI)
University College London (UCL), United Kingdom
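A minimal sketch of the table() / prop.table() / custom.fit() route described in the reply above. The toy data frame, the node names A and B, and the structure A -> B are hypothetical and only illustrate the idea; they are not taken from the original poster's data. The sketch relies on table() dropping NA values by default, so each CPT is estimated from the rows where the node and its parents are all observed. The custom.fit() step runs only if bnlearn is installed.

```r
# Hypothetical toy data with missing values (not the poster's data).
d <- data.frame(
  A = factor(c("a", "a", "b", "b", NA,  "a", "b", "a")),
  B = factor(c("x", "y", "x", NA,  "y", "x", "x", NA))
)

# table() excludes NA by default (useNA = "no"), so only observed
# values contribute to the counts.
cpt_A <- prop.table(table(A = d$A))

# For B | A, the node goes in the first dimension and the parent in the
# second; margin = 2 normalises within each level of the parent A.
cpt_B <- prop.table(table(B = d$B, A = d$A), margin = 2)

print(cpt_A)
print(cpt_B)

# Assemble the CPTs into a fitted network (assumed structure: A -> B).
if (requireNamespace("bnlearn", quietly = TRUE)) {
  net <- bnlearn::model2network("[A][B|A]")
  fit <- bnlearn::custom.fit(net, dist = list(A = cpt_A, B = cpt_B))
  print(fit)
}
```

Note that a parent configuration with no observed rows yields a column of NaN from prop.table() (0/0), so such columns would need to be filled in (for example with a uniform distribution) before assembling the network. This per-node, available-case estimate also ignores what the observed variables say about the missing ones, which is the gap an EM approach would try to close.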
Alexsandro,

Just for your information, the catnet package can handle missing data, as well as perturbed data. It deals with discrete data only, which is not a problem for you.

Peter

On Sun, Jun 8, 2014 at 9:27 PM, Alexsandro Cândido de Oliveira Silva <acos@dpi.inpe.br> wrote:

> [...]
> So, how could I deal with the data and get the conditional probability
> distribution?
> Could someone help me?

--
Peter Salzman, PhD
Department of Biostatistics and Computational Biology
University of Rochester