Hi,

I'm currently using the R package e1071 to train naive Bayes classifiers and came across a bug: when the posterior probabilities of all classes are small, the result from the predict.naiveBayes function becomes NaN. This is an issue with the treatment of the log-transformed probabilities inside the predict.naiveBayes function. Here is an example to demonstrate the problem (you might need to increase 'nvar' depending on your machine):

-------------------- 8< --------------------
library(e1071)

N <- 100
nvar <- 60
varnames <- paste("v", 1:nvar, sep="")

# two well-separated classes: mean 0 and mean 10 in every variable
dat <- sapply(1:nvar, function(dummy) {c(rnorm(N/2, 0, 1), rnorm(N/2, 10, 1))})
colnames(dat) <- varnames

out <- rep(c("a","b"), each=N/2)

nb <- naiveBayes(x=dat, y=out)

# a new observation far from both class means
new.dat <- t(rnorm(nvar, 5, 0.1))
colnames(new.dat) <- varnames

predict(nb, new.dat, type="raw")
-------------------- 8< --------------------

The result of the last line is usually NaNs. As for the solution:

To protect against very small numbers, the e1071:::predict.naiveBayes function takes the probabilities into log space and adds instead of multiplying probabilities. However, when calculating the posterior probabilities of each class (when type = "raw"), the logs of the probabilities are exponentiated before they are normalized, which defeats the purpose of the log-space transformation. I suggest the following change to the code.

Towards the end of the predict.naiveBayes function, you currently do:

    L <- exp(L)
    L / sum(L)   # this is what is returned

You can instead use

    sapply(L, function(lp) {1 / sum(exp(L - lp))})

The above comes from the following equality:

    x / (x + y + z) = 1 / (1 + exp(log(y) - log(x)) + exp(log(z) - log(x)))

Best wishes,
/Ali Tofigh
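The underflow and the proposed normalization can be checked on a bare vector of log scores, without e1071. This is only a sketch: the vector L and its values are made up so that exp() underflows, and the max-shift variant is a common alternative shown for comparison, not something taken from the package source.

```r
# Hypothetical unnormalized log posteriors for three classes
# (illustrative values, not output from e1071).
L <- c(a = -1000, b = -1001, c = -1003)

# Exponentiate first, then normalize: every exp() underflows to 0,
# so each entry becomes 0 / 0 = NaN.
naive <- exp(L) / sum(exp(L))

# Normalization as proposed in the post: only differences of log
# scores are exponentiated.
proposed <- sapply(L, function(lp) 1 / sum(exp(L - lp)))

# Alternative with the same result: subtract the maximum log score
# before exponentiating, then normalize as usual.
shifted <- exp(L - max(L)) / sum(exp(L - max(L)))

print(naive)
print(proposed)
print(shifted)
```

For n classes the equality in the post reads p_i = 1 / sum_j exp(L_j - L_i), which is what the sapply() call computes for each class i.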
On Feb 7, 2012, at 12:43 PM, Ali Tofigh wrote:

> Hi,
>
> I'm currently using the R package e1071 to train naive bayes
> classifiers and came across a bug: When the posterior probabilities of
> all classes are small, the result from the predict.naiveBayes function
> become NaNs.

This should be sent to the maintainer of the package. The name of the maintainer can always be found in the DESCRIPTION file. Several of the authors are regular readers of R-help, but I do not know whether David Meyer is. I'm sure a well-documented bug report, as this appears to be, will be welcomed.

-- David.

David Winsemius, MD
West Hartford, CT
Confirmed & fixed upstream.

Thanks,
David

On 2012-02-07 18:43, Ali Tofigh wrote:

> Hi,
>
> I'm currently using the R package e1071 to train naive bayes
> classifiers and came across a bug: When the posterior probabilities of
> all classes are small, the result from the predict.naiveBayes function
> become NaNs. This is an issue with the treatment of the
> log-transformed probabilities inside the predict.naiveBayes function.

--
Priv.-Doz. Dr. David Meyer
Department of Information Systems and Operations
WU
Wirtschaftsuniversität Wien
Vienna University of Economics and Business
Augasse 2-6, 1090 Vienna, Austria
Tel: +43-1-313-36-4393
Fax: +43-1-313-36-90-4393
HP: http://ec.wu.ac.at/~meyer