thr3ads.net - R help - [R] predict.naiveBayes() bug in e1071 package [Feb 2012]

Ali Tofigh

2012-Feb-07 17:43 UTC

[R] predict.naiveBayes() bug in e1071 package

Hi,

I'm currently using the R package e1071 to train naive Bayes
classifiers and came across a bug: when the likelihoods of a new
observation are very small under all classes, the results from the
predict.naiveBayes function become NaNs. This is an issue with the
treatment of the log-transformed probabilities inside the
predict.naiveBayes function. Here is an example to demonstrate the
problem (you might need to increase 'nvar' depending on your machine):

-------------------- 8< --------------------
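library(e1071)  # provides naiveBayes() and its predict() method

N <- 100
nvar <- 60
varnames <- paste("v", 1:nvar, sep="")

# Two classes: first N/2 rows centred at 0, last N/2 rows centred at 10
dat <- sapply(1:nvar, function(dummy) {c(rnorm(N/2, 0, 1), rnorm(N/2, 10, 1))})
colnames(dat) <- varnames

out <- rep(c("a","b"), each=N/2)

nb <- naiveBayes(x=dat, y=out)

# A single new observation roughly halfway between the two class means
new.dat <- t(rnorm(nvar, 5, 0.1))
colnames(new.dat) <- varnames

predict(nb, new.dat, type="raw")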
-------------------- 8< --------------------

The result of the last line is usually NaN. As for the solution:

To protect against very small numbers, the e1071:::predict.naiveBayes
function works in log space and adds log-probabilities instead of
multiplying probabilities. However, when the posterior probability of
each class is computed (type = "raw"), the summed log-probabilities
are exponentiated, which defeats the purpose of the log-space
transformation. I suggest the following change to the code:

Towards the end of the predict.naiveBayes function, you currently do:

L <- exp(L)
L / sum(L)   # this is what is returned
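A minimal R sketch of why these two lines return NaN for the example above. It assumes an idealized version of that data (not computed from the random draws in the post): the new point lies exactly 5 standard deviations from both class means, every standard deviation is 1, and the two priors are equal. Each of the 60 variables then adds about -13.4 to a class's log-likelihood, so both entries of L land near -806, well below the roughly -745 at which exp() underflows to zero in double precision.

```r
# Idealized setting (an assumption for illustration): test point 5 SDs from
# both class means, unit SDs, equal priors, 60 independent variables.
nvar <- 60
ll_per_var <- dnorm(5, mean = 0, sd = 1, log = TRUE)  # about -13.42
L <- c(a = log(0.5) + nvar * ll_per_var,
       b = log(0.5) + nvar * ll_per_var)              # about -806 each

# The current normalization step for type = "raw":
p <- exp(L)    # both entries underflow to exactly 0
p / sum(p)     # 0 / 0, so both posteriors come out as NaN
```

The true posteriors in this symmetric setting are 0.5 and 0.5; the information is lost at the exp() step, not in the log-space sum.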

You can instead use

sapply(L, function(lp) {1 / sum(exp(L - lp))})

The above follows from the following equality:

x / (x + y + z) = 1 / (1 + exp(log(y) - log(x)) + exp(log(z) - log(x)))
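The proposed replacement can be checked numerically. Below is a small R sketch; the function names `posterior_tofigh()` and `posterior_lse()` are hypothetical helpers introduced here, not part of e1071, and the input log-likelihoods are made-up illustrative values. The first wraps the `sapply()` expression from the post (it does K sums of K terms, so O(K^2) work for K classes, negligible for typical class counts). The second is the common log-sum-exp alternative, which subtracts the largest log-likelihood before exponentiating; it is a different but equivalent technique, and the thread does not say which form the upstream fix actually uses.

```r
# Proposed fix from the post: the posterior of class i is
# 1 / sum_j exp(L[j] - L[i]), so exp() only ever sees differences of
# log-likelihoods, never the hugely negative log-likelihoods themselves.
posterior_tofigh <- function(L) {
  sapply(L, function(lp) 1 / sum(exp(L - lp)))
}

# Log-sum-exp alternative: shift by max(L) so the largest term is exp(0) = 1
# and the denominator can never be 0.
posterior_lse <- function(L) {
  w <- exp(L - max(L))
  w / sum(w)
}

# Log-likelihoods far below exp()'s underflow point (illustrative values)
L_tiny <- c(a = -806, b = -808)
posterior_tofigh(L_tiny)   # a about 0.881, b about 0.119 (naive code gives NaN)
posterior_lse(L_tiny)      # same values

# With moderate log-likelihoods both agree with plain exp()-and-normalize
L_mod <- log(c(a = 0.2, b = 0.3, c = 0.5))
naive <- exp(L_mod) / sum(exp(L_mod))
all.equal(posterior_tofigh(L_mod), naive)
all.equal(posterior_lse(L_mod), naive)
```

Both forms return the same posteriors; the log-sum-exp version needs only one pass over the classes, while the form from the post mirrors the equality above term by term.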

Best wishes,
/Ali Tofigh

David Winsemius

2012-Feb-07 18:09 UTC


On Feb 7, 2012, at 12:43 PM, Ali Tofigh wrote:
> Hi,
>
> I'm currently using the R package e1071 to train naive bayes
> classifiers and came across a bug: When the posterior probabilities of
> all classes are small, the result from the predict.naiveBayes function
> become NaNs.
This should be sent to the maintainer of the package. The name of the  
maintainer can always be found in the DESCRIPTION file.  Several of  
the authors are regular readers of rhelp, but I do not know whether  
David Meyer is. I'm sure a well-documented bug report, as this appears  
to be, will be welcomed.

-- 
David.
David Winsemius, MD
West Hartford, CT

David Meyer

2012-Feb-08 09:30 UTC


Confirmed & fixed upstream.

Thanks,
David

-- 
Priv.-Doz. Dr. David Meyer
Department of Information Systems and Operations

WU
Wirtschaftsuniversität Wien
Vienna University of Economics and Business
Augasse 2-6, 1090 Vienna, Austria
Tel: +43-1-313-36-4393
Fax: +43-1-313-36-90-4393
HP:  http://ec.wu.ac.at/~meyer
