Joao Azevedo
2012-Jul-05 15:01 UTC
[R] Different level set when predicting with e1071's Naive Bayes classifier
Hi!

I'm using the Naive Bayes classifier provided by the e1071 package (http://cran.r-project.org/web/packages/e1071) and I've noticed that the predict function behaves differently when the level set of the columns used for prediction is different from the one used for fitting. From inspecting predict.naiveBayes, I came to the conclusion that this is due to the conversion of factors to their internal codes using the data.matrix function.

For example, consider the following piece of R code:

> library(mlbench)
> library(e1071)
> data(HouseVotes84)
> model <- naiveBayes(Class ~ ., data = HouseVotes84)
> head(HouseVotes84)
       Class   V1 V2 V3   V4   V5 V6 V7 V8 V9 V10  V11  V12 V13 V14 V15  V16
1 republican    n  y  n    y    y  y  n  n  n   y <NA>    y   y   y   n    y
2 republican    n  y  n    y    y  y  n  n  n   n    n    y   y   y   n <NA>
3   democrat <NA>  y  y <NA>    y  y  n  n  n   n    y    n   y   y   n    n
4   democrat    n  y  y    n <NA>  y  n  n  n   n    y    n   y   n   n    y
5   democrat    y  y  y    n    y  y  n  n  n   n    y <NA>   y   y   y    y
6   democrat    n  y  y    n    y  y  n  n  n   n    n    n   y   y   y    y
> predict(model, HouseVotes84[1,-1])
[1] republican
Levels: democrat republican
> new.data <- data.frame(V1="n", V2="y", V3="n", V4="y", V5="y", V6="y",
+                        V7="n", V8="n", V9="n", V10="y", V11=NA_character_,
+                        V12="y", V13="y", V14="y", V15="n", V16="y",
+                        stringsAsFactors=TRUE)
> predict(model, new.data)
[1] democrat
Levels: democrat republican

I haven't used other classification methods in R, so I'm unsure if this is what is expected from the application of the predict function. Is this a bug or the expected behavior?

Thanks!

--
Joao.
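The mechanism described in the post can be reproduced with base R alone. The sketch below is only an illustration under assumptions: the small `train` data frame and the `align_levels` helper are hypothetical names that are not part of e1071 or of the original message, and whether current versions of predict.naiveBayes still behave this way is not checked here. It shows that a factor built from a single string has only one level, so data.matrix maps "y" to code 1 instead of code 2, and that rebuilding the columns with the training levels restores the expected codes.

```r
# Hypothetical training data: each factor has levels "n" (code 1) and "y" (code 2).
train <- data.frame(V1 = factor(c("n", "y", "n")),
                    V2 = factor(c("y", "y", "n")))

# One-row data frame built from strings: each factor has a single level.
new.data <- data.frame(V1 = "n", V2 = "y", stringsAsFactors = TRUE)

# Internal codes differ although the values are the same as in train[1, ].
codes_train <- data.matrix(train[1, ])   # V1 = 1, V2 = 2
codes_new   <- data.matrix(new.data)     # V1 = 1, V2 = 1 ("y" is the only level)

# Hypothetical helper: re-create each factor column with the reference levels.
align_levels <- function(newdata, reference) {
  for (col in names(newdata)) {
    if (is.factor(reference[[col]])) {
      newdata[[col]] <- factor(as.character(newdata[[col]]),
                               levels = levels(reference[[col]]))
    }
  }
  newdata
}

fixed <- align_levels(new.data, train)
codes_fixed <- data.matrix(fixed)        # V1 = 1, V2 = 2, matching the training codes
```

Under the same assumption, something like predict(model, align_levels(new.data, HouseVotes84[, -1])) would pass the classifier factors whose codes match those seen during fitting.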