Mai Dang
2010-Nov-10 20:48 UTC
[R] randomForest can not handle categorical predictors with more than 32 categories
I received this error:

    Error in randomForest.default(m, y, ...) :
      Can not handle categorical predictors with more than 32 categories.

using the code below:

    library(randomForest)
    library(MASS)
    memory.limit(size=12999)
    x <- read.csv("D:/train_store_title_view.csv", header=TRUE)
    x <- na.omit(x)
    set.seed(131)
    sales.rf <- randomForest(sales ~ ., data=x, mtry=3, importance=TRUE)

My machine (i7) is running 64-bit R with 12 GB of RAM.

Would anyone know how to avoid this error?

Thank you for your reply,
Mai Dang
Mattia Prosperi
2010-Nov-10 20:51 UTC
[R] randomForest can not handle categorical predictors with more than 32 categories
Try transforming the attributes that have more than 32 levels into dummy binary variables.
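A minimal sketch of the dummy-variable approach, using base R's model.matrix(). The data frame and its column names (store, views) are made up for illustration, since the original CSV was not posted; only sales comes from the thread.

```r
# Toy data standing in for the poster's CSV (hypothetical columns).
x <- data.frame(
  sales = seq_len(200) / 10,
  store = factor(rep(sprintf("s%02d", 1:40), each = 5)),  # 40 levels, over the limit
  views = rep(1:10, times = 20)
)

# Expand the factor into 0/1 indicator columns.
# "- 1" drops the intercept so every level gets its own column.
dummies <- model.matrix(~ store - 1, data = x)

# Replace the original factor with its indicator columns.
keep <- setdiff(names(x), "store")
x_dummy <- cbind(x[, keep, drop = FALSE], as.data.frame(dummies))

dim(x_dummy)  # 200 rows, 2 original columns + 40 indicator columns
```

The resulting x_dummy contains no factors, so a call such as randomForest(sales ~ ., data = x_dummy) should no longer hit the 32-category check. Note that this changes how the trees can split on that variable (one level at a time rather than on subsets of levels), and a factor with very many levels adds correspondingly many columns.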
Erik Iverson
2010-Nov-10 22:06 UTC
[R] randomForest can not handle categorical predictors with more than 32 categories
Well, the error message seems relatively straightforward. When you run str(x) (you did not provide the data), you should see that one or more components are factors with more than 32 levels. Apparently you can't include those predictors in a call to randomForest.

You might find the following line of code useful:

    which(sapply(x, function(y) nlevels(y) > 32))
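To illustrate that check, here is a small sketch on a made-up data frame (the store and region columns are hypothetical, as the real data was not provided). It also shows droplevels(), on the assumption that some levels may be unused after subsetting; nlevels() counts those too.

```r
# Toy data standing in for the poster's data frame (hypothetical columns).
x <- data.frame(
  sales  = seq_len(200) / 10,
  store  = factor(rep(sprintf("s%02d", 1:40), each = 5)),
  region = factor(rep(c("north", "south"), times = 100))
)

# Number of levels per column; nlevels() returns 0 for non-factors.
level_counts <- sapply(x, nlevels)
print(level_counts)

# Names of the columns that exceed the limit from the error message.
too_many <- names(which(level_counts > 32))
print(too_many)

# Subsetting keeps unused levels, so the count does not shrink on its own.
x_sub <- x[x$store %in% levels(x$store)[1:10], ]
nlevels(x_sub$store)              # still counts all original levels

# droplevels() removes levels that no longer occur in the data.
x_sub <- droplevels(x_sub)
nlevels(x_sub$store)              # only the levels actually present
```

If a factor is still above 32 levels after dropping unused ones, it has to be recoded (for example into dummy variables) or left out of the formula.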