Hi,

Can someone please offer me some guidance?

I imported some data. One of the columns called "JOBTITLE" when imported was imported as a factor column with 416 levels.

I subset the data in such a way that only 4 levels have data in "JOBTITLE" and tried running randomForest but it complained about "JOBTITLE" having more than 32 categories. I know that is the limit in randomForest but I guess I don't understand enough about factors because I thought by subsetting the data this no longer would be an issue. BTW I can run randomForest on this dataset if I exclude "JOBTITLE".

So I then converted that column to a character vector:

> TRAINSET$JOBTITLE<-as.character(TRAINSET$JOBTITLE)

I ran Random Forest and got the below error. Why isn't this working? What do I need to do to get this working?

> library(randomForest)
> FOREST_model <- randomForest(as.factor(TARGET)~., data=trainset, mtry=4, ntree=1000,
+ importance=TRUE, do.trace=100)

Error in randomForest.default(m, y, ...) :
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In data.matrix(x) : NAs introduced by coercion

Your help will be greatly appreciated.

Dan
Andrew Robinson
2013-Jan-15 02:06 UTC
[R] Random Forest Error for Factor to Character column
After you subset the data, did you redeclare the factor? If not then R still thinks it has the potential for all those levels.

TRAINSET$JOBTITLE <- factor(TRAINSET$JOBTITLE)

I hope this helps

Andrew

On Tuesday, January 15, 2013, Lopez, Dan wrote:
> I subset the data in such a way that only 4 levels have data in "JOBTITLE"
> and tried running randomForest but it complained about "JOBTITLE" having
> more than 32 categories.
--
Andrew Robinson
Director (A/g), ACERA
Department of Mathematics and Statistics    Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr      Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

FAwR: http://www.ms.unimelb.edu.au/~andrewpr/FAwR/
SPuR: http://www.ms.unimelb.edu.au/spuRs/
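
For anyone following along, here is a small sketch of the behaviour described in the reply: a factor keeps its full set of levels after subsetting until it is re-declared. The data frame `full` and its values are made up for illustration and only stand in for the poster's TRAINSET; the sketch uses base R only and does not call randomForest.

```r
# Hypothetical data standing in for the poster's TRAINSET (4 job titles).
full <- data.frame(
  JOBTITLE = factor(c("Analyst", "Clerk", "Engineer", "Manager", "Analyst")),
  TARGET   = c(1, 0, 1, 0, 1)
)

# Keep only the rows for two of the titles.
sub <- full[full$JOBTITLE %in% c("Analyst", "Clerk"), ]

# The unused levels are still attached to the factor after subsetting.
n_before <- nlevels(sub$JOBTITLE)

# Re-declaring the factor drops the levels that no longer have any rows.
sub$JOBTITLE <- factor(sub$JOBTITLE)
n_after <- nlevels(sub$JOBTITLE)

# droplevels() does the same for every factor column of a data frame at once.
sub2 <- droplevels(full[full$JOBTITLE %in% c("Analyst", "Clerk"), ])

print(c(before = n_before, after = n_after))
```

Either `factor()` on the one column or `droplevels()` on the whole subset should bring the level count down to what is actually present in the data, which is what matters for a limit on the number of categories.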