jamessc
2010-Nov-18 17:35 UTC
[R] predict() an rpart() model: how to ignore missing levels in a factor
I am using an algorithm to split my data set into two random sections repeatedly, construct a model using rpart() on one, test on the other, and average out the results.

One of my variables is a factor (crop) where each crop type has a code. Some crop types occur infrequently or singly. When the data set is randomly split, it may be that the first data set has a crop type which is not present in the second, and so using predict() I get the error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, :
  factor 'factor(c2001)' has new level(s) 13, 24, 35

where c2001 is the crop. I would like the predict function to ignore these records. Is there a command which will allow this as part of the predict() function? For crop types with only a small number of records (e.g. 3-4), I would hope some of the models would have the right balance to allow a prediction to be made.

--
View this message in context: http://r.789695.n4.nabble.com/predict-an-rpart-model-how-to-ignore-missing-levels-in-a-factor-tp3049218p3049218.html
Sent from the R help mailing list archive at Nabble.com.
Jonathan P Daily
2010-Nov-18 18:40 UTC
[R] predict() an rpart() model: how to ignore missing levels in a factor
I don't think, considering the mechanism behind recursive partitioning, that there is any way for you to ignore the crop factor if a level is not in the set the model was fitted on. What decision should be made if, for instance, the next split in a decision tree were on crops and the output was 5 for apples, 6 for bananas, and you had an instance of jicamas? It can't ignore the crop factor at that point since the next decision hinges on it.

What I think you can do, however, is pre-trim your test set by testing whether each factor level is present in the first set with something like (UNTESTED):

> test.set <- test.set[test.set$crop %in% original.set$crop,]

--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
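A minimal sketch of the pre-trimming idea from the reply, for illustration only. The data frame, the column names (crop, x, yield) and the 60-row split are all made up here; only rpart(), predict(), droplevels() and %in% are real functions. The rare crop codes are forced into the test set so that the "new level(s)" error is reproduced, and droplevels() stands in for the effect of calling factor() inside the model formula, as in the original post.

```r
library(rpart)

set.seed(1)

# Hypothetical data: three common crop codes and three that occur only once.
dat <- data.frame(
  crop = factor(c(rep(c("1", "2", "3"), each = 30), "13", "24", "35")),
  x    = rnorm(93)
)
dat$yield <- as.numeric(dat$crop) + dat$x + rnorm(93, sd = 0.5)

# Split so that the rare codes end up only in the test set (assumption made
# here to trigger the problem deterministically).
rare      <- dat$crop %in% c("13", "24", "35")
train.idx <- sample(which(!rare), 60)

# droplevels() removes the unused codes from the fitting set, which is what
# happens implicitly when the formula uses factor(c2001) on a subset.
original.set <- droplevels(dat[train.idx, ])
test.set     <- dat[-train.idx, ]

fit <- rpart(yield ~ crop + x, data = original.set)

# Predicting on the untrimmed test set fails with "has new level(s)".
fails <- inherits(try(predict(fit, newdata = test.set), silent = TRUE),
                  "try-error")

# Pre-trim: keep only test records whose crop code was seen when fitting.
keep        <- as.character(test.set$crop) %in% levels(original.set$crop)
test.set.ok <- droplevels(test.set[keep, ])

pred <- predict(fit, newdata = test.set.ok)
```

Note that the records dropped this way get no prediction at all, so when averaging over repeated splits it is probably worth tracking sum(!keep) to see how many records each split excluded.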