thr3ads.net - R help - [R] predict() an rpart() model: how to ignore missing levels in a factor [Nov 2010]


jamessc

2010-Nov-18 17:35 UTC

[R] predict() an rpart() model: how to ignore missing levels in a factor

I am using an algorithm that repeatedly splits my data set into two random
sections, constructs a model using rpart() on one, tests it on the other, and
averages out the results.

One of my variables is a factor (crop) where each crop type has a code. Some
crop types occur infrequently or only once. When the data set is randomly
split, the test set may contain a crop type which is not present in the
training set, and so when using predict() I get the error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  :
  factor 'factor(c2001)' has new level(s) 13, 24, 35

where c2001 is the crop. I would like the predict function to ignore these
records. Is there an option that allows this as part of the predict()
function? For crop types with only a small number of records (e.g. 3-4), I
would hope some of the models would have the right balance of records to
allow a prediction to be made.
-- 
View this message in context:
http://r.789695.n4.nabble.com/predict-an-rpart-model-how-to-ignore-missing-levels-in-a-factor-tp3049218p3049218.html
Sent from the R help mailing list archive at Nabble.com.

Jonathan P Daily

2010-Nov-18 18:40 UTC


Considering the mechanism behind recursive partitioning, I don't think
there is any way for you to ignore the crop factor if a level is not in the
original training set. What decision should be made if, for instance, the
next split in a decision tree were on crops and the output was 5 for apples,
6 for bananas, and you had an instance of jicamas? It can't ignore the crop
factor at that point since the next decision hinges on it.
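
The jicama case can be reproduced with a small sketch. The data frame, its column names (crop, yield) and the values are made up for illustration and are not from the original data; rpart is assumed to be installed (it ships with standard R distributions). The tryCatch() wrapper only captures whatever error message predict() raises for the unseen level.

```r
library(rpart)

# Hypothetical training data: only apples and bananas were ever seen.
train <- data.frame(
  crop  = factor(rep(c("apple", "banana"), each = 10)),
  yield = rep(c(5, 6), each = 10)
)
fit <- rpart(yield ~ crop, data = train)

# A new observation whose crop level the model has never encountered.
new.obs <- data.frame(crop = factor("jicama"))

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    predict(fit, newdata = new.obs)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message should be of the same "has new level" kind as the one quoted in the question, since the tree has no branch to send a jicama down.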

What I think you can do, however, is pre-trim your test set by checking
whether each crop value in it is present in the first set, with something
like (UNTESTED):
> test.set <- test.set[test.set$crop %in% original.set$crop,]
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly
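
A slightly fuller sketch of the pre-trimming idea from the reply above, using base R only. The names original.set, test.set and crop follow the reply; the yield column and all the values are hypothetical. The droplevels() call is an extra step not in the reply, added on the assumption that you also want the unused level removed from the factor itself.

```r
# Hypothetical data; in practice these come from the random split.
original.set <- data.frame(
  crop  = factor(c("apple", "banana", "apple", "banana")),
  yield = c(5, 6, 5, 6)
)
test.set <- data.frame(
  crop  = factor(c("apple", "jicama", "banana")),
  yield = c(5, 7, 6)
)

# TRUE for test rows whose crop value also occurs in the original set.
keep <- test.set$crop %in% original.set$crop

# Keep a record of what is being discarded, so the averaged results
# can report how many test rows were skipped in each split.
dropped <- unique(as.character(test.set$crop[!keep]))

# Trim the test set and drop factor levels that are no longer used.
test.trimmed <- droplevels(test.set[keep, ])
```

The trimmed data frame can then be passed as newdata to predict(). Note that the discarded rows never get a prediction, so any error average computed this way covers only the crop types seen in training.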

