thr3ads.net - R help - [R] rpart package: why does predict.rpart require values for "unused" predictors? [Aug 2012]


Jason Roberts

2012-Aug-01 22:17 UTC

[R] rpart package: why does predict.rpart require values for "unused" predictors?

After fitting and pruning an rpart model, it is often the case that one or
more of the original predictors is not used by any of the splits of the
final tree. It seems logical, therefore, that values for these "unused"
predictors would not be needed for prediction. But when predict() is called
on such models, all predictors seem to be required. Why is that, and can it
be easily circumvented?

Consider this example:

> model <- rpart(Mileage ~ Weight + Disp. + HP, car.test.frame)
> model
n= 60

node), split, n, deviance, yval
      * denotes terminal node

1) root 60 1354.58300 24.58333
  2) Disp.>=134 35  154.40000 21.40000
    4) Weight>=3087.5 22   61.31818 20.40909 *
    5) Weight< 3087.5 13   34.92308 23.07692 *
  3) Disp.< 134 25  348.96000 29.04000
    6) Disp.>=97.5 16  101.75000 27.12500 *
    7) Disp.< 97.5 9   84.22222 32.44444 *
> newdata <- data.frame(Disp.=car.test.frame$Disp.,
+                       Weight=car.test.frame$Weight)
> predict(model, newdata=newdata)
Error in eval(expr, envir, enclos) : object 'HP' not found

In this model, Disp. and Weight were used in splits, but HP was not. Thus I
expected to be able to perform predictions by providing values for just
Disp. and Weight, but predict() failed when I tried that, complaining that
HP was not also provided.
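A quick way to confirm which predictors the final tree actually splits on is to inspect the `frame` component of the fitted object. This is a sketch that assumes the current rpart object layout, in which `model$frame$var` holds the primary splitting variable for each internal node and `"<leaf>"` for terminal nodes; it looks only at primary splits, not surrogates.

```r
library(rpart)

# Same model as in the post; car.test.frame ships with the rpart package.
model <- rpart(Mileage ~ Weight + Disp. + HP, car.test.frame)

# model$frame has one row per node; `var` is the primary splitting
# variable for internal nodes and "<leaf>" for terminal nodes.
split_vars <- unique(as.character(model$frame$var))
used_vars <- setdiff(split_vars, "<leaf>")

# Predictors named in the formula that no primary split uses.
all_vars <- attr(terms(model), "term.labels")
unused_vars <- setdiff(all_vars, used_vars)

print(used_vars)
print(unused_vars)
```

For the tree printed above this should report Disp. and Weight as used and HP as unused.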

Thanks for any help you can provide. My apologies if I simply do not
understand how this works.

Best regards,

Jason

Jean V Adams

2012-Aug-02 12:33 UTC


Jason,

In the help file for predict.rpart (?predict.rpart) it says, "The predictors
referred to in the right side of formula(object) must be present by name in
newdata."
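The requirement comes from how the new data are prepared. Reading the rpart source suggests that predict.rpart first builds a model frame from `delete.response(object$terms)`, i.e. from every variable on the right side of the formula, and only then walks the tree, so a missing column fails before any split is consulted. The sketch below reproduces that step by hand; treat it as an approximation of the internal code, not a copy of it.

```r
library(rpart)

model <- rpart(Mileage ~ Weight + Disp. + HP, car.test.frame)

# Every variable on the right side of the formula, used in a split or not.
rhs_vars <- all.vars(delete.response(terms(model)))
print(rhs_vars)

newdata <- data.frame(Disp. = car.test.frame$Disp.,
                      Weight = car.test.frame$Weight)
missing_vars <- setdiff(rhs_vars, names(newdata))
print(missing_vars)

# Building the model frame over all terms fails the same way predict() does,
# because HP cannot be found in newdata.
res <- tryCatch(model.frame(delete.response(terms(model)), newdata),
                error = function(e) conditionMessage(e))
print(res)
```

One side effect of the same mechanism: if newdata lacks a column but an object with that name exists in the formula's environment (for example the global workspace), model.frame() may silently pick that object up instead of failing.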

So, that's just the way it is.  There are a couple of ways to work around
this, if you wish.  You could create a data frame with all NAs for the
unused predictor(s).  For example,
newdata2 <- data.frame(Disp.=car.test.frame$Disp.,
                       Weight=car.test.frame$Weight,
                       HP=as.numeric(rep(NA, nrow(car.test.frame))))
predict(model, newdata=newdata2)
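The NA-column trick can be wrapped in a small helper so it works for any set of absent predictors. `add_missing_predictors()` is a hypothetical name, not an rpart function. It fills absent columns with `NA_real_` on the assumption that those predictors are numeric: predict.rpart appears to check that supplied columns have the same class as at fit time, so a plain logical `NA` column may be rejected, which is presumably why the example above wraps the NAs in as.numeric(). Predictions should match those from the full data as long as the filled-in predictors are not used in any primary split.

```r
library(rpart)

# Hypothetical helper (not part of rpart): add an NA column for every
# predictor on the right side of the model formula that newdata lacks.
add_missing_predictors <- function(model, newdata) {
  rhs_vars <- all.vars(delete.response(terms(model)))
  for (v in setdiff(rhs_vars, names(newdata))) {
    newdata[[v]] <- NA_real_  # assumes the absent predictor is numeric
  }
  newdata
}

model <- rpart(Mileage ~ Weight + Disp. + HP, car.test.frame)
newdata <- data.frame(Disp. = car.test.frame$Disp.,
                      Weight = car.test.frame$Weight)

padded <- add_missing_predictors(model, newdata)
pred_partial <- predict(model, newdata = padded)
pred_full <- predict(model, newdata = car.test.frame)
```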

Or, you could refit the model using only the predictors that were actually
used in splits (here Weight and Disp.).  For example,
model2 <- rpart(Mileage ~ Weight + Disp., car.test.frame)
predict(model2, newdata=newdata)
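The refit can also be automated by building the reduced formula from the variables the first tree split on. This sketch assumes the same `model$frame$var` layout as above; `reformulate()` is base R. Dropping a predictor can in principle change which splits survive growth and pruning (and it certainly changes the surrogates), so it is worth comparing the refitted tree with the original before relying on it.

```r
library(rpart)

model <- rpart(Mileage ~ Weight + Disp. + HP, car.test.frame)

# Keep only predictors that appear as a primary split somewhere in the tree.
used_vars <- setdiff(unique(as.character(model$frame$var)), "<leaf>")

# Refit with a formula restricted to those predictors.
model2 <- rpart(reformulate(used_vars, response = "Mileage"), car.test.frame)

# newdata now only needs the columns the reduced formula names.
newdata <- data.frame(Disp. = car.test.frame$Disp.,
                      Weight = car.test.frame$Weight)
pred2 <- predict(model2, newdata = newdata)
```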

Jean
