thr3ads.net - R help - [R] specifying model terms when using predict [Jan 2009]


VanHezewijk, Brian

2009-Jan-16 20:20 UTC

[R] specifying model terms when using predict

I've recently encountered an issue when trying to use the predict.glm
function.

 

I've gotten into the habit of using the dataframe$variablename method of
specifying terms in my model statements.  I thought this unambiguous
notation would be acceptable in all situations but it seems models
written this way are not accepted by the predict function.  Perhaps
others have encountered this problem as well.

 

The code below illustrates the issue.

 

 

######
## linear model example

# this works
 x<-1:100
 y<-2*x

 lm1<-glm(y~x)
 pred1<-predict(lm1,newdata=data.frame(x=101:150))

## so does this
 x<-1:100
 y<-2*x
 orig.df<-data.frame(x1=x,y1=y)

 lm1<-glm(y1~x1,data=orig.df)
 pred1<-predict(lm1,newdata=data.frame(x1=101:150))

## this runs, but predict() does not use newdata and gives a warning
 x<-1:100
 y<-2*x
 orig.df<-data.frame(x1=x,y1=y)

 lm1<-glm(orig.df$y1~orig.df$x1,data=orig.df)
 pred1<-predict(lm1,newdata=data.frame(x1=101:150))

 

 

The final statement generates the following warning:

 

Warning message:

'newdata' had 50 rows but variable(s) found have 100 rows

 

 

Hope this is of some help.

 

 

 

Brian Van Hezewijk 

 



Marc Schwartz

2009-Jan-16 21:30 UTC


on 01/16/2009 02:20 PM VanHezewijk, Brian wrote:
> I've recently encountered an issue when trying to use the predict.glm
> function.
>
> I've gotten into the habit of using the dataframe$variablename method of
> specifying terms in my model statements.  I thought this unambiguous
> notation would be acceptable in all situations but it seems models
> written this way are not accepted by the predict function.  Perhaps
> others have encountered this problem as well.
<snip>

The bottom line is "don't do that".  :-)

When the predict.*() functions look for the variable names, they use the
names as specified in the formula that was used in the initial creation
of the model object.
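One way to see which names the predict.*() functions will look for is to inspect the term labels stored in the fitted object. The following is only a sketch that rebuilds the data from the example; the object names fit.plain and fit.dollar are made up here for illustration.

```r
x <- 1:100
y <- 2 * x
orig.df <- data.frame(x1 = x, y1 = y)

# Formula written with bare column names, resolved through 'data'
fit.plain <- glm(y1 ~ x1, data = orig.df)

# Formula written with the dataframe$variablename notation
fit.dollar <- glm(orig.df$y1 ~ orig.df$x1, data = orig.df)

# Term labels recorded in each model object
labels.plain  <- attr(terms(fit.plain),  "term.labels")
labels.dollar <- attr(terms(fit.dollar), "term.labels")

print(labels.plain)
print(labels.dollar)
```

In the second model the term is the whole expression, so a column named x1 in 'newdata' does not match it.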

As per ?predict.glm:

Note

Variables are first looked for in newdata and then searched for in the
usual way (which will include the environment of the formula used in the
fit). A warning will be given if the variables found are not of the same
length as those in newdata if it was supplied.

As per your example, using:

 x <- 1:100
 y <- 2 * x
 orig.df <- data.frame(x1 = x, y1 = y)

 lm1 <- glm(orig.df$y1 ~ orig.df$x1, data = orig.df)
 pred1 <- predict(lm1, newdata = data.frame(x1 = 101:150))

When predict.glm() tries to locate the variable "orig.df$x1" in the data
frame passed to 'newdata', it cannot be found. The name of the term in the
model is "orig.df$x1", not "x1" as you used above.

Thus, since it cannot find that variable in 'newdata', it begins to look
elsewhere for a variable called "orig.df$x1". Guess what?  It finds it
in the global environment as a column of the original data frame 'orig.df'.

Since that column has a length of 100 and the data frame that you passed
to newdata only has 50 rows, you get a warning, and the predictions are
computed from the original 100 rows rather than from 'newdata'.

Warning message:

'newdata' had 50 rows but variable(s) found have 100 rows
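The practical consequence can be checked by counting what predict() returns. This is a small sketch, assuming the same data as in the example; the names fit.dollar, new.df and pred.dollar are illustrative, and the warning is suppressed only to keep the output short.

```r
x <- 1:100
y <- 2 * x
orig.df <- data.frame(x1 = x, y1 = y)

fit.dollar <- glm(orig.df$y1 ~ orig.df$x1, data = orig.df)
new.df <- data.frame(x1 = 101:150)

# Normally this call emits the warning shown above
pred.dollar <- suppressWarnings(predict(fit.dollar, newdata = new.df))

n.new  <- nrow(new.df)         # rows requested
n.pred <- length(pred.dollar)  # rows actually predicted

# Compare the "predictions" with the fitted values for the original data
same.as.fitted <- isTRUE(all.equal(unname(pred.dollar),
                                   unname(fitted(fit.dollar))))

print(c(requested = n.new, returned = n.pred))
print(same.as.fitted)
```

If the number returned differs from the number requested, the values in 'newdata' were not used.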

There is a "method" to the madness and good reason why the modeling
functions and others that take a formula argument also have a 'data'
argument to specify the location of the variables to be used.
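For comparison, here is a minimal sketch of the pattern that behaves as intended: bare column names in the formula, the data frame passed through 'data', and a 'newdata' data frame with a column of the same name. The object names fit, new.df and pred are illustrative only.

```r
x <- 1:100
y <- 2 * x
orig.df <- data.frame(x1 = x, y1 = y)

# Column names only; 'data' says where to find them
fit <- glm(y1 ~ x1, data = orig.df)

# 'newdata' must contain a column called x1, matching the formula
new.df <- data.frame(x1 = 101:150)
pred <- predict(fit, newdata = new.df)

length(pred)                # one prediction per row of new.df
head(cbind(new.df, pred))   # predictions alongside the new x1 values
```

Written this way, the formula does not depend on the name of the data frame, so the same model can be applied to any data frame that has an x1 column.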

HTH,

Marc Schwartz
