I want to predict values from an existing lm (linear model, e.g. lm.obj) result in R using a new set of predictor variables (e.g. newdata). However, it seems that because my linear model was fit by calling scale() on the target predictors, predict() exits with an error:

    Error in scale(xxA, center = 9.7846094491829, scale = 0.959413568556403) : object 'xxA' not found

By debugging predict(), I can see that the error occurs in a call to model.frame(). By debugging model.frame(), I can see that the error occurs at this command:

    variables <- eval(predvars, data, env)

It seems likely that the error arises because predvars looks like this:

    list(scale(xxA, center = 10.2058714830537, scale = 0.984627257169526),
         scale(xxB, center = 20.4491690881149, scale = 1.13765718273923))

An example case:

    dat <- data.frame(xxA = rnorm(20,10), xxB = rnorm(10,20))
    dat$out <- with(dat,xxA+xxB+xxA*xxB+rnorm(20,20))
    xVar <- "scale(xxA)"
    traceVar <- "scale(xxB)"
    DVname <- "out"
    lm.obj <- lm.res.scale <- lm(out ~ scale(xxA)*scale(xxB),data=dat)
    my.data <- lm.obj$model #load the data from the lm object
    X1 <- my.data[,xVar]
    X2 <- my.data[,traceVar]
    DV <- lm.obj$model[,DVname]
    newdata <- expand.grid(X1=c(-1,0,1),X2=c(-1,0,1))
    newdata$X1 <- newdata$X1 * sd(my.data[,xVar])
    newdata$X2 <- newdata$X2 * sd(my.data[,traceVar])
    names(newdata) <- c(xVar,traceVar) #have to rename to original variable names for predict to work
    newdata$Y <- predict(lm.obj,newdata)

Is there something I could do before passing newdata or lm.obj to predict() that would prevent the error? From the help file it looks like I might be able to do something with the terms argument, but I haven't quite figured out what I would need to do. Alternatively, is there a fix for model.frame() that would prevent the error? Should predict() behave this way?

Thanks for your time,

Russell S. Pierce
Hi Russell,

There may be some subtleties that I'm not picking up on, but the obvious problem is that the names of the predictors in newdata do not match the names of the predictors in dat.

    names(newdata) <- names(dat)[1:2]
    newdata$Y <- predict(lm.obj,newdata)

does work on my machine.

Best,
Ista

On Fri, Jan 28, 2011 at 4:37 PM, Russell Pierce <rpier001 at ucr.edu> wrote:
> I want to predict values from an existing lm (linear model, e.g.
> lm.obj) result in R using a new set of predictor variables (e.g.
> newdata). [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
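A minimal sketch of the naming issue, for readers who want to reproduce it. It assumes a fresh R session with no object called xxA in the workspace. The seed, the raw values in `good`, and the names `pv`, `good`, `bad`, `p.good` and `p.bad` are illustrative and not from the thread. The stored "predvars" calls refer to the raw variables, so newdata has to carry columns named xxA and xxB rather than "scale(xxA)" and "scale(xxB)".

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(xxA = rnorm(20, 10), xxB = rnorm(20, 20))
dat$out <- with(dat, xxA + xxB + xxA * xxB + rnorm(20, 20))
lm.obj <- lm(out ~ scale(xxA) * scale(xxB), data = dat)

# The stored prediction calls refer to the raw variables xxA and xxB,
# with the centre and scale of the fitting data filled in.
pv <- attr(terms(lm.obj), "predvars")
print(pv)

# Columns named after the raw variables: predict() can evaluate the calls.
good <- data.frame(xxA = c(9, 10, 11), xxB = c(19, 20, 21))
p.good <- predict(lm.obj, good)
print(p.good)

# Columns named "scale(xxA)" / "scale(xxB)": no object called xxA is found,
# so evaluating the stored calls fails, as in the original post.
bad <- setNames(good, c("scale(xxA)", "scale(xxB)"))
p.bad <- tryCatch(predict(lm.obj, bad),
                  error = function(e) conditionMessage(e))
print(p.bad)
```

The values in `good` are on the raw scale of xxA and xxB; predict() applies the stored centring and scaling itself.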
On Fri, Jan 28, 2011 at 6:26 PM, Russell Pierce <rpier001 at ucr.edu> wrote:
> Thanks for your response Ista,
> I'm looking at the results in newdata following your command. I agree that
> predict ran, but I don't think it did what I expect it to do. I may be
> mistaken, but shouldn't the mean of dat$out be close to the mean of
> newdata$Y? Shouldn't the values in newdata$Y (assuming predict is working
> as expected) be similar values to:
> coef(lm.res.scale)[1]+coef(lm.res.scale)[2]*newdata[,1]+coef(lm.res.scale)[3]*newdata[,2]+coef(lm.res.scale)[4]*newdata[,1]*newdata[,2]

I don't think so. The values calculated by predict.lm actually look like this:

    check.predictions <- data.frame(
      by.hand = coef(lm.res.scale)[1] +
        coef(lm.res.scale)[2]*(newdata[,1]-mean(dat[,1]))/sd(dat[,1]) +
        coef(lm.res.scale)[3]*(newdata[,2]-mean(dat[,2]))/sd(dat[,2]) +
        coef(lm.res.scale)[4]*(newdata[,1]-mean(dat[,1]))/sd(dat[,1])*(newdata[,2]-mean(dat[,2]))/sd(dat[,2]),
      pre.lm = predict(lm.obj, newdata))

In other words, predict.lm assumes that the new data is on the same scale as the original data. That is exactly what I would expect.

HTH,
Ista

> -----------------------------------
> Russell S. Pierce, M.A.
> Visual Cognition Lab
> Department of Psychology
> University of California, Riverside
> 900 University Avenue
> Riverside, CA 92521
> Lab Phone: (951) 827-7399
>
> On Fri, Jan 28, 2011 at 2:31 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
>> [...]
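A sketch of one way to get predictions at -1, 0 and +1 SD under this behaviour: build newdata on the raw scale as mean + z * sd and let predict() undo it. The seed and the names `z`, `zA`, `zB`, `b` and `by.hand` are illustrative assumptions, not from the thread.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(xxA = rnorm(20, 10), xxB = rnorm(20, 20))
dat$out <- with(dat, xxA + xxB + xxA * xxB + rnorm(20, 20))
lm.obj <- lm(out ~ scale(xxA) * scale(xxB), data = dat)

# Desired z-scores for each predictor (hypothetical grid for illustration).
z <- expand.grid(zA = c(-1, 0, 1), zB = c(-1, 0, 1))

# Convert the z-scores back to the raw scale of the fitting data.
newdata <- data.frame(xxA = mean(dat$xxA) + z$zA * sd(dat$xxA),
                      xxB = mean(dat$xxB) + z$zB * sd(dat$xxB))

# predict() re-applies the stored centre and scale, recovering the z-scores.
newdata$Y <- predict(lm.obj, newdata)

# The same predictions computed directly from the coefficients and z-scores.
b <- unname(coef(lm.obj))
by.hand <- b[1] + b[2] * z$zA + b[3] * z$zB + b[4] * z$zA * z$zB
print(all.equal(unname(newdata$Y), by.hand))
```

Note that the prediction at z = (0, 0) is the intercept, which need not equal mean(dat$out) when the interaction term is present.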
Another option is

    set.seed(10)
    dat <- data.frame(xxA = rnorm(20,10), xxB = rnorm(20,20))
    dat$out <- with(dat,xxA+xxB+xxA*xxB+rnorm(20,20))
    lm.mod <- lm(out ~ I(scale(xxA))*I(scale(xxB)), data=dat)
    newdata <- data.frame(xxA=c(-1,0,1),xxB=c(-1,0,1))
    preds <- predict(lm.mod, newdata)

Best,
Ista

On Sat, Jan 29, 2011 at 5:12 PM, Russell Pierce <rpier001 at ucr.edu> wrote:
> Just in case someone else stumbles onto this thread and is facing a
> similar issue: The quick solution for me turned out to be using Glm
> and Predict in the rms package. Thanks go to Joshua and Ista for
> helping me out with this issue. Double thanks go to Joshua for
> suggesting I take a closer look at the rms package.
>
> library(rms)
> dat <- data.frame(xxA = rnorm(20,10), xxB = rnorm(20,20))
> dat$out <- with(dat,xxA+xxB+xxA*xxB+rnorm(20,20))
> rms.res <- Glm(out ~ scale(xxA)*scale(xxB),data=dat)
> newdata <- as.data.frame(Predict(rms.res,xxA=c(-1,0,1),xxB=c(-1,0,1))[,1:3])
R-help list and interested parties,

On Cross Validated, mpiktas correctly noted that both the I() and the rms Glm/Predict solutions produce incorrect results (http://stats.stackexchange.com/questions/6684/how-can-one-use-the-predict-function-on-a-lm-object-where-the-ivs-have-been-dynam/6718#6718). As far as I can tell, the short version is that both I() and rms leave scale() in the formula for the lm object, so predict() and Predict() run scale() on the provided newdata prior to generating the actual prediction. So, for now, there appears to be no easy way to do this the way I hoped. Time for me to get down to writing functions.

Best,

Russell S. Pierce, M.A.

On Sat, Jan 29, 2011 at 9:12 AM, Russell Pierce <rpier001 at ucr.edu> wrote:
> Just in case someone else stumbles onto this thread and is facing a
> similar issue: The quick solution for me turned out to be using Glm
> and Predict in the rms package. [...]
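A sketch comparing what lm() stores for scale() and for I(scale()), restricted to stats::lm; it makes no claim about rms. As far as I understand the default makepredictcall() method in stats, only a top-level call to scale() gets the fitting data's centre and scale recorded, so a call wrapped in I() is re-evaluated on whatever newdata is supplied. The names `fit.plain`, `fit.asis`, `nd1`, `nd2`, `same.plain` and `same.asis`, and the raw values used, are illustrative assumptions.

```r
set.seed(10)
dat <- data.frame(xxA = rnorm(20, 10), xxB = rnorm(20, 20))
dat$out <- with(dat, xxA + xxB + xxA * xxB + rnorm(20, 20))

fit.plain <- lm(out ~ scale(xxA) * scale(xxB), data = dat)
fit.asis  <- lm(out ~ I(scale(xxA)) * I(scale(xxB)), data = dat)

# Plain scale(): centre and scale of the fitting data are stored in the call.
print(attr(terms(fit.plain), "predvars"))
# I(scale()): the call is stored unchanged, so the centre and scale are
# re-estimated from whatever newdata is supplied.
print(attr(terms(fit.asis), "predvars"))

# Shifting the raw newdata by a constant changes the predictions from the
# plain fit, but not those from the I() fit, which re-centres on newdata.
nd1 <- data.frame(xxA = c(9, 10, 11), xxB = c(19, 20, 21))
nd2 <- nd1 + 5
same.plain <- isTRUE(all.equal(predict(fit.plain, nd1), predict(fit.plain, nd2)))
same.asis  <- isTRUE(all.equal(predict(fit.asis, nd1), predict(fit.asis, nd2)))
print(c(plain = same.plain, asis = same.asis))
```

In this sketch the I() fit returns identical predictions for nd1 and nd2, which is why predictions from that approach depend on the composition of newdata rather than on its raw values.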
On Sun, Jan 30, 2011 at 5:59 PM, Russell Pierce <rpier001 at ucr.edu> wrote:
> R-help list and interested parties,
>
> On Cross Validated mpiktas correctly noted that both the I() and rms
> Glm/Predict solution produce incorrect results

You probably meant it this way anyways, but I would say it produces undesired results ("incorrect" seems a bit unfair to the developers; predict() scales the data in a model built from a scaled object)

> (http://stats.stackexchange.com/questions/6684/how-can-one-use-the-predict-function-on-a-lm-object-where-the-ivs-have-been-dynam/6718#6718).
> As far as I can tell, the short version is that both I and rms leave
> scale() in the formula for the lm object, so predict and Predict() run
> scale on the provided newdata prior to generating the actual
> prediction. So, for now, there appears no easy way to do this the way
> I hoped. Time for me to get down to writing functions.

or just scale outside of the formula, which can be done in a couple of lines of code and is equally computationally efficient (though possibly at a minor memory cost).

Josh
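A minimal sketch of scaling outside the formula, as suggested above. The column names `zA` and `zB` and the object name `lm.z` are hypothetical. Because the model is fit on pre-scaled columns, newdata can be given directly in standard-deviation units.

```r
set.seed(10)
dat <- data.frame(xxA = rnorm(20, 10), xxB = rnorm(20, 20))
dat$out <- with(dat, xxA + xxB + xxA * xxB + rnorm(20, 20))

# Scale once, outside the formula; as.numeric() drops the matrix attributes.
dat$zA <- as.numeric(scale(dat$xxA))
dat$zB <- as.numeric(scale(dat$xxB))
lm.z <- lm(out ~ zA * zB, data = dat)

# newdata is now expressed directly in standard-deviation units.
newdata <- expand.grid(zA = c(-1, 0, 1), zB = c(-1, 0, 1))
newdata$Y <- predict(lm.z, newdata)
print(newdata)
```

The coefficients match those from lm(out ~ scale(xxA)*scale(xxB), data=dat); the cost is two extra columns in dat, which is presumably the minor memory overhead mentioned above.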
russell.s.pierce at gmail.com
2011-Jan-31 03:04 UTC
[R] User error in calling predict/model.frame
You are right, of course. Unanticipated. The workarounds are pretty straightforward; I just lacked the imagination to see why they were necessary.

Best,
Russell

-----Original Message-----
From: Joshua Wiley <jwiley.psych at gmail.com>
Date: Sun, 30 Jan 2011 18:22:29
To: Russell Pierce <rpier001 at ucr.edu>
Cc: r-help <r-help at r-project.org>
Subject: Re: [R] User error in calling predict/model.frame

[...]