Daniel McGlinn
2009-Feb-19 01:52 UTC
[R] partial residuals & the output of residuals.lm(..., type="partial")
Dear list, I would like to know how the function residuals.lm calculates the partial residuals from an lm object with more than one predictor variable. In other words what is residuals.lm(...,type="partial") doing behind the scenes? According to the help file for residuals.lm (?residuals.lm), "The partial residuals are a matrix with each column formed by omitting a term from the model". Unfortunately, I cannot seem to recreate the results of the function "residuals.lm" by simply dropping a variable from a model and then calculating the raw residuals of the updated model. Can anyone see what I am overlooking? It may be helpful to others if I mention that the usage of residuals.lm(...,type='partial') by the function termplot is what motivated me to look at this function more closely. Below is a simple example to illustrate my question: set.seed(12) x1 <- runif(100) x2 <- runif(100) y <- .13+.25*x1+.70*x2+runif(100) mod <- lm(y~x1+x2) ##let's only consider the partial residuals when x2 is dropped from the model plot(residuals(mod,type="partial")[,2],residuals(update(mod,.~.-x2),type='response')) abline(0,1) ##1:1 line ##why do the points not all fall on the 1:1 line? Thanks, Dan -- Daniel J. McGlinn Department of Botany, Oklahoma State University http://ecology.okstate.edu/Libra/
Daniel McGlinn
2009-Feb-19 02:57 UTC
[R] partial residuals & the output of residuals.lm(..., type="partial")
Dear list, After thinking about it a little more I solved my question of why I was calculating different residuals when using residuals.lm(...,type="partial") and when dropping a single term and recalculating the residuals. This is because the two variables are in a sense competing with one another in the full model if they are not completely orthogonal to one another. For example, with my hypothetical example from before if you make x1 and x2 and more correlated then the discrepancy between the two sets of residuals increases, but the problem can be solved if you make sure to use the same coefficients from the full model when computing the raw residuals without the other variable. Dan Here is the example that shows that more correlated x-variables make the problem even worse and a solution to my original question. set.seed(12) x1<-runif(100) x2<-x1+runif(100) ##this will make x1 and x2 more strongly correlated than in the first example (see original message) y<-.13+.25*x1+.70*x2+runif(100) mod<-lm(y~x1+x2) plot(residuals(mod,type="partial")[,2],residuals(update(mod,.~.-x2),type='response')) abline(0,1) ##note how the degree of scatter increases around the 1:1 line ##here is the solution to the problem ##calculate the residuals by hand and make sure to use the estimated coefficients from the full model calc.resids<-y-cbind(rep(1,100),x1)%*%coef(mod)[-3] ##as before I will drop the influence of x2 from the model prediction ##center the calculated residuals calc.resids<- calc.resids-mean(calc.resids) plot(residuals(mod,type="partial")[,2],calc.resids) abline(0,1)##now all the points fall right on the line -------- Original Message -------- Subject: partial residuals & the output of residuals.lm(...,type="partial") From: Daniel McGlinn <daniel.mcglinn at okstate.edu> To: r-help at r-project.org <r-help at r-project.org> Date: 2/18/2009 7:52 PM> Dear list, > > I would like to know how the function residuals.lm calculates the > partial residuals from an lm object with more than one predictor > variable. In other words what is residuals.lm(...,type="partial") doing > behind the scenes? According to the help file for residuals.lm > (?residuals.lm), "The partial residuals are a matrix with each column > formed by omitting a term from the model". Unfortunately, I cannot seem > to recreate the results of the function "residuals.lm" by simply > dropping a variable from a model and then calculating the raw residuals > of the updated model. Can anyone see what I am overlooking? It may be > helpful to others if I mention that the usage of > residuals.lm(...,type='partial') by the function termplot is what > motivated me to look at this function more closely. Below is a simple > example to illustrate my question: > > set.seed(12) > > x1 <- runif(100) > x2 <- runif(100) > > y <- .13+.25*x1+.70*x2+runif(100) > > mod <- lm(y~x1+x2) > > ##let's only consider the partial residuals when x2 is dropped from the > model > plot(residuals(mod,type="partial")[,2],residuals(update(mod,.~.-x2),type='response')) > abline(0,1) ##1:1 line > ##why do the points not all fall on the 1:1 line? > > Thanks, > Dan > >-- Daniel J. McGlinn Department of Botany, Oklahoma State University 117 LSE Stillwater OK 74078 USA 405-612-1780 http://ecology.okstate.edu/Libra/