Paul Johnson
2011-Dec-14 06:30 UTC
[Rd] termplot & predict.lm. some details about calculating predicted values with "other variables set at the mean"
I'm making some functions to illustrate regressions, and I have been staring at termplot, predict.lm, and residuals.lm to see how this is done. I've wondered who wrote predict.lm originally, because I think it is very clever.

I got interested because termplot doesn't work with models that include interactions:

    > m1 <- lm(y ~ x1*x2)
    > termplot(m1)
    Error in `[.data.frame`(mf, , i) : undefined columns selected

Digging into that, I realized some surprising implications of nonlinear formulas. The issue arises when there are math functions in the regression formula. The question is what we mean by the mean of "x" when we are discussing predictions and deviations.

Suppose one fits:

    m1 <- lm(y ~ x1 + log(x2), data=dat)

I had thought the partial residual was calculated with reference to the log of the mean of x2, that is, log(mean(x2)). But that's not right. It is calculated with reference to mean(log(x2)). That seems misleading, because termplot shows a graph with x2 on the horizontal axis (not "log(x2)"). I should not say misleading. Rather, it is unexpected. I think users who want the reference value in the plot of x2 to be the mean of x2 have a legitimate concern here.

With a more elaborate formula, the mismatch gets more confusing. Suppose the regression formula is

    m2 <- lm(y ~ x1 + poly(x2,3), data=dat)

The model frame has these variables:

    y  x1  poly(x2, 3).1  poly(x2, 3).2  poly(x2, 3).3

and the partial residual calculation for variable x1, which I had expected would be based on a polynomial transformation of mean(x2), is instead based on the weighted sum of the means of the 3 poly columns.

Can you help me see this more clearly? (Or less wrongly?)

Perhaps you think I don't understand partial residuals in termplot, but I am pretty sure I do. I made notes about it. See slides 54 and 55 in here:

http://pj.freefaculty.org/guides/Rcourse/regression-tableAndPlot-1/regression-tableAndPlot.pdf

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
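The centering question raised above can be checked numerically. What follows is only a minimal sketch on simulated data: the data frame dat, the coefficients used to generate y, and the helper names (centered_at_mean_of_log and so on) are made up for illustration and are not from any real analysis. It compares the "log(x2)" column of predict(m1, type = "terms") against the two candidate centerings, and then looks at the column means of the poly(x2, 3) basis.

```r
## Simulated data. The names and coefficients here are assumptions
## chosen only to make the example run.
set.seed(1234)
n <- 100
dat <- data.frame(x1 = rnorm(n),
                  x2 = rlnorm(n, meanlog = 1, sdlog = 0.5))
dat$y <- 2 + 0.5 * dat$x1 + 1.5 * log(dat$x2) + rnorm(n)

## Case 1: log(x2)
m1 <- lm(y ~ x1 + log(x2), data = dat)
tt <- predict(m1, type = "terms")     # one centered column per term
b  <- coef(m1)["log(x2)"]

## Two candidate reference values for the log(x2) term
centered_at_mean_of_log <- b * (log(dat$x2) - mean(log(dat$x2)))
centered_at_log_of_mean <- b * (log(dat$x2) - log(mean(dat$x2)))

## The term column matches the first candidate, not the second
all.equal(unname(tt[, "log(x2)"]), unname(centered_at_mean_of_log))
all.equal(unname(tt[, "log(x2)"]), unname(centered_at_log_of_mean))

## Partial residuals are the ordinary residuals plus the centered term
pr <- residuals(m1, type = "partial")
all.equal(unname(pr[, "log(x2)"]),
          unname(residuals(m1) + centered_at_mean_of_log))

## Case 2: poly(x2, 3)
m2  <- lm(y ~ x1 + poly(x2, 3), data = dat)
tt2 <- predict(m2, type = "terms")
X   <- model.matrix(m2)
pcols <- grep("poly", colnames(X), fixed = TRUE)
b2  <- coef(m2)[pcols]

## The poly term is centered at the coefficient-weighted sum of the
## column means of the three basis columns
term_manual <- drop(X[, pcols] %*% b2) - sum(colMeans(X[, pcols]) * b2)
all.equal(unname(tt2[, "poly(x2, 3)"]), unname(term_manual))

## The basis evaluated at mean(x2) is a different thing from the
## column means of the basis
P <- poly(dat$x2, 3)
colMeans(P)                  # column means of the orthogonal basis
predict(P, mean(dat$x2))     # basis evaluated at mean(x2)
```

If this sketch is right, the reference point for a transformed term is the mean of the model-matrix column(s), not the transformation applied to mean(x2). Someone who wants the plot referenced to mean(x2) would presumably have to compute that offset separately, along the lines of the last line above.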