msa@biostat.mgh.harvard.edu
2005-Apr-04 18:01 UTC
[Rd] Problems with predict.lm: incorrect SE estimate (PR#7772)
Full_Name: Marek Ancukiewicz Version: 2.01 OS: Linux Submission from: (NULL) (132.183.12.87) It seems that the the standard error of prediction of the linear regression, caclulated with predict.lm is incorrect. Consider the following example where the standard error is first calculated with predict.lm, then using delta method. and finally, using the formula rms*sqrt(1+1/n+(xp-x0)^2/Sxx). Marek Ancukiewicz> n <- 10 > x <- 1:n > y <- x > y[c(2,4,6)] <- y[c(2,4,6)] + 0.1 > y[c(3,5,7)] <- y[c(3,5,7)] - 0.1 > a <- lm(y~x) > rms <- sqrt(sum(a$residuals^2)/(n-2)) > s <- covmat(a)*rms^2 > xp <- 3 > xm <- mean(x) > summary(a)Call: lm(formula = y ~ x) Residuals: Min 1Q Median 3Q Max -0.10909 -0.07500 0.01091 0.06955 0.10182 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.020000 0.058621 0.341 0.742 x 0.996364 0.009448 105.463 7.3e-14 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 0.08581 on 8 degrees of freedom Multiple R-Squared: 0.9993, Adjusted R-squared: 0.9992 F-statistic: 1.112e+04 on 1 and 8 DF, p-value: 7.3e-14> print(predict(a,new=data.frame(x=xp),se.fit=T))$fit [1] 3.009091 $se.fit [1] 0.0359752 $df [1] 8 $residual.scale [1] 0.08581163> print(se.delta.method <- sqrt(s[1,1]+xp^2*s[2,2]+2*xp*s[1,2] + rms^2))[1] 0.09304758> print(se.ss.formula <- rms*sqrt(1+1/n+(xp-xm)^2/sum((x-xm)^2)))[1] 0.09304758
murdoch@stats.uwo.ca
2005-Apr-04 18:19 UTC
[Rd] Problems with predict.lm: incorrect SE estimate (PR#7772)
On Mon, 4 Apr 2005 18:01:05 +0200 (CEST), msa@biostat.mgh.harvard.edu wrote :>Full_Name: Marek Ancukiewicz >Version: 2.01 >OS: Linux >Submission from: (NULL) (132.183.12.87) > > >It seems that the the standard error of prediction of the linear regression, >caclulated with predict.lm is incorrect. Consider the following example where >the standard error is first calculated with predict.lm, then using delta >method. and finally, using the formula rms*sqrt(1+1/n+(xp-x0)^2/Sxx).Your formula is incorrect. You've got the formula for the so called "prediction error" (i.e. the stddev of the difference between the prediction and a new observation) rather than the "standard error" (i.e. the stddev of the prediction). Duncan Murdoch> >Marek Ancukiewicz > >> n <- 10 >> x <- 1:n >> y <- x >> y[c(2,4,6)] <- y[c(2,4,6)] + 0.1 >> y[c(3,5,7)] <- y[c(3,5,7)] - 0.1 >> a <- lm(y~x) >> rms <- sqrt(sum(a$residuals^2)/(n-2)) >> s <- covmat(a)*rms^2 >> xp <- 3 >> xm <- mean(x) >> summary(a) > >Call: >lm(formula = y ~ x) > >Residuals: > Min 1Q Median 3Q Max >-0.10909 -0.07500 0.01091 0.06955 0.10182 > >Coefficients: > Estimate Std. Error t value Pr(>|t|) >(Intercept) 0.020000 0.058621 0.341 0.742 >x 0.996364 0.009448 105.463 7.3e-14 *** >--- >Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > >Residual standard error: 0.08581 on 8 degrees of freedom >Multiple R-Squared: 0.9993, Adjusted R-squared: 0.9992 >F-statistic: 1.112e+04 on 1 and 8 DF, p-value: 7.3e-14 > >> print(predict(a,new=data.frame(x=xp),se.fit=T)) >$fit >[1] 3.009091 > >$se.fit >[1] 0.0359752 > >$df >[1] 8 > >$residual.scale >[1] 0.08581163 > >> print(se.delta.method <- sqrt(s[1,1]+xp^2*s[2,2]+2*xp*s[1,2] + rms^2)) >[1] 0.09304758 >> print(se.ss.formula <- rms*sqrt(1+1/n+(xp-xm)^2/sum((x-xm)^2))) >[1] 0.09304758 > >______________________________________________ >R-devel@stat.math.ethz.ch mailing list >https://stat.ethz.ch/mailman/listinfo/r-devel
msa@biostat.mgh.harvard.edu
2005-Apr-04 19:07 UTC
[Rd] Problems with predict.lm: incorrect SE estimate (PR#7772)
Thanks! I did not realize that predict.lm calculates the standard error of the prediction (which goes asumptotically to zero) rather then prediction error. The formula for standard erro should omits leding 1 under square root and with delta method, one should omit rms^2 component. Then everything agrees nicely with predict.lm output. Sorry about bothering, I did not get a good hit this time. Marek Ancukiewicz> From: Duncan Murdoch <murdoch@stats.uwo.ca> > Cc: R-bugs@biostat.ku.dk > Date: Mon, 04 Apr 2005 12:19:32 -0400 > > On Mon, 4 Apr 2005 18:01:05 +0200 (CEST), msa@biostat.mgh.harvard.edu > wrote : > > >Full_Name: Marek Ancukiewicz > >Version: 2.01 > >OS: Linux > >Submission from: (NULL) (132.183.12.87) > > > > > >It seems that the the standard error of prediction of the linear regression, > >caclulated with predict.lm is incorrect. Consider the following example where > >the standard error is first calculated with predict.lm, then using delta > >method. and finally, using the formula rms*sqrt(1+1/n+(xp-x0)^2/Sxx). > > Your formula is incorrect. You've got the formula for the so called > "prediction error" (i.e. the stddev of the difference between the > prediction and a new observation) rather than the "standard error" > (i.e. the stddev of the prediction). > > Duncan Murdoch > > > >Marek Ancukiewicz > > > >> n <- 10 > >> x <- 1:n > >> y <- x > >> y[c(2,4,6)] <- y[c(2,4,6)] + 0.1 > >> y[c(3,5,7)] <- y[c(3,5,7)] - 0.1 > >> a <- lm(y~x) > >> rms <- sqrt(sum(a$residuals^2)/(n-2)) > >> s <- covmat(a)*rms^2 > >> xp <- 3 > >> xm <- mean(x) > >> summary(a) > > > >Call: > >lm(formula = y ~ x) > > > >Residuals: > > Min 1Q Median 3Q Max > >-0.10909 -0.07500 0.01091 0.06955 0.10182 > > > >Coefficients: > > Estimate Std. Error t value Pr(>|t|) > >(Intercept) 0.020000 0.058621 0.341 0.742 > >x 0.996364 0.009448 105.463 7.3e-14 *** > >--- > >Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > > > >Residual standard error: 0.08581 on 8 degrees of freedom > >Multiple R-Squared: 0.9993, Adjusted R-squared: 0.9992 > >F-statistic: 1.112e+04 on 1 and 8 DF, p-value: 7.3e-14 > > > >> print(predict(a,new=data.frame(x=xp),se.fit=T)) > >$fit > >[1] 3.009091 > > > >$se.fit > >[1] 0.0359752 > > > >$df > >[1] 8 > > > >$residual.scale > >[1] 0.08581163 > > > >> print(se.delta.method <- sqrt(s[1,1]+xp^2*s[2,2]+2*xp*s[1,2] + rms^2)) > >[1] 0.09304758 > >> print(se.ss.formula <- rms*sqrt(1+1/n+(xp-xm)^2/sum((x-xm)^2))) > >[1] 0.09304758 > > > >______________________________________________ > >R-devel@stat.math.ethz.ch mailing list > >https://stat.ethz.ch/mailman/listinfo/r-devel >