kehler@mathstat.dal.ca
2005-Jul-20 15:28 UTC
[R] predict.lm - standard error of predicted means?
Simple question.

For a simple linear regression, I obtained the "standard error of predicted means", for both a confidence and prediction interval:

  x<-1:15
  y<-x + rnorm(n=15)
  model<-lm(y~x)
  predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="confidence")$se.fit
          1         2
  0.2708064 0.7254615

  predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="prediction")$se.fit
          1         2
  0.2708064 0.7254615

I was surprised to find that the standard errors returned were in fact the standard errors of the sampling distribution of Y_hat:

  sqrt(MSE(1/n + (x-x_bar)^2/SS_x)),

not the standard errors of Y_new (predicted value):

  sqrt(MSE(1 + 1/n + (x-x_bar)^2/SS_x)).

Is there a reason this quantity is called the "standard error of predicted means" if it doesn't relate to the prediction distribution?

Turning to Neter et al.'s Applied Linear Statistical Models, I note that if we have multiple observations, then the standard error of the mean of the predicted value:

  sqrt(MSE(1/m + 1/n + (x-x_bar)^2/SS_x)),

reverts to the standard error of the sampling distribution of Y-hat, as m, the number of samples, gets large. Still, this doesn't explain the result for small sample sizes.

Using R.2.1 for Windows
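For readers who want to check the two formulas in the post by hand, the following is a minimal sketch rather than anything from the original thread. The `set.seed()` call and the helper names (`x_new`, `mse`, `ss_x`, `se_mean`, `se_new`) are assumptions added for illustration, so the numbers will differ from those quoted above.

```r
# Reproduce the setup from the post; set.seed() is an addition so that the
# run is repeatable (values will not match the 0.2708064 / 0.7254615 above).
set.seed(1)
x <- 1:15
y <- x + rnorm(n = 15)
model <- lm(y ~ x)

x_new <- c(10, 20)
n     <- length(x)
mse   <- sum(residuals(model)^2) / df.residual(model)  # residual mean square
ss_x  <- sum((x - mean(x))^2)                           # SS_x

# Standard error of the fitted mean (sampling distribution of Y_hat)
se_mean <- sqrt(mse * (1 / n + (x_new - mean(x))^2 / ss_x))

# Standard error for a single new observation Y_new
se_new <- sqrt(mse * (1 + 1 / n + (x_new - mean(x))^2 / ss_x))

p <- predict(model, newdata = data.frame(x = x_new), se.fit = TRUE)

# se.fit should agree with the first formula, not the second
print(all.equal(unname(p$se.fit), se_mean))
print(se_new)
```

Under these assumptions the comparison should confirm that `se.fit` is the first quantity; the second is larger by the extra `MSE` term under the square root.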
kehler at mathstat.dal.ca writes:

> Simple question.
>
> For a simple linear regression, I obtained the "standard error of
> predicted means", for both a confidence and prediction interval:
>
> x<-1:15
> y<-x + rnorm(n=15)
> model<-lm(y~x)
> predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="confidence")$se.fit
>         1         2
> 0.2708064 0.7254615
>
> predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="prediction")$se.fit
>         1         2
> 0.2708064 0.7254615
>
>
> I was surprised to find that the standard errors returned were in fact the
> standard errors of the sampling distribution of Y_hat:
>
> sqrt(MSE(1/n + (x-x_bar)^2/SS_x)),
>
> not the standard errors of Y_new (predicted value):
>
> sqrt(MSE(1 + 1/n + (x-x_bar)^2/SS_x)).
>
> Is there a reason this quantity is called the "standard error of predicted
> means" if it doesn't relate to the prediction distribution?

Yes. Yhat is the predicted mean and se.fit is its standard deviation. It doesn't change its meaning because you desire another kind of prediction interval.

> Turning to Neter et al.'s Applied Linear Statistical Models, I note that
> if we have multiple observations, then the standard error of the mean of
> the predicted value:
>
> sqrt(MSE(1/m + 1/n + (x-x_bar)^2/SS_x)),
>
> reverts to the standard error of the sampling distribution of Y-hat, as m,
> the number of samples, gets large. Still, this doesn't explain the result
> for small sample sizes.

You can make completely similar considerations regarding the standard errors of and about an estimated mean: sigma*sqrt(1+1/n) vs. sigma*sqrt(1/m + 1/n) vs. sigma*sqrt(1/n). SEM is still the latter quantity even if you are interested in another kind of prediction limit.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
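If the standard error for a new observation is what is wanted, one possible route is to build it from the components that `predict()` already returns. This is a sketch under assumptions, not part of the reply: `set.seed()`, `se_pred`, `se_pred_mean` and `by_hand` are hypothetical names introduced here, and the check assumes the default 95% level with no weights.

```r
# Same setup as in the question; set.seed() is an addition for repeatability.
set.seed(1)
x <- 1:15
y <- x + rnorm(n = 15)
model <- lm(y ~ x)
new   <- data.frame(x = c(10, 20))

p <- predict(model, newdata = new, se.fit = TRUE, interval = "prediction")

# p$se.fit is the SE of the estimated mean; p$residual.scale estimates sigma.
# Adding the residual variance gives the SE for one new observation.
se_pred <- sqrt(p$se.fit^2 + p$residual.scale^2)

# SE for the mean of m new observations at the same x (hypothetical helper);
# it approaches se.fit as m grows, matching the 1/m formula in the thread.
se_pred_mean <- function(m) sqrt(p$se.fit^2 + p$residual.scale^2 / m)

# Rebuild the 95% prediction limits by hand and compare with predict()
t_crit  <- qt(0.975, df = p$df)
by_hand <- cbind(lwr = p$fit[, "fit"] - t_crit * se_pred,
                 upr = p$fit[, "fit"] + t_crit * se_pred)
print(all.equal(unname(by_hand), unname(p$fit[, c("lwr", "upr")])))
```

Here `se_pred_mean(1)` coincides with `se_pred`, while a very large `m` brings the result back towards `p$se.fit`, which is why `se.fit` itself is the same for both interval types.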