kehler@mathstat.dal.ca
2005-Jul-20 15:28 UTC
[R] predict.lm - standard error of predicted means?
Simple question.
For a simple linear regression, I obtained the "standard error of
predicted means", for both a confidence and prediction interval:
x<-1:15
y<-x + rnorm(n=15)
model<-lm(y~x)
predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="confidence")$se.fit
        1         2
0.2708064 0.7254615
predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="prediction")$se.fit
        1         2
0.2708064 0.7254615
I was surprised to find that the standard errors returned were in fact the
standard errors of the sampling distribution of Y_hat:
sqrt(MSE(1/n + (x-x_bar)^2/SS_x)),
not the standard errors of Y_new (predicted value):
sqrt(MSE(1 + 1/n + (x-x_bar)^2/SS_x)).
Is there a reason this quantity is called the "standard error of predicted
means" if it doesn't relate to the prediction distribution?
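
As a rough check of the two formulas above, the sketch below computes both by hand and compares the first with se.fit. The seed and the names x0, mse, ss_x, se_mean and se_new are assumptions added for illustration only; they are not part of the original post, so the numbers will differ from those shown above.

```r
set.seed(1)                       # assumed seed, only so the sketch is reproducible
x <- 1:15
y <- x + rnorm(n = 15)
model <- lm(y ~ x)
x0 <- c(10, 20)                   # new x values, as in the post

n    <- length(x)
mse  <- sum(residuals(model)^2) / (n - 2)   # residual mean square
ss_x <- sum((x - mean(x))^2)                # sum of squares of x about its mean

# Standard error of the fitted mean Y_hat at x0
se_mean <- sqrt(mse * (1/n + (x0 - mean(x))^2 / ss_x))
# Standard error for a single new observation Y_new at x0
se_new  <- sqrt(mse * (1 + 1/n + (x0 - mean(x))^2 / ss_x))

p <- predict(model, newdata = data.frame(x = x0), se.fit = TRUE)

# se.fit should agree with the first formula, not the second
print(all.equal(unname(p$se.fit), se_mean))
print(rbind(se.fit = unname(p$se.fit), se_mean = se_mean, se_new = se_new))
```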
Turning to Neter et al.'s Applied Linear Statistical Models, I note that
if we have multiple observations, then the standard error of the mean of
the predicted value:
sqrt(MSE(1/m + 1/n + (x-x_bar)^2/SS_x)),
reverts to the standard error of the sampling distribution of Y-hat, as m,
the number of samples, gets large. Still, this doesn't explain the result
for small sample sizes.
Using R 2.1 for Windows
kehler at mathstat.dal.ca writes:

> Simple question.
>
> For a simple linear regression, I obtained the "standard error of
> predicted means", for both a confidence and prediction interval:
>
> x<-1:15
> y<-x + rnorm(n=15)
> model<-lm(y~x)
> predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="confidence")$se.fit
>         1         2
> 0.2708064 0.7254615
>
> predict.lm(model,newdata=data.frame(x=c(10,20)),se.fit=T,interval="prediction")$se.fit
>         1         2
> 0.2708064 0.7254615
>
> I was surprised to find that the standard errors returned were in fact the
> standard errors of the sampling distribution of Y_hat:
>
> sqrt(MSE(1/n + (x-x_bar)^2/SS_x)),
>
> not the standard errors of Y_new (predicted value):
>
> sqrt(MSE(1 + 1/n + (x-x_bar)^2/SS_x)).
>
> Is there a reason this quantity is called the "standard error of predicted
> means" if it doesn't relate to the prediction distribution?

Yes. Yhat is the predicted mean and se.fit is its standard deviation. It
doesn't change its meaning because you desire another kind of prediction
interval.

> Turning to Neter et al.'s Applied Linear Statistical Models, I note that
> if we have multiple observations, then the standard error of the mean of
> the predicted value:
>
> sqrt(MSE(1/m + 1/n + (x-x_bar)^2/SS_x)),
>
> reverts to the standard error of the sampling distribution of Y-hat, as m,
> the number of samples, gets large. Still, this doesn't explain the result
> for small sample sizes.

You can make completely similar considerations regarding the standard
errors of and about an estimated mean: sigma*sqrt(1+1/n) vs.
sigma*sqrt(1/m + 1/n) vs. sigma*sqrt(1/n). SEM is still the latter quantity
even if you are interested in another kind of prediction limit.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
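
If the standard error for a new observation is what is wanted, one possible route (a sketch, not something stated in the thread) is to combine se.fit with the residual.scale component that predict.lm returns alongside it. The seed, the value m = 5 and the names new, se_pred, se_mean_m and half_width are assumptions made for illustration.

```r
set.seed(1)                       # assumed seed, only so the sketch is reproducible
x <- 1:15
y <- x + rnorm(n = 15)
model <- lm(y ~ x)
new <- data.frame(x = c(10, 20))

p <- predict(model, newdata = new, se.fit = TRUE, interval = "prediction")

# se.fit is the SE of the fitted mean; residual.scale is sqrt(MSE)
se_pred <- sqrt(p$se.fit^2 + p$residual.scale^2)          # one new observation

m <- 5                                                    # hypothetical number of new observations
se_mean_m <- sqrt(p$se.fit^2 + p$residual.scale^2 / m)    # mean of m new observations

# The 95% prediction interval half-width should equal t * se_pred
half_width <- qt(0.975, p$df) * se_pred
print(all.equal(unname(p$fit[, "upr"] - p$fit[, "fit"]), unname(half_width)))
```

As m grows, se_mean_m approaches se.fit, which matches the limiting behaviour described in the question.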