I am studying statistics using R and the book "Understandable Statistics" by Brase and Brase. The book has two worked examples of calculating a confidence interval around a predicted value from a linear model. The answers to both examples in the book differ from those I get from R. The regression line, the standard error, and the predicted value all agree between R and the book, so I gather that R and the book use different formulas to calculate the confidence interval. Could someone explain why the difference exists, which function(s) in R I might use to get the book's answers, and (perhaps) which method to use in which situations?

The example:

> x <- c(10,20,30,40,50,60,70)
> y <- c(17,21,25,28,33,40,49)
> dat <- data.frame(temp=x, amnt=y)
> dat
  temp amnt
1   10   17
2   20   21
3   30   25
4   40   28
5   50   33
6   60   40
7   70   49

being a table of temperatures (temp) and the corresponding amounts of copper sulfate that dissolve in 100 g of water at that temperature.

The regression line:

> mod <- lm(amnt ~ temp, dat)
> summary(mod)

Call:
lm(formula = amnt ~ temp, data = dat)

Residuals:
      1       2       3       4       5       6       7
 1.7857  0.7143 -0.3571 -2.4286 -2.5000 -0.5714  3.3571

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.14286    1.98463   5.111  0.00374 **
temp         0.50714    0.04438  11.428 8.98e-05 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 2.348 on 5 degrees of freedom
Multiple R-Squared: 0.9631,     Adjusted R-squared: 0.9558
F-statistic: 130.6 on 1 and 5 DF,  p-value: 8.985e-05

The .95 confidence interval for a temperature of 45 degrees:

> foo <- predict(mod, data.frame(temp=45), level=.95, interval="confidence", se.fit=TRUE)
> foo
$fit
          fit      lwr      upr
[1,] 32.96429 30.61253 35.31604

$se.fit
[1] 0.9148715

$df
[1] 5

$residual.scale
[1] 2.348252

The book gives the confidence interval as 26.5 <= y <= 39.5.
The book defines the confidence interval calculation thus:

   yp - E <= y <= yp + E

where

   E = tc * sC * sqrt(1 + 1/n + (x - xBar)^2/SSx)

   yp  is the predicted value from the regression line,
   tc  is the value from Student's t distribution for a confidence level c, using n-2 degrees of freedom,
   sC  is the standard error of estimate,
   SSx is Sum(x^2) - [Sum(x)]^2/n,
   n   is the number of data pairs.

So even though the model, the predicted value, and the standard error all agree, R gives a much smaller confidence interval than the book does.

Thanks for any advice/help.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
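For checking the book's arithmetic, here is a minimal R sketch that evaluates the formula above directly. It is not taken from the book or from any reply; the variable names (x0, yp, sC, tc, SSx, E) simply mirror the symbols in the book's definition, sC is read from summary(mod)$sigma, and tc comes from qt() with c = 0.95.

```r
# Fred's data and model from the message above
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- c(17, 21, 25, 28, 33, 40, 49)
dat <- data.frame(temp = x, amnt = y)
mod <- lm(amnt ~ temp, data = dat)

x0  <- 45                                          # temperature to predict at
n   <- nrow(dat)                                   # number of data pairs
yp  <- unname(predict(mod, data.frame(temp = x0))) # predicted value
sC  <- summary(mod)$sigma                          # standard error of estimate
tc  <- qt(0.975, df = n - 2)                       # t value for c = 0.95, n - 2 df
SSx <- sum(x^2) - sum(x)^2 / n                     # the book's SSx (here 2800)

# The book's margin of error; note the leading "1 +" under the root
E <- tc * sC * sqrt(1 + 1/n + (x0 - mean(x))^2 / SSx)
c(lower = yp - E, upper = yp + E)
```

The unrounded limits come out at about 26.49 and 39.44. The book's 26.5 and 39.5 are consistent with rounding yp to 33.0 and E to 6.5 before adding; that is an inference from the numbers, not something the book states here.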
You are looking for what some (most?) statisticians call a ``prediction interval'' ==> just give "prediction" instead of "confidence":

> predict(mod, data.frame(temp = 45), level = .95,
+         interval = "prediction", se.fit = TRUE)
$fit
          fit      lwr     upr
[1,] 32.96429 26.48597 39.4426

$se.fit
[1] 0.9148715

$df
[1] 5

$residual.scale
[1] 2.348252

>>>>> "Fred" == Fred Mellender <fredm at frontiernet.net>
>>>>>     on Fri, 15 Nov 2002 11:43:28 -0500 writes:

    Fred> I am studying statistics using R and a book
    Fred> "Understandable Statistics", by Brase and Brase. <snip>
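A short sketch, assuming the same data and model as in Fred's message, that asks predict() for both kinds of interval at temp = 45 so they can be compared row by row. Only the interval argument changes between the two calls; the object names conf_int and pred_int are illustrative.

```r
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- c(17, 21, 25, 28, 33, 40, 49)
dat <- data.frame(temp = x, amnt = y)
mod <- lm(amnt ~ temp, data = dat)
new <- data.frame(temp = 45)

# Interval for the mean response at temp = 45 (what R was first asked for)
conf_int <- predict(mod, new, interval = "confidence", level = 0.95)
# Interval for a single new observation at temp = 45 (the book's interval)
pred_int <- predict(mod, new, interval = "prediction", level = 0.95)

rbind(confidence = conf_int[1, ], prediction = pred_int[1, ])
```

Both rows share the same fit (32.96); only the half-width differs.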
On Fri, 15 Nov 2002, Fred Mellender wrote:

> I am studying statistics using R and a book "Understandable Statistics", by
> Brase and Brase. The book has two
> worked examples for calculating a confidence interval around a predicted
> value from a linear model.
<snip>
> The book gives the confidence interval as 26.5 <= y <= 39.5. The book
> defines the confidence interval calculation thus:
>
> yp - E <= y <= yp + E
>
> Where
> E = tc*sC *sqrt(1 + 1/n + (x-xBar)^2/SSx)

You asked R for a confidence interval for the predicted mean at x. If you
want a prediction interval at x you need interval="prediction" not
interval="confidence".

	-thomas
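One way to see why the prediction interval is wider, using only the components predict() already returned in Fred's output: the confidence interval for the mean uses se.fit, while an interval for one new observation must also allow for that observation's own scatter, measured by residual.scale. The sketch below assumes the usual linear model with constant error variance, combines the two as sqrt(se.fit^2 + residual.scale^2), and should reproduce both of predict()'s intervals.

```r
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- c(17, 21, 25, 28, 33, 40, 49)
dat <- data.frame(temp = x, amnt = y)
mod <- lm(amnt ~ temp, data = dat)
new <- data.frame(temp = 45)

p  <- predict(mod, new, se.fit = TRUE)
tc <- qt(0.975, df = p$df)   # two-sided 95% t value with 5 df

# Standard error of the fitted mean at temp = 45
se_mean <- p$se.fit
# Standard error for one new observation: also add the residual variance
se_new  <- sqrt(p$se.fit^2 + p$residual.scale^2)

conf_hand <- unname(p$fit) + c(lwr = -1, upr = 1) * tc * se_mean
pred_hand <- unname(p$fit) + c(lwr = -1, upr = 1) * tc * se_new
rbind(confidence = conf_hand, prediction = pred_hand)
```

With Fred's numbers se_new is about 2.52, compared with se.fit of about 0.91, which is why the book's interval is nearly three times as wide.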
"Fred Mellender" <fredm at frontiernet.net> writes:

> The book gives the confidence interval as 26.5 <= y <= 39.5. The book
> defines the confidence interval calculation thus:
>
> yp - E <= y <= yp + E
>
> Where
> E = tc*sC *sqrt(1 + 1/n + (x-xBar)^2/SSx)
> yp is the predicted value from the regression line
> tc is the value from Student's t distribution for a confidence
> level, c, using n-2 degrees of freedom,
> sC is the standard error of estimate
> SSx is Sum(x^2)-[Sum(x)]^2/n
> n is the number of data pairs.
<snip>

The book is giving you a prediction interval, aka a tolerance interval.
Some people use the term "confidence interval" a bit too sloppily.
predict() will give you the other kind of interval if you ask it to.
Vice versa,

  E = tc*sC *sqrt(1/n + (x-xBar)^2/SSx)

would give you the confidence interval for the predicted mean, I think.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
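Peter's variant can be checked the same way. A minimal sketch, again assuming Fred's data: it drops the leading 1 under the square root and compares the result with predict(interval = "confidence"). SSx is computed here as sum((x - mean(x))^2), which is algebraically the same as the book's Sum(x^2) - [Sum(x)]^2/n.

```r
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- c(17, 21, 25, 28, 33, 40, 49)
dat <- data.frame(temp = x, amnt = y)
mod <- lm(amnt ~ temp, data = dat)

x0  <- 45
n   <- length(x)
yp  <- unname(predict(mod, data.frame(temp = x0)))
sC  <- summary(mod)$sigma
tc  <- qt(0.975, df = n - 2)
SSx <- sum((x - mean(x))^2)

# Peter's formula: no "1 +" term, so only the uncertainty in the fitted line
E_mean <- tc * sC * sqrt(1/n + (x0 - mean(x))^2 / SSx)
c(lwr = yp - E_mean, upr = yp + E_mean)   # about 30.61 and 35.32
```

For this example the hand computation agrees with R's confidence interval of 30.61253 to 35.31604.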
The problem is with your quote from the book. That formula is not a
confidence interval; it is a tolerance interval. Use predict.lm with
interval="prediction" to get it.

I suggest you get a better (or at least more understandable) book!

On Fri, 15 Nov 2002, Fred Mellender wrote:

> I am studying statistics using R and a book "Understandable Statistics", by
> Brase and Brase. <snip>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
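To make the distinction concrete, here is a small simulation sketch. It assumes a hypothetical "true" line, amount = 10 + 0.5 * temp with normal errors of SD 2.35 (values chosen only to resemble Fred's fit), refits the model to 2000 simulated data sets at Fred's temperatures, and records how often each interval at temp = 45 covers the true mean and how often it covers one fresh observation. Under these assumptions the confidence interval should cover the true mean, and the prediction interval a new observation, roughly 95% of the time, while the confidence interval covers a new observation much less often.

```r
set.seed(1)
x     <- c(10, 20, 30, 40, 50, 60, 70)  # Fred's temperatures
beta0 <- 10                              # assumed true intercept
beta1 <- 0.5                             # assumed true slope
sigma <- 2.35                            # assumed true error SD
x0    <- 45
mu0   <- beta0 + beta1 * x0              # true mean response at x0

one_run <- function() {
  y   <- beta0 + beta1 * x + rnorm(length(x), sd = sigma)
  fit <- lm(y ~ x)
  nd  <- data.frame(x = x0)
  ci  <- predict(fit, nd, interval = "confidence", level = 0.95)
  pr  <- predict(fit, nd, interval = "prediction", level = 0.95)
  y_new <- mu0 + rnorm(1, sd = sigma)    # one fresh observation at x0
  c(ci_covers_mean = ci[, "lwr"] <= mu0   && mu0   <= ci[, "upr"],
    pi_covers_new  = pr[, "lwr"] <= y_new && y_new <= pr[, "upr"],
    ci_covers_new  = ci[, "lwr"] <= y_new && y_new <= ci[, "upr"])
}

coverage <- rowMeans(replicate(2000, one_run()))
round(coverage, 3)
```

Informally: the confidence interval answers "where is the average amount at 45 degrees?", and the prediction interval answers "where will the next single measurement at 45 degrees fall?".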
Thomas Lumley wrote:

> You asked R for a confidence interval for the predicted mean at x. If you
> want a prediction interval at x you need interval="prediction" not
> interval="confidence".

Slightly OT question to this thread: how can I get critical values for a
given distribution? E.g., a function f which would give me 2.228 for
ft(p=0.05, df=10), i.e., the critical value of Student's t distribution
for a given probability level and degrees of freedom.

Sorry for the newbie question.

Matej
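In R, critical values come from the quantile functions, which follow a q* naming pattern (qt, qnorm, qchisq, qf, ...). The 2.228 in the question is the two-sided 5% point, so alpha/2 goes in each tail and the 1 - alpha/2 quantile is requested. The wrapper ft() below is a hypothetical helper named after the question, not an existing R function.

```r
# Hypothetical helper: two-sided critical value of Student's t
# for significance level p and df degrees of freedom.
ft <- function(p, df) qt(1 - p / 2, df = df)

ft(p = 0.05, df = 10)       # 2.228139
qt(0.975, df = 10)          # the same value, called directly

# The same pattern for other distributions
qnorm(0.975)                # 1.959964
qchisq(0.95, df = 10)       # upper 5% point of chi-squared with 10 df
qf(0.95, df1 = 1, df2 = 5)  # upper 5% point of F(1, 5)
```

The same qt() call, with df = 5, supplies the tc used in both intervals discussed above.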