Maciej Bliziński
2006-Sep-17 18:22 UTC
[R] Standard error of coefficient in linear regression
Hello R users,

I have a substantial question about statistics, not about R itself, but I would love to have an answer from an R user, in form of an example in R syntax. I have spent whole Sunday searching in Google and browsing the books. I've been really close to the answer but there are at least three standard errors you can talk about in the linear regression and I'm really confused. The question is:

How exactly are standard errors of coefficients calculated in the linear regression?

Here's an example from a website I've read [1]. A company wants to know if there is a relationship between its advertising expenditures and its sales volume.

=======================================================
> exped <- c(4.2, 6.1, 3.9, 5.7, 7.3, 5.9)
> sales <- c(27.1, 30.4, 25.0, 29.7, 40.1, 28.8)
> S <- data.frame(exped, sales)
> summary(lm(sales ~ exped, data = S))

Call:
lm(formula = sales ~ exped, data = S)

Residuals:
      1       2       3       4       5       6
 1.7643 -1.9310  0.7688 -1.1583  3.3509 -2.7947

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.8725     5.2394   1.884   0.1326
exped         3.6817     0.9295   3.961   0.0167 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.637 on 4 degrees of freedom
Multiple R-Squared: 0.7968, Adjusted R-squared: 0.7461
F-statistic: 15.69 on 1 and 4 DF, p-value: 0.01666
=======================================================

I can calculate the standard error of the estimate, according to the equation [2]...

> S.m <- lm(sales ~ exped, data = S)
> S$pred <- predict(S.m)
> S$ye <- S$sales - S$pred
> S$ye2 <- S$ye ^ 2
> Se <- sqrt(sum(S$ye2)/(length(S$sales) - 1 - 1))
> Se
[1] 2.636901

...which matches the "Residual standard error" and I'm on the right track. Next step would be to use the equation [3] to calculate the standard error of the regression coefficient (here: exped). The equation [3] uses two variables, meaning of which I can't really figure out. As the calculated value Sb is scalar, all the parameters need also to be scalars. I've already calculated Se, so I'm missing x and \bar{x}. The latter could be the estimated coefficient. What is x then?

Regards,
Maciej

[1] http://www.statpac.com/statistics-calculator/correlation-regression.htm
[2] http://www.answers.com/topic/standard-error-of-the-estimate
[3] http://www.answers.com/topic/standard-error-of-the-regression-coefficient

--
Maciej Bliziński <m.blizinski at wit.edu.pl>
http://automatthias.wordpress.com
Dimitrios Rizopoulos
2006-Sep-17 19:13 UTC
[R] Standard error of coefficient in linear regression
These standard errors and other quantities are calculated as by-products of the QR decomposition used in lm.fit(). A simple (but not efficient) way to obtain them is:

exped <- c(4.2, 6.1, 3.9, 5.7, 7.3, 5.9)
sales <- c(27.1, 30.4, 25.0, 29.7, 40.1, 28.8)
S <- data.frame(exped, sales)

lmfit <- lm(sales ~ exped, data = S)
X <- model.matrix(lmfit)                   # design matrix: intercept column and exped
sigma2 <- sum((sales - fitted(lmfit))^2) / (nrow(X) - ncol(X))

sqrt(sigma2)                               # residual standard error
sqrt(diag(solve(crossprod(X))) * sigma2)   # standard errors of the coefficients

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
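For readers who want to see the QR route mentioned in this reply, here is a minimal sketch. It is an illustration built from base R functions (qr(), qr.coef(), qr.resid(), qr.R(), chol2inv()), not necessarily the exact code path inside lm.fit(); the names qrX, XtX_inv, se_qr and se_lm are made up for this example. It relies on the identity X'X = R'R, where R is the triangular factor of the design matrix.

```r
# Sketch only: coefficient standard errors from the QR decomposition
# of the design matrix, compared against what lm() reports.
exped <- c(4.2, 6.1, 3.9, 5.7, 7.3, 5.9)
sales <- c(27.1, 30.4, 25.0, 29.7, 40.1, 28.8)

# Design matrix: a column of ones for the intercept, then the predictor
X <- cbind("(Intercept)" = 1, exped = exped)
qrX <- qr(X)

# Least-squares coefficients and residuals, straight from the QR object
beta <- qr.coef(qrX, sales)
res  <- qr.resid(qrX, sales)

# Residual variance: residual sum of squares over (n - number of coefficients)
sigma2 <- sum(res^2) / (nrow(X) - qrX$rank)

# Since X'X = R'R, chol2inv() applied to R gives (X'X)^{-1}
XtX_inv <- chol2inv(qr.R(qrX))
se_qr <- sqrt(diag(XtX_inv) * sigma2)
names(se_qr) <- colnames(X)

# Cross-check against the covariance matrix reported by lm()
se_lm <- sqrt(diag(vcov(lm(sales ~ exped))))
print(se_qr)
print(all.equal(unname(se_qr), unname(se_lm)))
```

The two vectors should agree up to floating-point error, and the exped entry should match the 0.9295 shown in the summary() output of the original question.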
I believe that your confusion is due to a typo in the formula in [3]: it is missing a summation sign (and a subscript on x, if you want to be picky). To get the denominator, you subtract the mean of your x variable from all the x-values, square the differences, then sum them up (the missing summation sign) and take the square root. This is essentially the standard deviation of your x variable but without dividing by (n-1).

If you want to do this in R (a good thing while learning; there are better ways for actual analysis) you could use code like:

> x.e <- exped - mean(exped)
> x.e2 <- x.e^2
> sx2 <- sqrt(sum(x.e2))
> sb <- Se/sx2  # where Se is your residual standard error from your original message

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
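To tie the scalar formula back to the numbers in the original question, here is a small self-contained sketch. Written out in plain text, the corrected formula for the slope is Sb = Se / sqrt(sum((x_i - mean(x))^2)), so x stands for the individual predictor values and \bar{x} for their mean. The intercept line uses the standard simple-regression textbook formula, which does not appear in the thread and is added here as an assumption; the names Sxx, sb_slope, sb_intercept and reported are illustrative.

```r
# Sketch only: closed-form standard errors for simple linear regression,
# checked against the "Std. Error" column of summary(lm(...)).
exped <- c(4.2, 6.1, 3.9, 5.7, 7.3, 5.9)
sales <- c(27.1, 30.4, 25.0, 29.7, 40.1, 28.8)
n <- length(exped)

fit <- lm(sales ~ exped)

# Residual standard error: two coefficients estimated, hence n - 2
Se <- sqrt(sum(residuals(fit)^2) / (n - 2))

# Sum of squared deviations of x from its mean (the missing summation)
Sxx <- sum((exped - mean(exped))^2)

# Standard error of the slope
sb_slope <- Se / sqrt(Sxx)

# Standard error of the intercept (textbook formula, not from the thread)
sb_intercept <- Se * sqrt(1 / n + mean(exped)^2 / Sxx)

# Values reported by R for comparison
reported <- coef(summary(fit))[, "Std. Error"]
print(c(intercept = sb_intercept, slope = sb_slope))
print(reported)
```

Both printed vectors should agree with the 5.2394 and 0.9295 shown in the summary() output above.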