thr3ads.net - R help - [R] formula used by R to compute the t-values in a linear regression [Aug 2011]

Samuel Le

2011-Aug-01 13:27 UTC

[R] formula used by R to compute the t-values in a linear regression

Hello,

I was wondering if someone knows the formula used by the function lm to compute
the t-values.

I am trying to implement a linear regression myself. Assuming that I have K
variables and N observations, the formula I am using is:

For the k-th variable, t-value = b_k / sigma_k

where b_k is the coefficient for the k-th variable and
sigma_k = [(t(x) x)^(-1)]_kk is its standard deviation.

I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 - (Sum x_{k,i}^2))

where sigma is the estimated standard deviation of the residuals,

sigma = sqrt(1/(N-K-1) * Sum epsilon_i^2)

with:

N: number of observations

K: number of variables

This formula comes from my old econometrics course.

For some reason it doesn't match the t-value produced by R (I am off by
about 1%). I can match the other results produced by R (coefficients of the
regression, R squared, etc.).

I would be grateful if someone could provide some clarification.

Samuel



David Winsemius

2011-Aug-01 13:44 UTC

On Aug 1, 2011, at 9:27 AM, Samuel Le wrote:
> For some reason it doesn't match the t-value produced by R (I am off  
> by about 1%). I can match the other results produced by R  
> (coefficients of the regression, r squared, etc.).
Usually such a small difference results from using different degrees
of freedom. Have you reduced the df's appropriately after considering
the number of other estimated parameters? Just quoting a formula from your
econometrics reference is not enough to answer the question. We would
need to see code, as the message at the end of every posting states.
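
To illustrate the degrees-of-freedom point, here is a minimal sketch on simulated data. The data, the sizes N and K, and all variable names are assumptions made for illustration, not the original poster's code. With K slopes plus an intercept, lm uses N - K - 1 residual degrees of freedom; dividing by N - K instead shifts the residual standard error, and hence every t-value, by about 1% at this sample size.

```r
# Simulated data; sizes and names are illustrative assumptions.
set.seed(1)
N <- 50
K <- 3
X <- matrix(rnorm(N * K), N, K)
y <- drop(1 + X %*% c(0.5, -1, 2) + rnorm(N))
fit <- lm(y ~ X)                       # intercept is added automatically

rss <- sum(residuals(fit)^2)
sigma_ok  <- sqrt(rss / (N - K - 1))   # K slopes + 1 intercept estimated
sigma_bad <- sqrt(rss / (N - K))       # forgets the intercept

df.residual(fit)                       # N - K - 1 = 46
all.equal(summary(fit)$sigma, sigma_ok)
sigma_bad / sigma_ok                   # sqrt(46/47), roughly 0.989, whatever the data
```

Since the t-value is the coefficient divided by a standard error proportional to sigma, the same ratio carries over to the t-values.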
David Winsemius, MD
West Hartford, CT

peter dalgaard

2011-Aug-01 13:45 UTC

On Aug 1, 2011, at 15:27 , Samuel Le wrote:
> For the k-th variable, t-value= b_k/sigma_k
>
> With b_k is the coefficient for the k-th variable, and
> sigma_k =(t(x) x)^(-1) _kk is its standard deviation.
>
> I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 -(sum x_{k,i}^2))

AFAICT, your formula only holds for K=1. Otherwise, the formula for sigma_k
involves matrix inversion. Also, even for K=1, beware that textbook formulas
like SSDx = SSx - (Sx)^2/n involve subtraction of nearly equal quantities and
easily lose multiple digits of precision, so software tends to use rather more
careful algorithms.
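
A rough sketch of these three points in R follows. The simulated data and all variable names are assumptions for illustration; the explicit solve() call is only the textbook route, not what lm does internally. The first part computes t-values from the diagonal of (X'X)^(-1), the second checks the closed form for a single regressor, and the third shows the cancellation problem in SSx - (Sx)^2/n.

```r
# Illustrative simulated data; names and sizes are assumptions.
set.seed(2)
N  <- 40
x1 <- rnorm(N)
x2 <- 0.6 * x1 + rnorm(N)
y  <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(N)

# General case: standard errors come from the diagonal of (X'X)^{-1}.
X       <- cbind(Intercept = 1, x1 = x1, x2 = x2)   # design matrix
XtX_inv <- solve(crossprod(X))
b       <- drop(XtX_inv %*% crossprod(X, y))        # least-squares coefficients
res     <- drop(y - X %*% b)
sigma2  <- sum(res^2) / (N - ncol(X))               # residual variance
se      <- sqrt(sigma2 * diag(XtX_inv))
tval    <- b / se
fit     <- lm(y ~ x1 + x2)
ok_general <- all.equal(unname(tval), unname(coef(summary(fit))[, "t value"]))

# K = 1: the slope entry of (X'X)^{-1} reduces to n / (n*SSx - (Sx)^2).
X1          <- cbind(1, x1)
closed_form <- N / (N * sum(x1^2) - sum(x1)^2)
ok_k1       <- all.equal(solve(crossprod(X1))[2, 2], closed_form)

# Cancellation: both terms are near 5e16, far beyond exact double precision.
x <- 1e8 + 1:5
ssd_textbook <- sum(x^2) - sum(x)^2 / length(x)     # loses the low-order digits
ssd_centered <- sum((x - mean(x))^2)                # exactly 10
```

Centering before squaring keeps the numbers small, which is the simplest of the "more careful" approaches; lm itself works through a QR decomposition instead.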

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
"Døden skal tape!" --- Nordahl Grieg

S Ellison

2011-Aug-01 14:15 UTC

> -----Original Message-----
> [mailto:r-help-bounces at r-project.org] On Behalf Of Samuel Le
> Subject: [R] formula used by R to compute the t-values in a 
> linear regression
> I was wondering if someone knows the formula used by the 
> function lm to compute the t-values.
Typing
summary.lm
at the prompt, I found the standard error and t calculation (around lines 58-62
of the resulting listing):
    resvar <- rss/rdf
    R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
    se <- sqrt(diag(R) * resvar)
    est <- z$coefficients[Qr$pivot[p1]]
    tval <- est/se

You can also find (rather further up) that the degrees of freedom df used are
taken directly from the linear model $df (z$df in the function). Others noted
that incorrect df often cause problems, so checking that you're using the
correct df is possible by inspecting the lm summary.
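
The excerpt can be replayed outside summary.lm on a fitted model. This is a sketch under assumptions: the built-in mtcars data and the model formula are arbitrary choices, the fit is unweighted and full rank, and the qr component of the fit is used where summary.lm calls an internal helper.

```r
# Arbitrary example model on a built-in data set (an assumption for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)

p   <- fit$rank
p1  <- seq_len(p)
Qr  <- fit$qr                         # QR decomposition stored by lm
rdf <- fit$df.residual                # the df mentioned above
rss <- sum(residuals(fit)^2)

# Same steps as the listing excerpt
resvar <- rss / rdf
R      <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
se     <- sqrt(diag(R) * resvar)
est    <- coef(fit)[Qr$pivot[p1]]
tval   <- est / se

all.equal(unname(tval), unname(coef(summary(fit))[, "t value"]))
```

Comparing rdf with your own N - K - 1 is a quick way to rule out a degrees-of-freedom mismatch.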

The standard errors are apparently (as is usual for a least squares problem, I
think) taken as the square root of the diagonal of the inverse of the hessian,
multiplied by the residual variance. Unfortunately I could not get at the
hessian calculation quite as easily (it looks like it uses a function that's
not exported from stats) so that's left as an exercise in browsing source
code ...
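
For what it is worth, in ordinary least squares the matrix in question is X'X (the Hessian of the residual sum of squares, up to a constant factor), and chol2inv applied to the triangular factor of the QR decomposition gives its inverse. A small check, assuming a full-rank unweighted fit on the built-in mtcars data (an arbitrary illustrative choice):

```r
# Arbitrary example model (an assumption for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)
X   <- model.matrix(fit)

Rfac      <- qr.R(fit$qr)             # upper-triangular factor, X = QR
via_qr    <- chol2inv(Rfac)           # (R'R)^{-1} = (X'X)^{-1}
via_solve <- solve(crossprod(X))      # textbook route

same_inverse <- all.equal(via_qr, unname(via_solve))

# Scaling by the residual variance reproduces the coefficient covariance matrix.
same_vcov <- all.equal(unname(vcov(fit)), via_qr * summary(fit)$sigma^2)
```

Both comparisons should agree to numerical tolerance; the QR route avoids forming X'X explicitly.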

S Ellison

