Hi,

I am not a statistics expert, so I have this question. A linear model
gives me the following summary:

Call:
lm(formula = N ~ N_alt)

Residuals:
    Min      1Q  Median      3Q     Max
-110.30  -35.80  -22.77   38.07  122.76

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.5177   229.0764   0.059   0.9535
N_alt         0.2832     0.1501   1.886   0.0739 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 56.77 on 20 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.151,     Adjusted R-squared: 0.1086
F-statistic: 3.558 on 1 and 20 DF,  p-value: 0.07386

The regression is not very good (high p-value, low R-squared). The Pr
value for the intercept seems to indicate that it is zero with a very
high probability (95.35%). So I repeat the regression forcing the
intercept to zero:

Call:
lm(formula = N ~ N_alt - 1)

Residuals:
    Min      1Q  Median      3Q     Max
-110.11  -36.35  -22.13   38.59  123.23

Coefficients:
      Estimate Std. Error t value Pr(>|t|)
N_alt 0.292046   0.007742   37.72   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 55.41 on 21 degrees of freedom
  (16 observations deleted due to missingness)
Multiple R-squared: 0.9855,    Adjusted R-squared: 0.9848
F-statistic: 1423 on 1 and 21 DF,  p-value: < 2.2e-16

1. Is my interpretation correct?
2. Is it possible that just by forcing the intercept to become zero, a
   bad regression becomes an extremely good one?
3. Why doesn't lm suggest a value of zero (or near zero) by itself if
   the regression is so much better with it?

Please excuse my ignorance.

Jan Rheinländer
On Fri, 18 Feb 2011, Jan wrote:

> Hi,
>
> I am not a statistics expert, so I have this question. A linear model
> gives me the following summary:
>
> Call:
> lm(formula = N ~ N_alt)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -110.30  -35.80  -22.77   38.07  122.76
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  13.5177   229.0764   0.059   0.9535
> N_alt         0.2832     0.1501   1.886   0.0739 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 56.77 on 20 degrees of freedom
>   (16 observations deleted due to missingness)
> Multiple R-squared: 0.151,     Adjusted R-squared: 0.1086
> F-statistic: 3.558 on 1 and 20 DF,  p-value: 0.07386
>
> The regression is not very good (high p-value, low R-squared).

Yes.

> The Pr value for the intercept seems to indicate that it is zero with a
> very high probability (95.35%).

Not quite. Consult your statistics textbook for the correct
interpretation of p-values. Under the null hypothesis of a true
intercept of zero, it is very likely to observe an estimated intercept
at least as far from zero as 13.52, given its standard error of about
229.

> So I repeat the regression forcing the intercept to zero:

Do you have a good interpretation for that?

> Call:
> lm(formula = N ~ N_alt - 1)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -110.11  -36.35  -22.13   38.59  123.23
>
> Coefficients:
>       Estimate Std. Error t value Pr(>|t|)
> N_alt 0.292046   0.007742   37.72   <2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 55.41 on 21 degrees of freedom
>   (16 observations deleted due to missingness)
> Multiple R-squared: 0.9855,    Adjusted R-squared: 0.9848
> F-statistic: 1423 on 1 and 21 DF,  p-value: < 2.2e-16
>
> 1. Is my interpretation correct?
> 2. Is it possible that just by forcing the intercept to become zero, a
>    bad regression becomes an extremely good one?
> 3. Why doesn't lm suggest a value of zero (or near zero) by itself if
>    the regression is so much better with it?

The model without an intercept needs to be interpreted differently. Its
F-statistic and p-value compare the regression with intercept zero and
slope 0.292 against a model with both intercept zero and slope zero. If
I had to guess, I would say this is not a very meaningful comparison for
your data. The same is true for the R-squared (see also ?summary.lm for
its definition in the case without an intercept).

hth,
Z
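To see concretely which two models each F-test compares, the statistics
in both summaries can be reproduced by explicit model comparisons. A
minimal sketch, assuming the data sit in a (hypothetical) data frame d
with columns N and N_alt (names taken from the posted output), reduced
to complete cases so that every fit uses the same rows:

d <- na.omit(d[, c("N", "N_alt")])

m_null  <- lm(N ~ 1, data = d)          # intercept only: predicts mean(N)
m_full  <- lm(N ~ N_alt, data = d)      # intercept + slope
anova(m_null, m_full)                   # should reproduce F = 3.558 on 1 and 20 DF

m_empty <- lm(N ~ 0, data = d)          # no terms at all: always predicts 0
m_slope <- lm(N ~ N_alt - 1, data = d)  # slope only, intercept forced to zero
anova(m_empty, m_slope)                 # should reproduce F = 1423 on 1 and 21 DF

The second comparison is against a model that predicts 0 for every
observation, which is why its F-statistic (and R-squared) can look
spectacular even when the fit itself is mediocre.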
Hi:

On Fri, Feb 18, 2011 at 2:49 AM, Jan <jrheinlaender@gmx.de> wrote:

> Hi,
>
> I am not a statistics expert, so I have this question. A linear model
> gives me the following summary:
>
> Call:
> lm(formula = N ~ N_alt)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -110.30  -35.80  -22.77   38.07  122.76
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  13.5177   229.0764   0.059   0.9535
> N_alt         0.2832     0.1501   1.886   0.0739 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 56.77 on 20 degrees of freedom
>   (16 observations deleted due to missingness)
> Multiple R-squared: 0.151,     Adjusted R-squared: 0.1086
> F-statistic: 3.558 on 1 and 20 DF,  p-value: 0.07386
>
> The regression is not very good (high p-value, low R-squared).
> The Pr value for the intercept seems to indicate that it is zero with a
> very high probability (95.35%). So I repeat the regression forcing the
> intercept to zero:

That's not the interpretation of a p-value. What it means is: *given
that the null hypothesis beta0 = 0 is true*, the probability of
observing a value of the t-statistic *more extreme (in absolute value)
than the observed value of 0.059* is about 0.9535. The presumption that
H_0 is true for the purpose of the test allows one to derive a
'reference distribution' (in this case, the t-distribution with the
error degrees of freedom) against which one can compare the observed
value of the t-statistic. The second emphasized phrase gives the context
in which the p-value is correctly interpreted relative to that reference
distribution when H_0 is true.

You're evidently trying to interpret the p-value as the probability that
the null hypothesis is true. No. Given the magnitude of the p-value, you
can conclude, however, that there is not enough sample evidence to
contradict the null hypothesis beta0 = 0.

> Call:
> lm(formula = N ~ N_alt - 1)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -110.11  -36.35  -22.13   38.59  123.23
>
> Coefficients:
>       Estimate Std. Error t value Pr(>|t|)
> N_alt 0.292046   0.007742   37.72   <2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 55.41 on 21 degrees of freedom
>   (16 observations deleted due to missingness)
> Multiple R-squared: 0.9855,    Adjusted R-squared: 0.9848
> F-statistic: 1423 on 1 and 21 DF,  p-value: < 2.2e-16
>
> 1. Is my interpretation correct?
> 2. Is it possible that just by forcing the intercept to become zero, a
>    bad regression becomes an extremely good one?

No.

> 3. Why doesn't lm suggest a value of zero (or near zero) by itself if
>    the regression is so much better with it?

Because computer programs don't read minds. You may want a zero
intercept; someone else may not. And your perception that the
'regression is so much better' with a zero intercept is in error. If you
plotted your data, you would realize that whether you fit the 'best'
least squares model or one with a zero intercept, the fit is not going
to be very good, and you would have deduced that the 0.985 R^2 returned
from the no-intercept model is an illusion. It is mathematically
correct, however, given the linear model theory behind it and the
definition of R^2 as the ratio of the model sum of squares (SS) to the
total SS.

If you want to have more fun, sum the residuals from the zero-intercept
fit, and then ask yourself why they don't add to zero. You need to
educate yourself on the difference between regression with and without
intercepts.
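That residual-sum check is quick to do. A minimal sketch, assuming N and
N_alt are the vectors from the original post (missing values are dropped
by lm's default na.action):

fit  <- lm(N ~ N_alt)      # with intercept
fit0 <- lm(N ~ N_alt - 1)  # intercept forced to zero

sum(residuals(fit))        # essentially zero: the fitted intercept absorbs any constant offset
sum(residuals(fit0))       # generally nonzero: nothing constrains these residuals to balance out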
In particular, the R^2 in the with-intercept model uses mean corrections
before computing sums of squares; in the no-intercept model, mean
corrections are not applied. Since R^2 is a ratio of sums of squares,
this distinction matters. (If my use of 'mean correction' is confusing:
Y is not mean-corrected, but Y - Ybar is. Ditto for X.)

Try this:

plot(N_alt, N, pch = 16)
abline(coef(lm(N ~ N_alt)))
abline(c(0, coef(lm(N ~ N_alt + 0))), lty = 'dashed')

Do the data cluster tightly around the dashed line?

HTH,
Dennis

PS: A Google search on 'linear regression zero intercept' might be
beneficial. Here are a couple of hits from such a search:
http://www.bios.unc.edu/~truong/b663/pdf/noint.pdf
http://tltc.ttu.edu/cs/colleges__schools/rawls_college_of_business/f/42/p/288/470.aspx
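The mean-correction point can also be seen directly in numbers. A sketch
of how the two R-squared values in the summaries arise (definitions as
described in ?summary.lm), again assuming N and N_alt are the posted
vectors:

fit  <- lm(N ~ N_alt)                   # with intercept
fit0 <- lm(N ~ N_alt - 1)               # without intercept (same complete cases)

y <- model.response(model.frame(fit))   # the response values actually used

1 - sum(residuals(fit)^2)  / sum((y - mean(y))^2)  # mean-corrected total SS:   ~0.151
1 - sum(residuals(fit0)^2) / sum(y^2)              # uncorrected total SS:      ~0.9855

The denominator in the second line is much larger than the residual sum
of squares whenever the observations sit far from zero, which is how a
poor fit can still produce an R-squared near 1.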
No, this is a cute problem, though: the definition of R^2 changes
without the intercept, because the "empty" model used for calculating
the total sums of squares is always predicting 0 (so the total sums of
squares are sums of squares of the observations themselves, without
centering around the sample mean).

Your interpretation of the p-value for the intercept in the first model
is also backwards: 0.9535 is extremely weak evidence against the
hypothesis that the intercept is 0. That is, the intercept might be near
zero, but it could also be something very different. With a standard
error of 229, your 95% confidence interval for the intercept (if you
trusted it based on other things) would have a margin of error of well
over 400. If you told me that an intercept of, say, 350 or 400 were
consistent with your knowledge of the problem, I wouldn't blink.

This is a very small data set: if you sent an R command such as

x <- c(x1, x2, ..., xn)
y <- c(y1, y2, ..., yn)

you might even get some more interesting feedback. One of the many good
intro stats textbooks might also be helpful as you get up to speed.

Jay

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay
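The margin-of-error remark can be checked directly. A minimal sketch,
assuming N and N_alt are the posted vectors, so that lm(N ~ N_alt)
reproduces the first summary (20 residual degrees of freedom):

fit <- lm(N ~ N_alt)
qt(0.975, df = 20) * 229.0764    # about 478: the "well over 400" margin of error
confint(fit)["(Intercept)", ]    # the corresponding 95% interval for the intercept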