I am playing with a 1-way ANOVA with and without the "-1" option. I have a simple cooked-up example below, but it behaves the same way on a more complex real example. From what I can tell:

1) the estimated means of the different levels are correctly estimated either way (although reported as means with the -1 and as contrasts without the -1, as expected)
2) the residuals are identical (in this contrived example they differ slightly due to numeric instability, but in a more real-world example they truly are identical)
3) BUT the r2/F/p-values are different (in my real-world example they are drastically different)

How can a model that gets the same parameter estimates on the same data, leading to the same residuals, end up with different r2/F/p-values? I suspect it has something to do with the difference in the model.matrix (see below), but that only confused me further about how the two fits get the same parameter estimates, without really clearing up why the r2's are different. Any help on this is greatly appreciated!

> x<-as.factor(c(1,1,1,2,2,2))
> y<-c(1.1,1.0,0.9,2.0,2.1,1.9)
> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
         1          2          3          4          5          6
 1.000e-01 -4.980e-16 -1.000e-01  8.538e-18  1.000e-01 -1.000e-01

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.00000    0.05774   17.32 6.52e-05 ***
x2           1.00000    0.08165   12.25 0.000255 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1 on 4 degrees of freedom
Multiple R-Squared: 0.974,     Adjusted R-squared: 0.9675
F-statistic:   150 on 1 and 4 DF,  p-value: 0.0002552

> summary(lm(y~x-1))

Call:
lm(formula = y ~ x - 1)

Residuals:
         1          2          3          4          5          6
 1.000e-01  5.027e-16 -1.000e-01  4.405e-20  1.000e-01 -1.000e-01

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x1  1.00000    0.05774   17.32 6.52e-05 ***
x2  2.00000    0.05774   34.64 4.14e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1 on 4 degrees of freedom
Multiple R-Squared: 0.9973,     Adjusted R-squared: 0.996
F-statistic:   750 on 2 and 4 DF,  p-value: 7.073e-06

> m2nc=lm(y~x-1)
> m2wc=lm(y~x)
> resid(m2nc)
            1             2             3             4             5             6
 1.000000e-01  5.026734e-16 -1.000000e-01  4.404571e-20  1.000000e-01 -1.000000e-01
> resid(m2wc)
            1             2             3             4             5             6
 1.000000e-01 -4.980012e-16 -1.000000e-01  8.538092e-18  1.000000e-01 -1.000000e-01
> model.matrix(m2nc)
  x1 x2
1  1  0
2  1  0
3  1  0
4  0  1
5  0  1
6  0  1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"
> model.matrix(m2wc)
  (Intercept) x2
1           1  0
2           1  0
3           1  0
4           1  1
5           1  1
6           1  1
attr(,"assign")
[1] 0 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"

Brian McGill
Dept of Biology, McGill University
Stewart Biology Bldg, 1205 ave Docteur Penfield
Montreal, QC H3A 1B1 CANADA
(514) 398-6417
http://www.brianmcgill.org
mail@brianmcgill.org
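As a quick check of point 2 above, the two fits can be compared directly. This is a small sketch using the m2nc and m2wc objects from the session above; all.equal() compares numeric vectors up to a small tolerance, so the tiny numerical differences in the near-zero residuals are ignored:

all.equal(fitted(m2wc), fitted(m2nc))  # should be TRUE: both fits return the group means
all.equal(resid(m2wc), resid(m2nc))    # should be TRUE: residuals agree up to floating-point noise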
Berwin A Turlach
2008-Feb-08 06:08 UTC
[R] Don't understand removing constant on 1-way ANOVA
G'day Brian,

On Thu, 07 Feb 2008 17:56:07 -0500
Brian McGill <brian.mcgill at mcgill.ca> wrote:

> I am playing with a 1-way ANOVA with and without the "-1" option.
>
> [...]
>
> From what I can tell:
> 1) the estimated means of the different levels are correctly
> estimated either way (although reported as means with the -1 and as
> contrasts without the -1 as expected)
> 2) the residuals are identical (in this contrived example they differ
> slightly due to numeric instability but in a more real-world example
> they truly are identical)
> 3) BUT the r2/F/p-value are different (in my real-world example they
> are drastically different)
>
> How can a model that gets the same parameter estimates on the same
> data leading to the same residuals get different r2/F/p-value?

In an R session, type "help(summary.lm)" and, under the section "Value", read about the way r.squared is calculated. Note that if your model specifies a "-1" (or a "+0"), then the model is assumed to have no intercept. Or, rather, the null model (i.e. the model with all covariates removed) is taken to be a response of 0 + noise rather than mean + noise. This should explain the differences that you see.

R does not check whether a vector of 1s is in the column space of the design matrix, so it cannot base the decision about whether the model has an intercept (i.e. whether the null model should be "mu + noise" or "0 + noise") on that test; it goes by the formula alone. This is a design decision, presumably made to be compatible with S, and, obviously, it also makes the implementation of summary.lm easier, I guess. :)

Occasionally I use "-1" in the formula to construct a specific design matrix and, hence, get estimates and standard errors for quantities that are of interest. It is a bit annoying to have to refit the model without the "-1" to get the r2/F that I want, but one has to live with it. I would prefer that the decision on whether the null model has an intercept were based on whether a vector of 1s is in the column space of the design matrix, rather than on whether the formula contains a "-1", but I can also see the argument for the alternative preference.

But I just noticed that, if you so wish, you can be subversive:

> fm1 <- lm(y~x-1)
> attr(fm1$terms, "intercept") <- 1
> summary(fm1)

Call:
lm(formula = y ~ x - 1)

Residuals:
         1          2          3          4          5          6
 1.000e-01  3.608e-16 -1.000e-01 -3.469e-18  1.000e-01 -1.000e-01

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x1  1.00000    0.05774   17.32 6.52e-05 ***
x2  2.00000    0.05774   34.64 4.14e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1 on 4 degrees of freedom
Multiple R-Squared: 0.974,     Adjusted R-squared: 0.9675
F-statistic:   150 on 1 and 4 DF,  p-value: 0.0002552

Hope this helps.

Cheers,

        Berwin

=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919
National University of Singapore
6 Science Drive 2, Blk S16, Level 7         e-mail: statba at nus.edu.sg
Singapore 117546                            http://www.stat.nus.edu.sg/~statba
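To make the difference concrete, here is a short sketch (assuming the y, m2wc and m2nc objects from the original post are still in the workspace) that recomputes R-squared by hand following the formula documented in help(summary.lm): the residual sum of squares is essentially identical for the two fits, but it is compared against the total sum of squares about mean(y) when the model has an intercept term, and against the raw sum of squares about 0 when it does not.

rss <- sum(resid(m2wc)^2)        # about 0.04; resid(m2nc) gives the same value up to rounding
1 - rss / sum((y - mean(y))^2)   # null model mean + noise: ~0.974, matches summary(m2wc)
1 - rss / sum(y^2)               # null model 0 + noise:    ~0.9973, matches summary(m2nc)

The F statistics differ for the same reason: the no-intercept fit is tested against the 0 + noise null model, which is why the output jumps from F = 150 on 1 and 4 DF to F = 750 on 2 and 4 DF even though the fitted values are the same.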