jochen laubrock
2011-Jan-22 02:03 UTC
[R] lm(y ~ x1) vs. lm(y ~ x0 + x1 - 1) with x0 <- rep(1, length(y))
Dear list,

the following came up in an introductory class. Please help me understand the -1 (or 0+) syntax in formulae: why do the numerator dfs, F-statistics etc. differ between the models lm(y ~ x1) and lm(y ~ x0 + x1 - 1), if x0 is a vector containing simply ones?

Example:

N <- 40
x0 <- rep(1, N)
x1 <- 1:N
vare <- N/8
set.seed(4)
e <- rnorm(N, 0, vare^2)

X <- cbind(x0, x1)
beta <- c(.4, 1)
y <- X %*% beta + e

summary(lm(y ~ x1))
# [...]
# Residual standard error: 20.92 on 38 degrees of freedom
# Multiple R-squared: 0.1151,   Adjusted R-squared: 0.09182
# F-statistic: 4.943 on 1 and 38 DF,  p-value: 0.03222

summary(lm(y ~ x0 + x1 - 1))    # or summary(lm(y ~ 0 + x0 + x1))
# [...]
# Residual standard error: 20.92 on 38 degrees of freedom
# Multiple R-squared: 0.6888,   Adjusted R-squared: 0.6724
# F-statistic: 42.05 on 2 and 38 DF,  p-value: 2.338e-10

Thanks in advance,
Jochen

----
Jochen Laubrock, Dept. of Psychology, University of Potsdam,
Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany
phone: +49-331-977-2346, fax: +49-331-977-2793
David Winsemius
2011-Jan-22 03:48 UTC
[R] lm(y ~ x1) vs. lm(y ~ x0 + x1 - 1) with x0 <- rep(1, length(y))
On Jan 21, 2011, at 9:03 PM, jochen laubrock wrote:

> the following came up in an introductory class. Please help me understand the -1 (or 0+) syntax in formulae: why do the numerator dfs, F-statistics etc. differ between the models lm(y ~ x1) and lm(y ~ x0 + x1 - 1), if x0 is a vector containing simply ones?

You are testing something different. In the first case you are testing the difference between the baseline and the second level of x1 (so there is only one d.f.), while in the second case you are testing for both of the coefficients being zero (so the numerator has 2 d.f.). It would be easier to see if you did print() on the fit object. The first model would give you an estimate for an "Intercept", which is really an estimate for the first level of x1. It helps here to have been taught to think of anova as just a special case of regression. Look at the model first, and only then look at the anova table.

David Winsemius, MD
West Hartford, CT
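For concreteness, a minimal sketch along the lines David suggests, assuming y, x0 and x1 from the example above are still in the workspace (the drop() is only there because y was built as a one-column matrix):

y <- drop(y)                  # use a plain vector rather than a one-column matrix
fit1 <- lm(y ~ x1)            # implicit intercept
fit2 <- lm(y ~ x0 + x1 - 1)   # intercept removed, then re-added by hand as x0

coef(fit1)                    # "(Intercept)" and "x1"
coef(fit2)                    # "x0" and "x1" -- numerically the same two estimates
all.equal(fitted(fit1), fitted(fit2))   # TRUE: identical fitted values

anova(fit1)   # F tests x1 against the intercept-only model (1 numerator df)
anova(fit2)   # sequential tests starting from the model y = 0 (rows for x0 and x1)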
Bert Gunter
2011-Jan-22 04:18 UTC
[R] lm(y ~ x1) vs. lm(y ~ x0 + x1 - 1) with x0 <- rep(1, length(y))
Well ... as x1 is continuous (numeric), it has no levels. So ...??

Note that the fits are identical for both models. The issue is only what Null you are testing in the two cases. In the first case it is just y = constant, so you are testing the 1 df for x1. In the second it is y = 0 (which rarely makes any sense), and you are testing the 2 df for the two terms (x0 and x1). Etc. etc.

-- Bert

On Fri, Jan 21, 2011 at 7:48 PM, David Winsemius <dwinsemius at comcast.net> wrote:

> You are testing something different. In the first case you are testing the difference between the baseline and the second level of x1 (so there is only one d.f.), while in the second case you are testing for both of the coefficients being zero (so the numerator has 2 d.f.).

--
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
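Bert's point can be checked by hand. A small sketch, again assuming the simulated data from the first message, that reproduces both F-statistics from their respective null models:

y <- drop(y)               # y was created as a one-column matrix
fit <- lm(y ~ x1)          # the fitted values are the same under both formulae
rss <- sum(residuals(fit)^2)
n <- length(y)

# First formula: the null model is y = constant (intercept only), 1 numerator df
rss.mean <- sum((y - mean(y))^2)
((rss.mean - rss) / 1) / (rss / (n - 2))   # 4.943, as in summary(lm(y ~ x1))

# Second formula: the null model is y = 0, 2 numerator df
rss.zero <- sum(y^2)
((rss.zero - rss) / 2) / (rss / (n - 2))   # 42.05, as in summary(lm(y ~ x0 + x1 - 1))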
jochen laubrock
2011-Jan-22 04:33 UTC
[R] lm(y ~ x1) vs. lm(y ~ x0 + x1 - 1) with x0 <- rep(1, length(y))
Thank you all (including Dennis), this was elucidating. I would have (maybe naively) anticipated that in this somewhat pathological case of fitting without an intercept and re-introducing it via the constant x0, R might check whether the design matrix includes a column of ones, and adjust the degrees of freedom accordingly. But now I can see that by explicitly requesting via the formula interface not to fit a constant, I am implicitly stating my hypothesis that y == 0, even if I re-introduce my suspicion that y == mu via x0 <- rep(1, N). If I understood correctly, x0 is treated as an ordinary variable in the latter case, right?

On Jan 22, 2011, at 5:18 , Bert Gunter wrote:

> Well ... as x1 is continuous (numeric), it has no levels. So ...??
>
> Note that the fits are identical for both models. The issue is only what Null you are testing in the two cases. In the first case it is just y = constant, so you are testing the 1 df for x1. In the second it is y = 0 (which rarely makes any sense), and you are testing the 2 df for the two terms (x0 and x1). Etc. etc.
>
> -- Bert

----
Jochen Laubrock, Dept. of Psychology, University of Potsdam,
Karl-Liebknecht-Strasse 24-25, 14476 Potsdam, Germany
phone: +49-331-977-2346, fax: +49-331-977-2793
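To confirm that reading, a short sketch, assuming the objects from the original example: summary() chooses the baseline from the formula (the presence of an intercept term), not from the contents of the design matrix, so dropping the intercept makes R-squared, and hence the F-statistic, relative to y = 0 even though x0 supplies a column of ones.

fit2 <- lm(y ~ x0 + x1 - 1)
rss <- sum(residuals(fit2)^2)

1 - rss / sum(y^2)               # 0.6888: R-squared against the baseline y = 0
1 - rss / sum((y - mean(y))^2)   # 0.1151: R-squared against the baseline y = mean(y)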