thr3ads.net - R help - [R] Newbie help with ANOVA and lm. [Feb 2010]

rkevinburton at charter.net

2010-Feb-27 15:53 UTC

[R] Newbie help with ANOVA and lm.

Would someone be so kind as to explain in English what the ANOVA code (anova.lm)
is doing? I am having a hard time reconciling what the text books have as a
brute force regression and the formula algorithm in 'R'. Specifically I
see:

    p <- object$rank
    if (p > 0L) {
        p1 <- 1L:p
        comp <- object$effects[p1]
        asgn <- object$assign[object$qr$pivot][p1]
        nmeffects <- c("(Intercept)", attr(object$terms, "term.labels"))
        tlabels <- nmeffects[1 + unique(asgn)]
        ss <- c(unlist(lapply(split(comp^2, asgn), sum)), ssr)
        df <- c(unlist(lapply(split(asgn, asgn), length)), dfr)
    }
    else {
        ss <- ssr
        df <- dfr
        tlabels <- character(0L)
    }
    ms <- ss/df
    f <- ms/(ssr/dfr)
    P <- pf(f, df, dfr, lower.tail = FALSE)
 

I think I understand the check for 'p' being non-zero. 'p' is
essentially the number of terms in the model matrix (including the intercept
term if it exists). So in a mathematical description of a regression that
included the intercept and one term (like dist ~ speed) you would have a model
matrix of a column of '1's and then a column of data. The
'assign' would be a vector containing [0,1]. So then in finding the
degrees of freedom you split the assign matrix with itself. I am having a hard
time seeing that this ever produces degrees of freedom that are different. So I
get that the vector 'df' would always be something like [2,2,dfr]. But
that is obviously wrong. Would someone care to enlighten me on what the code
above is doing?

Thank you.

Kevin

Peter Ehlers

2010-Feb-27 16:55 UTC

[R] Newbie help with ANOVA and lm.

On 2010-02-27 8:53, rkevinburton at charter.net wrote:
> Would someone be so kind as to explain in English what the ANOVA code
> (anova.lm) is doing? I am having a hard time reconciling what the text books
> have as a brute force regression and the formula algorithm in 'R'.
> Specifically I see:
>
>     p <- object$rank
>     if (p > 0L) {
>         p1 <- 1L:p
>         comp <- object$effects[p1]
>         asgn <- object$assign[object$qr$pivot][p1]
>         nmeffects <- c("(Intercept)", attr(object$terms, "term.labels"))
>         tlabels <- nmeffects[1 + unique(asgn)]
>         ss <- c(unlist(lapply(split(comp^2, asgn), sum)), ssr)
>         df <- c(unlist(lapply(split(asgn, asgn), length)), dfr)
>     }
>     else {
>         ss <- ssr
>         df <- dfr
>         tlabels <- character(0L)
>     }
>     ms <- ss/df
>     f <- ms/(ssr/dfr)
>     P <- pf(f, df, dfr, lower.tail = FALSE)
>
> I think I understand the check for 'p' being non-zero. 'p' is
> essentially the number of terms in the model matrix (including the intercept
> term if it exists). So in a mathematical description of a regression that
> included the intercept and one term (like dist ~ speed) you would have a model
> matrix of a column of '1's and then a column of data. The
> 'assign' would be a vector containing [0,1]. So then in finding the
> degrees of freedom you split the assign matrix with itself. I am having a hard
> time seeing that this ever produces degrees of freedom that are different. So I
> get that the vector 'df' would always be something like [2,2,dfr]. But
> that is obviously wrong. Would someone care to enlighten me on what the code
> above is doing?
>
split(asgn, asgn) splits the vector (not matrix) 'asgn' into
list components. Then lapply() applies length() to each list
component which gives the associated degrees of freedom.
unlist() removes the list structure, producing a vector of dfs.
For simple regression, this results in c(1,1). The residual
dfs are then tacked on to give the df-vector df=c(1,1,dfr).
For models with an intercept the first component of df should
always be 1. But this is discarded in the output matrix.
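
As a rough illustration of the steps just described, here is a minimal
sketch using the built-in 'cars' data. The names 'fit', 'df_terms' and
'dfr' are chosen only for this example (in anova.lm the fitted model is
called 'object'), and df.residual() is used here as a stand-in for however
dfr was computed earlier in the function.

```r
# Simple regression: intercept plus one numeric predictor.
fit <- lm(dist ~ speed, data = cars)

p    <- fit$rank                               # columns actually estimated
asgn <- fit$assign[fit$qr$pivot][seq_len(p)]   # term index of each column
print(asgn)                                    # 0 = intercept, 1 = speed

# Group the term indices by themselves; each group holds the columns of one term.
groups <- split(asgn, asgn)
print(groups)

# The number of columns in each group is that term's degrees of freedom.
df_terms <- unlist(lapply(groups, length))

dfr <- df.residual(fit)                        # residual df (assumed source of dfr)
df  <- c(df_terms, dfr)
print(unname(df))                              # 1 1 48
```

The point is that length() counts how many columns belong to each term; it
does not return the values stored in 'asgn'.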

With two numerical predictors: y ~ x1 + x2,
you should find that asgn = c(0,1,2) leading to df = c(1,1,1,dfr).
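
To see a case where the dfs do differ, a factor helps: its dummy columns all
carry the same term index. The sketch below uses the built-in 'mtcars' and
'PlantGrowth' data purely as examples; the object names (fit2, fit3, ss3 and
so on) are made up for the illustration and are not part of anova.lm.

```r
# Two numeric predictors: one column per term, so every term gets 1 df.
fit2  <- lm(mpg ~ wt + hp, data = mtcars)
asgn2 <- fit2$assign[fit2$qr$pivot][seq_len(fit2$rank)]
df2   <- c(unlist(lapply(split(asgn2, asgn2), length)), df.residual(fit2))
print(unname(df2))                      # 1 1 1 29

# A three-level factor: two dummy columns share term index 1,
# so split() puts both in one group and the term gets 2 df.
fit3  <- lm(weight ~ group, data = PlantGrowth)
p3    <- fit3$rank
asgn3 <- fit3$assign[fit3$qr$pivot][seq_len(p3)]
print(asgn3)                            # 0 1 1
df3   <- c(unlist(lapply(split(asgn3, asgn3), length)), df.residual(fit3))
print(unname(df3))                      # 1 2 27

# The same grouping applied to the squared effects gives the
# sum of squares for each term.
ss3 <- unlist(lapply(split(fit3$effects[seq_len(p3)]^2, asgn3), sum))
print(ss3)

# Compare the entry named "1" with the 'group' row of the usual table.
print(anova(fit3))
```

The entry of ss3 named "1" should agree with the Sum Sq reported for 'group',
while the entry named "0" belongs to the intercept and is not shown in the
table.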

   -Peter Ehlers
-- 
Peter Ehlers
University of Calgary
