thr3ads.net - R help - [R] Some questions about R's modelling algebra [Jul 2010]

Hadley Wickham

2010-Jul-02 13:59 UTC

[R] Some questions about R's modelling algebra

Hi all,

In preparation for teaching a class next week, I've been reviewing R's
standard modelling algebra. I've used it for a long time and have a
pretty good intuitive feel for how it works, but would like to
understand more of the technical details. The best (online) reference
I've found so far is the section in "An Introduction to R"
(http://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models).
Does anyone have any other suggestions?

I have a few questions about the definitions given in "An Introduction to
R":

 * "M_1 : M_2 - The tensor product of M_1 and M_2. If both terms are
factors, then the 'subclasses' factor."

   From my reading, the usual interpretation of a tensor product when
x and y are vectors is the outer product.  I don't see how that would
work here - how does a matrix work as a predictor in a linear model?
In what sense is the tensor product of x with itself equal to x?

  What is the subclasses factor? Is it interaction(M_1, M_2, sep =
"")?

 * "M_1 %in% M_2 - Similar to M_1:M_2, but with a different coding."

  How is the coding different?

  Where is %in% documented within R?  I'm pretty sure it's a different
action to ?"%in%", and it's not mentioned in ?formula.

I have also read G. N. Wilkinson and C. E. Rogers. Symbolic
description of factorial models for analysis of variance. Journal of
the Royal Statistical Society. Series C (Applied Statistics),
22:392-399, 1973. - Can anyone comment on any important differences to
R's modelling algebra? What does %in% correspond to in Wilkinson and
Rogers' framework?

Thanks!

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

S Ellison

2010-Jul-02 14:13 UTC

>>> Hadley Wickham <hadley at rice.edu> 02/07/2010 14:59:53
>>>
> Where is %in% documented within R?  I'm pretty sure it's a different
> action to ?"%in%", and it's not mentioned in ?formula
?formula in R 2.9.2 says in para 2:
"The %in% operator indicates that the terms on its left are nested
within those on the right. For example a + b %in% a expands to the
formula a + a:b. "
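
One way to check how the formula parser treats nesting is to inspect terms(). This is only a sketch with made-up variable names (y, a, b). "An Introduction to R" describes a/b as shorthand for a + b %in% a, so the two formulas below should describe the same model; the labels for the %in% form are printed rather than assumed, since the way R labels that term may differ from a:b.

```r
# Hypothetical variable names; terms() needs only the formula, not data.
nested_slash <- terms(y ~ a/b)
nested_in    <- terms(y ~ a + b %in% a)

# a/b expands to a main effect for a plus b within a
slash_labels <- attr(nested_slash, "term.labels")
print(slash_labels)

# Labels for the %in% version, printed for comparison
in_labels <- attr(nested_in, "term.labels")
print(in_labels)
```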

RICHARD M. HEIBERGER

2010-Jul-02 15:27 UTC

Hadley,

The S language modeling language was designed with Wilkinson and
Rogers in mind.  The notation was changed from their paper to
retain consistency with the parsing rules for ordinary algebra in
S.  I think of ":" as an indicator of an indexing system into the
dummy variables.  It is not an indicator of degrees of freedom.

For simplicity in notation, let A be a factor with a levels and B
be a factor with b levels.  Then A:B implies a set of dummy
variables with at most ab columns indexed by an A level and a B
level.  The degrees of freedom associated with A:B depends on the
linear dependencies of the associated dummy variables with the
dummy variables of other terms in the model.  The excess columns
can be suppressed when the dummy variables are generated or they
can be pivoted out during the analysis.  When we have the special
case A:A, there is only one factor mentioned, so the indexing
scheme is based on just the one factor.  You could generate the
full set of a^2 columns, and then you would discover that they
are all linearly dependent on the first a.


The columns can be labeled either
a1b1 a1b2 a1b3 a2b1 a2b2 a2b3
or
a1b1 a2b1 a1b2 a2b2 a1b3 a2b3

If there is crossing, we would report a single sum of squares
and degrees of freedom for the interaction.  If there is nesting,
say a/b, then it might make sense to group the dummy variables
say (a1b1 a1b2 a1b3) and (a2b1 a2b2 a2b3) and report simple
effects sum of squares and degrees of freedom for each of the
groups.
The structure of the individual columns depends on the set of
contrasts used for the A and B factors.
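
The column counts described above can be checked with model.matrix(). This is a rough sketch on made-up balanced data (the data frame d and its level names are invented for illustration) with a = 2 and b = 3, using R's default treatment contrasts.

```r
# Made-up balanced data: A has a = 2 levels, B has b = 3 levels, 2 replicates.
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3")),
                 rep = 1:2)

# A:B on its own (no intercept): one indicator column per A-by-B cell.
X_cells <- model.matrix(~ 0 + A:B, data = d)
n_cell_cols <- ncol(X_cells)                 # a * b columns
cell_rank <- qr(X_cells)$rank

# A:B alongside its main effects: the excess columns are suppressed
# when the dummy variables are generated.
X_full <- model.matrix(~ A * B, data = d)
term_index <- attr(X_full, "assign")         # which term each column belongs to
n_interaction_cols <- sum(term_index == 3)   # (a - 1) * (b - 1) columns

# A:A mentions only one factor, so the term collapses to A.
aa_labels <- attr(terms(~ A:A), "term.labels")
```

colnames(X_cells) shows which of the two labelling orders R uses.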

Rich

Thomas Lumley

2010-Jul-02 16:19 UTC

On Fri, 2 Jul 2010, Hadley Wickham wrote:
> Hi all,
>
> In preparation for teaching a class next week, I've been reviewing R's
> standard modelling algebra. I've used it for a long time and have a
> pretty good intuitive feel for how it works, but would like to
> understand more of the technical details. The best (online) reference
> I've found so far is the section in "An Introduction to R"
> (http://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models).
> Does anyone have any other suggestions?
>
> I have a few questions about the definitions given in "An Introduction
> to R":
>
> * "M_1 : M_2 - The tensor product of M_1 and M_2. If both terms are
> factors, then the 'subclasses' factor."
>
>   From my reading, the usual interpretation of a tensor product when
> x and y are vectors is the outer product.  I don't see how that would
> work here - how does a matrix work as a predictor in a linear model?
Think of it for a single observation.  x and y specify terms that could be
scalars or could be row vectors (e.g. ns(x), poly(y, 3)), and the terms
in x:y are the products of each term from x with each term from y.  It is like
taking the Kronecker product and then reshaping it back into a row vector.
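
A small sketch of the row-vector case, on made-up data (the data frame d and the choice of poly() degrees are for illustration only): each observation contributes 2 columns from poly(x, 2) and 3 from poly(y, 3), and the x:y term holds all 2 * 3 pairwise products.

```r
set.seed(1)   # arbitrary made-up data
d <- data.frame(x = rnorm(10), y = rnorm(10))

px <- poly(d$x, 2)   # 2 columns per observation
py <- poly(d$y, 3)   # 3 columns per observation

# No intercept, so the matrix holds only the columns of the x:y term.
X <- model.matrix(~ 0 + poly(x, 2):poly(y, 3), data = d)
n_cols <- ncol(X)    # 2 * 3 products

# For one observation, the entries are all pairwise products of its
# px row and py row (compared here without relying on column order).
i <- 1
products_i <- as.vector(outer(px[i, ], py[i, ]))
same_values <- isTRUE(all.equal(sort(unname(X[i, ])), sort(products_i)))
```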

> In what sense is the tensor product of x with itself equal to x?
This is the messy bit.  The 'product' operator is not the arithmetic
product, because x:x is not the same as x:z even if z=x.

The product of a set of single-column terms is formed by eliminating any terms
from the set that are syntactic duplicates and then taking the arithmetic
product of the remaining terms.  This is the Right Thing for producing design
matrices, but is a bit of a mess to describe.

So x:z:log(z) contains no duplicates and produces x*z*log(z).  x:z contains no
duplicates and produces x*z (even if z=x), but x:z:x produces x*z and x:x
produces x.
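
These cases can be checked from the term labels that terms() produces. The helper labels_of() and the tiny data frame are made up for this sketch.

```r
# terms() works on the symbols only; no data are needed.
labels_of <- function(f) attr(terms(f), "term.labels")

l1 <- labels_of(~ x:z:log(z))   # no syntactic duplicates
l2 <- labels_of(~ x:z:x)        # the repeated x is dropped
l3 <- labels_of(~ x:x)          # collapses to x

# Even when z holds the same numbers as x, x:z is the arithmetic product.
d <- data.frame(x = c(1, 2, 3))
d$z <- d$x
X <- model.matrix(~ 0 + x:z, data = d)
xz <- as.vector(X[, 1])         # the single column of the x:z term
```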

>  What is the subclasses factor? Is it interaction(M_1, M_2, sep = "")?
Yes.
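
As a rough check on made-up factors (A, B and the data frame d are invented here), the indicator columns for A:B and for the single factor interaction(A, B, sep = "") span the same space when neither has an intercept:

```r
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3")))
d$AB <- interaction(d$A, d$B, sep = "")   # one level per A-by-B cell

X_colon <- model.matrix(~ 0 + A:B, data = d)
X_inter <- model.matrix(~ 0 + AB, data = d)

# Same number of columns, and putting the two matrices side by side
# adds no new directions.
same_ncol <- ncol(X_colon) == ncol(X_inter)
joint_rank <- qr(cbind(X_colon, X_inter))$rank
```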


You might find the Wilkinson & Rogers paper more helpful:

@Article{Wilkinson.Rogers.73,
   author       = "G. N. Wilkinson and C. E. Rogers",
   title        = "Symbolic description of factorial models for analysis of
                   variance",
   journal      = "Applied Statistics",
   volume       = "22",
   pages        = "392--399",
   year         = "1973",
   comment      = "Reference from MASS",
}

The notation is slightly different; R uses ':' for their '.' and
'^' for their '**'.  I think the algebra is the same.
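
For the R side of that notation, a short sketch (labels_of() is a made-up helper, and a, b, c are placeholder names) showing that '^' expands a sum into all terms up to the given order, written with ':':

```r
labels_of <- function(f) attr(terms(f), "term.labels")

# All main effects and two-way interactions of a, b and c
crossed     <- labels_of(~ (a + b + c)^2)
spelled_out <- labels_of(~ a + b + c + a:b + a:c + b:c)
print(crossed)
```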

     -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

Adrian Waddell

2010-Jul-02 23:38 UTC

> Hadley Wickham <hadley <at> rice.edu>
>
>   Where is %in% documented within R?  I'm pretty sure it's a different
> action to ?"%in%", and it's not mentioned in ?formula
You can find the documentation for operators such as <-, %in% and if by
putting them in quotes:

?"%in%"
?"<-"
?"if"

Regards,

Adrian