thr3ads.net - R help - [R] Playing with formulae [Sep 2003]


Ross Boylan

2003-Sep-12 22:47 UTC

[R] Playing with formulae

First, thanks to everyone for their responses to my programming style
question.  Second, I have some questions about some obscure corners of
the language.

Let
f <- y~x+z
t <- terms(f)

I want to do some manipulations of the formula that require getting the
names of variables as character strings (e.g., for indexing into a
dataset).  However, t, or even attr(t, "variables"), does not provide
character strings.

1. Does all.vars(f) reliably produce the same ordering as t?

2. Can objects of class name (which I notice appear in places in t) be
used the same way as character strings (e.g., indexing columns in a data
set, arguments to match)?  (This would matter if I could pull t apart
reliably.  I can't.  See 3b for more on that problem.)

3. t's response attribute is said to be an index of the response
variable in variables (I presume this means the variables attribute).
  a) Will all.vars(f)[attr(t, "response")] reliably get me the character
     string for the name of the response variable?
  b) How can I get the response variable out of the "variables"
     attribute?  In my example,
     response is 1, but attr(t, "variables")[1] is list().
     Possible answer: attr(t, "variables")[[response+1]] looks right,
     and is of class name.  Hence the interest in question 2.

4. Is the actual number of coefficients the model will need
length(attr(t,"term.labels"))+attr(t, "intercept"),
regardless of interactions or I() terms?
(My interest is primarily in detecting the wrong number of terms, for
example if someone specifies an interaction).

5. The documentation for terms.formula appears to imply that if there is
a simple formula without interactions I will get coefficient estimates
in the same order that the original formula specified textually.  Right?
I'm concerned about this because I'm having a vector of simulation
coefficients passed in along with the formula, and I need to be sure
they line up with the model terms.

I know that's a lot of pretty detailed questions, but if you can offer
any help that would be great.  I've tried some simple tests that seem to
work, but of course those don't prove that the assumptions always hold. 
And the documentation does not seem to resolve the issues either.

I'm using R 1.7.1, but ideally my code will not be version-specific.

Thomas Lumley

2003-Sep-12 23:12 UTC


On Fri, 12 Sep 2003, Ross Boylan wrote:
> First, thanks to everyone for their responses to my programming style
> question.  Second, I have some questions about some obscure corners of
> the language.
>
> Let
> f <- y~x+z
> t <- terms(f)
>
> I want to do some manipulations of the formula that require getting the
> names of variables as character strings (e.g., for indexing into a
> dataset).  However, t, or even attr(t, "variables"), does not provide
> character strings.
>
> 1. Does all.vars(f) reliably produce the same ordering as t?
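It doesn't even produce the same set. Consider

y~I(x+z)+w

Here all.vars() returns y, x, z and w, whereas the terms object has only
three variables: y, I(x+z) and w.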
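A minimal sketch of the mismatch for the formula above. The object names
(f, tt, term_vars) are illustrative only; tt is used instead of t to avoid
masking base::t.

```r
f  <- y ~ I(x + z) + w
tt <- terms(f)

# all.vars() walks the whole expression and returns every symbol it finds
all.vars(f)

# The "variables" attribute is a call to list(); drop the leading `list`
# and deparse each element to get one string per model variable
term_vars <- vapply(as.list(attr(tt, "variables"))[-1],
                    function(v) paste(deparse(v), collapse = ""),
                    character(1))
term_vars
```

The two vectors differ in length (four symbols versus three variables), so
positions in one cannot be used to index the other.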
> 2. Can objects of class name (which I notice appear in places in t) be
> used the same way as character strings (e.g., indexing columns in a data
> set, arguments to match)?  (This would matter if I could pull t apart
> reliably.  I can't.  See 3b for more on that problem.)
No, but as.character will convert them to strings.
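As a rough illustration of that conversion, assuming a small made-up data
frame (dat) whose columns match the formula:

```r
f  <- y ~ x + z
tt <- terms(f)

# Element 1 of the "variables" call is `list` itself, hence the + 1
resp <- attr(tt, "variables")[[attr(tt, "response") + 1]]
class(resp)                      # an object of class "name"

# Convert the name to a string before using it as an index
resp_name <- as.character(resp)

dat <- data.frame(y = 1:3, x = 4:6, z = 7:9)   # hypothetical data
dat[[resp_name]]
```

Note that as.character() is only a safe choice for plain names. For a call
such as log(y) it returns one string per component, so deparse() is the
better tool when a variable may be an expression.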
> 3. t's response attribute is said to be an index of the response
> variable in variables (I presume this means the variables attribute).
>   a) Will all.vars(f)[attr(t, "response")] reliably get me the character
>      string for the name of the response variable?
No. Consider

Surv(t,s)~x+z

where the response is the call Surv(t,s), not a single variable.
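A short sketch of why the indexing fails here. The survival package does
not need to be loaded, since terms() only parses the formula and never
evaluates Surv(); the names naive and actual are illustrative.

```r
f  <- Surv(t, s) ~ x + z
tt <- terms(f)

# Index 1 of all.vars() is just the first symbol inside the response call
naive <- all.vars(f)[attr(tt, "response")]

# The response as the terms object sees it is the whole call
actual <- deparse(attr(tt, "variables")[[attr(tt, "response") + 1]])

naive
actual
```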
>   b) How can I get the response variable out of the "variables"
>      attribute?  In my example,
>      response is 1, but attr(t, "variables")[1] is list().
>      Possible answer: attr(t, "variables")[[response+1]] looks right,
>      and is of class name.  Hence the interest in question 2.
The "factors" attribute has row names corresponding to variables and
column names corresponding to terms, so its row names already give the
variables (including the response) as character strings.
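One way this could look in practice, using an illustrative formula with a
transformed response and an interaction (the object names are
assumptions, not part of the original question):

```r
f   <- log(y) ~ x + z + x:z
tt  <- terms(f)
fac <- attr(tt, "factors")

rownames(fac)    # variables, as strings, response first
colnames(fac)    # terms, matching attr(tt, "term.labels")

# The "response" attribute indexes the rows directly (no + 1 needed here)
resp_name <- rownames(fac)[attr(tt, "response")]

# Variables taking part in a given term are the nonzero rows of its column
vars_in_interaction <- rownames(fac)[fac[, "x:z"] > 0]
```

This relies on the formula having at least one term; for an
intercept-only formula the "factors" attribute is empty and has no
dimnames.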
> 4. Is the actual number of coefficients the model will need
> length(attr(t,"term.labels"))+attr(t, "intercept"),
> regardless of interactions or I() terms?
No. A term can create multiple columns of the design matrix, e.g. factors,
polynomials, splines.  You won't know how many until you call
model.matrix.
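A small sketch of the discrepancy, assuming made-up data with a
three-level factor g and the default treatment contrasts:

```r
dat <- data.frame(                      # hypothetical data
  y = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
  x = c(0.1, 0.5, 0.9, 1.3, 1.7, 2.1),
  g = factor(c("a", "b", "c", "a", "b", "c"))
)

f  <- y ~ x + g
tt <- terms(f)

# The count proposed in the question: number of terms plus the intercept
naive <- length(attr(tt, "term.labels")) + attr(tt, "intercept")

# The count the model actually needs: columns of the design matrix
X <- model.matrix(f, data = dat)
ncol(X)

# "assign" maps each column back to its term (0 is the intercept)
attr(X, "assign")
```

Comparing ncol(X) with the length of the supplied coefficient vector is
therefore a more dependable check than counting term labels.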
> 5. The documentation for terms.formula appears to imply that if there is
> a simple formula without interactions I will get coefficient estimates
> in the same order that the original formula specified textually.  Right?
> I'm concerned about this because I'm having a vector of simulation
> coefficients passed in along with the formula, and I need to be sure
> they line up with the model terms.
Yes.

It might be useful to have names on the coefficients, though.  Then you
could match on the names and not worry about the order.
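A minimal sketch of matching by name, assuming the caller supplies a named
coefficient vector (beta) whose names are the design-matrix column names;
the data and values are invented for illustration.

```r
dat <- data.frame(y = c(0, 0, 0),       # hypothetical data
                  x = c(1, 2, 3),
                  z = c(0, 1, 0))
f <- y ~ x + z
X <- model.matrix(f, data = dat)

# Coefficients passed in any order, identified by name
beta <- c(z = -1, "(Intercept)" = 2, x = 0.5)

# Fail early if the names do not cover exactly the model columns
stopifnot(setequal(names(beta), colnames(X)))

# Reorder to the design-matrix column order, then form the linear predictor
beta <- beta[colnames(X)]
eta  <- drop(X %*% beta)
```

The setequal() check also catches the wrong-number-of-terms case, since an
unexpected interaction or factor would add columns with no matching name.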


An example of the sort of thing you're trying to do is in
untangle.specials() in the survival package, which is used to locate terms
and variables for strata() and cluster() in coxph().  It uses the dimnames
of the "factors" attribute as keys.
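The same idea can be sketched in base R alone. This is not the code of
untangle.specials(), only an approximation of the approach described
above, using the specials argument of terms() on an illustrative formula.
strata() is never evaluated, so the survival package is not required.

```r
f   <- y ~ x + strata(g) + z
tt  <- terms(f, specials = "strata")
fac <- attr(tt, "factors")

# Row index (into the variables) of each call to the special function
idx <- attr(tt, "specials")$strata

# The variable, as a string, taken from the row names of "factors"
special_vars <- rownames(fac)[idx]

# The terms that involve it: columns with a nonzero entry in those rows
special_terms <- unname(which(colSums(fac[idx, , drop = FALSE]) > 0))

special_vars
attr(tt, "term.labels")[special_terms]
```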

	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
