Martin,
There are a couple of issues with [.terms that have bitten my survival code. At the
useR conference I promised you a detailed (readable) explanation, and have been lax in
getting it to you. The error was first pointed out in a bugzilla note from 2016, by the
way. The current survival code works around these.
Consider the following formula:
<<testform>>
library(survival)  # only to get access to the lung data set
test <- Surv(time, status) ~ age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials="strata")
mf <- model.frame(tform, data=lung)
mterm <- terms(mf)
@
The strata term is handled in a special way by coxph, and then needs to be removed from
the model formula before calling model.matrix.
To do this the code uses essentially the following, which fails for the formula above.
<<strata>>
strata <- attr(mterm, "specials")$strata - attr(mterm, "response")
X <- model.matrix(mterm[-strata], mf)
@
The root problem is the need for multiple subscripts.
\begin{itemize}
  \item The formula itself has length 5, with `~' as the first element
  \item The variables and predvars attributes are call objects, each a list() with 4
    elements: the response and all 3 predictors
  \item The term.labels attribute omits the response and the offset, so has length 2
  \item The factors attribute has 4 rows and 2 columns
  \item The dataClasses attribute is a character vector of length 4
\end{itemize}
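Most of these sizes can be checked without fitting anything. The following is a minimal sketch in base R only: terms() does not evaluate Surv() or strata(), so survival need not be loaded. The dataClasses attribute is left out because it exists only on the terms of a model frame. The chunk name and the variable names vars, labs and fac are illustrative.
<<sizes>>
test  <- Surv(time, status) ~ age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials = "strata")
vars  <- attr(tform, "variables")    # call object: list() plus the 4 variables
labs  <- attr(tform, "term.labels")  # omits the response and the offset
fac   <- attr(tform, "factors")      # rows follow variables, columns follow term.labels
length(vars)                         # counts the leading list() as element 1
length(labs)
dim(fac)
attr(tform, "specials")$strata       # position of strata(inst) among the variables
attr(tform, "offset")                # position of offset(ph.ecog) among the variables
@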
So the ideal result of mterm[remove the specials] would use subscripts of
\begin{itemize}
  \item [-5] on the formula itself, and on the variables and predvars attributes
  \item [-2] for the term.labels attribute
  \item [-4, -2, drop=FALSE] for the factors attribute
  \item [-2] for the order attribute
  \item [-4] for the dataClasses attribute
\end{itemize}
That will recreate the formula that ``would have been'' had there been
no strata term.
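The mismatch between the subscripts can be seen directly. This is a minimal sketch in base R only (terms() does not evaluate Surv() or strata()). The names sp, naive and drop_col are illustrative, and looking up the strata row in the factors matrix is only one possible way to translate the index, not what the survival code does.
<<subscripts>>
test  <- Surv(time, status) ~ age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials = "strata")
labs  <- attr(tform, "term.labels")
fac   <- attr(tform, "factors")
sp    <- attr(tform, "specials")$strata  # indexes the variables / rows of factors
# Subscript that only corrects for the response
naive <- sp - attr(tform, "response")
labs[-naive]                             # out of range for term.labels: nothing is dropped
# Assumed alternative: columns of factors in which the strata row takes part
drop_col <- which(fac[sp, ] > 0)
labs[-drop_col]                          # the strata term is gone
@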
Now look at the first portion of the code in models.R
<<>>
`[.terms` <- function (termobj, i)
{
    resp <- if (attr(termobj, "response")) termobj[[2L]]
    newformula <- attr(termobj, "term.labels")[i]
    if (length(newformula) == 0L) newformula <- "1"
    newformula <- reformulate(newformula, resp, attr(termobj, "intercept"),
                              environment(termobj))
    result <- terms(newformula, specials = names(attr(termobj, "specials")))
    # Edit the optional attributes
}
@
The use of reformulate() is a nice trick. However, the index reported in the specials
attribute is generated with reference to the variables attribute, or equivalently the
row names of the factors attribute, not with respect to the term.labels attribute.
For consistency the second line should instead be
<<>>
newformula <- row.names(attr(termobj, "factors"))[i]
@
Of course, this will break code for anyone else who has used [.terms and, like me, has
been adjusting for the ``response is counted in specials but not in term.labels''
feature. R core will have to discuss/decide what is the right thing to do, and I'll
adapt.
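To make the row-name indexing concrete, here is a minimal sketch in base R only. It is not a proposed patch: it assumes that the response row is dropped along with the strata row before calling reformulate(), and the names rn, keep and newform are illustrative.
<<rownames>>
test  <- Surv(time, status) ~ age + offset(ph.ecog) + strata(inst)
tform <- terms(test, specials = "strata")
rn    <- rownames(attr(tform, "factors"))  # one name per variable, response first
sp    <- attr(tform, "specials")$strata    # indexes rn directly, no adjustment needed
resp  <- attr(tform, "response")           # 1 when a response is present
keep  <- rn[-c(resp, sp)]                  # assumption: the response row is removed too
newform <- reformulate(keep, response = tform[[2L]], env = environment(tform))
newform
@
In this sketch the offset is carried along, because it is one of the row names even though it is not a term label.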
The reformulate trick breaks in another way, one that only appeared on my radar this
week via a formula like the following.
<<form2>>
Surv(time, status) ~ age + (sex=='male') + strata(inst)
@
In both the term.labels attribute and the row/col names of the factors attribute the
parentheses disappear, and the result of the reformulate call is not a proper formula.
The + binds tighter than ==, so age + sex is grouped first, leading to an error message
that will confuse most users. We can argue, and I probably would, that the user should
have used I(sex=='male'). But they didn't, and without the I() it is a legal formula,
or at least one that currently works.
Fixing this issue is a lot harder.
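The precedence problem can be reproduced in isolation. This is a minimal sketch in base R only: the vector stripped holds the labels as described above, typed in by hand so that the result does not depend on the R version, and the names stripped, rebuilt and top_op are illustrative.
<<parens>>
f    <- Surv(time, status) ~ age + (sex == "male") + strata(inst)
labs <- attr(terms(f, specials = "strata"), "term.labels")
print(labs)                                # inspect whether the parentheses survive
stripped <- c("age", 'sex == "male"')      # labels with the parentheses gone
rebuilt  <- reformulate(stripped, response = f[[2L]])
rebuilt
top_op   <- as.character(rebuilt[[3L]][[1L]])  # top-level operator on the right side
n_terms  <- length(attr(terms(rebuilt), "term.labels"))
c(top_op = top_op, n_terms = n_terms)      # `==` at the top, and a single term
@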
An offset term causes issues in the 'Edit the optional attributes' part of the routine
as well. If you and/or R core will tell me what you think the code should do, I'll
create a patch. My vote would be to use rownames per the above and ignore the () edge
case.
The same basic code appears in drop.terms, by the way.
Terry T.