Therneau, Terry M., Ph.D.
2024-Dec-16 03:08 UTC
[Rd] Changes in the survival package (long)
The latest version of the survival package has two important additions. In
prior code the call
coxph(Surv(time, status) ~ age + strata(inst), data=lung)
could fail if a version of either Surv() or strata() existed elsewhere on the
search path; the wrong function could be picked up. Second, a model with
survival::strata(inst) in the formula would not do what users expect. These have
now been addressed.
1. For the coxph, concordance, survcheck, survfit, and survdiff modeling
functions, any Surv(), strata(), cluster(), psline(), tt(), ridge(), frailty(),
frailty.gamma(), frailty.gauss(), or frailty.t() model terms will use the
versions from the survival namespace, automatically. A survival:: modifier is
not required.
2. Any use of survival::strata(), survival::cluster(), survival::pspline(), ...
(all but survival::Surv) will have the 'survival::' modifier removed
before the formula is evaluated.
3. An exception to 1 above are models with a function/variable conflict such as
Surv(time, status) ~ x + strata + strata(group) (found in the Epi package) or
Surv(time, status) ~ x + strata(strata) (found in epiDisplay); I simply
haven't thought of an automatic way to pick strata() from the survival
namespace, and the strata variable locally, without stepping on my own feet. In
that same formula Surv would still be found in the survival namespace, however.
4. The call component of the relevant coxph, survfit, etc result is not
changed, it still documents what you actually typed. (Updating it to show
"what was run" breaks a half dozen of the 800+ CRAN packages which
make use of survival).
This has no bearing on any discussion of whether to use coxph or
survival::coxph as the primary call. Only formulas are affected by this change.
One motivation for this is that use of :: messes up formula processing, in
particular the "specials" argument of model.frame, something the
survival package depends on. For instance, in versions prior to survival3.8-1
the following two calls lead to completely different models; in the second
strata is not recognized. I urge people to try this and see just how different
the two models are.
coxph(Surv(time, status) ~ age + strata(inst), data=lung)
coxph(Surv(time, status) ~ age + survival::strata(inst), data =
lung)
Another false variant of the second model is
temp <- survival::strata(lung$inst)
coxph(Surv(time, status) ~ age + temp, data=lung)
The number in instances of these incorrect forms in the set of reverse
dependencies was sobering. I suspect the error is even more prevalent in user
code. This update automatically fixes the first but not the second.
A second motivation was that many users, and this includes myself up to a
couple months ago, may assume that any formulas in a coxph call would
automatically resolve first in the survival namespace, in the same way that
internal calls from coxph to subsidary functions like coxph.fit() or
residuals.coxph() do, without having to resort to use of the double colon. But
formulas are evaluated by model.frame(), outside the survival namespace. To see
this do the following simple test (pre version 3.8-2)
Surv <- function(x) x^2 # silly replacement
coxph(Surv(time, status) ~ age + strata(inst), lung) # fails with an
error
I can't be responsible for many of the mistakes user's make with the
survival packages, but I want to prevent these two obvious ones.
As further explanation, we have to realize that formulas are special, i.e. many
of the symbols in a formula mean something else than simple mathematics would
give. We are all familiar with this for the *, / and : symbols: y ~ x1 * x2
does not fit a single predictor which is x1 times x2, and y ~ x1 + age/10 is not
a successful approach for adding "age in decades" as a predictor.
These special cases are based on pattern recognition in the formula parser. If
you instead wrote y ~ base::"*"(x1, x2), it won't produce the
same result. What many (perhaps most?) users fail to realize is that there are
other "specials" that are recognized in exactly the same way, i.e., by
pattern recognition, but which happen to look like a function call. This
include offset() in lm and glm, strata() in the survival package, s() in the gam
package, etc. Qualifying any of these with :: mucks up the recognition process.
For instance
glm(status ~ x + offset(log(time)), family=poisson, data= aml)
glm(status ~ x + stats::offset(log(time)), family= poisson,
data=aml)
are different models. The first is a valid poisson regression, the second
something else entirely.
In the survival package the Surv() function, however, is not recognized using
the specials argument in model.frame, i.e. during parsing. Instead, Surv()
produces an object with a particular class and the modeling functions in
survival key off that class. Thus using survival::Surv() or Surv() in the
formula both work, as does creating y <- Surv(time, status) outside and
typing coxph(y ~ age + sex, lung). Since it is not an error the new code does
not strike off the survival:: part from survival::Surv() (even though this
author much prefers the shorter form). Right hand side specials do get
modified.
So why didn't I make strata() work this way as well? In 1993, when this
design was first encoded, I based my design off the examples available to me,
namely those models found in Chambers and Hastie "Statistical Models in
S". Both namespaces and :: were far, far in the future, and my crystal
ball cloudy.
As a footnote, getting this right was more subtle than I anticipated. I
introduced this in survival_3.8-0, and quickly went through 3.8-1, and 3.8-2
versions. I wish to acknowlege the CRAN maintainers for their useful input and
patience as we have worked through the reverse dependency checks. Current on
github and (hopefully soon) on CRAN.
An unaswered question is how to best document this behaviour so that users will
find it.
Terry Therneau
[[alternative HTML version deleted]]
Good day, It is impressive to see such sustained package maintenance. -------------------------------------- Dr. Dario Strbenac Bioinformatics Research Associate School of Mathematics and Statistics, University of Sydney Camperdown N.S.W. 2050, Australia