On Wed, 2011-04-27 at 12:00 +0200, Peter Dalgaard wrote:
> Er... No, I don't think Paul is being particularly rude here (and he
> has been doing us some substantial favors in the past, notably his
> useful Rtips page). I know the kind of functionality he is looking
> for; e.g., SAS JMP has some rather nice interactive displays of
> regression effects for which you'll need to fill in "something" for
> the other variables.
>
> However, that being said, I agree with Duncan that we probably do not
> want to canonicalize any particular method of filling in "average"
> values for data frame variables. Whatever you do will be statistically
> dubious (in particular, using the mode of a factor variable gives me
> the creeps: Do a subgroup analysis and your "average person" switches
> from male to female?), so I think it is one of those cases where it is
> best to provide mechanism, not policy.
>
I agree with Peter. There are two tasks in creating newdata: deciding what the
default reference levels should be, and building the data frame with
those levels. It's the first part that is hard. For survival curves
from a Cox model the historical default has been to use the mean of each
covariate, which can be awful (sex coded as 0/1 leads to prediction for
a hermaphrodite?). Nevertheless, I've not been able to think of a
strategy that would give sensible answers for most of the data I use and
coxph retains the flawed default for lack of a better idea. When
teaching a class on this, I tell listeners to "bite the bullet" and build
the newdata that makes clinical sense, because package defaults are
always unwise for some of the variables. How can a package possibly
know that it should use bilirubin = 1.0 (upper limit of normal) and AST = 45 when
the data set is one of my liver transplant studies?
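
To make that concrete, here is a minimal sketch using the pbc data that ships
with the survival package as a stand-in for a real study (the covariates and
reference values are purely illustrative, not a recommendation). The default
survfit() curve is computed at the mean of each covariate, while an explicit
newdata supplies levels chosen to make clinical sense:

  library(survival)

  # Cox model on the pbc data; status == 2 marks death
  fit <- coxph(Surv(time, status == 2) ~ age + bili + ast + sex, data = pbc)

  # Default: the curve at the mean of every covariate, including the
  # "mean" of the coded sex variable -- the flawed default discussed above
  s0 <- survfit(fit)

  # Explicit newdata with clinically sensible reference values,
  # e.g. bilirubin at the upper limit of normal and AST = 45
  newd <- data.frame(age = 50, bili = 1.0, ast = 45,
                     sex = factor("f", levels = levels(pbc$sex)))
  s1 <- survfit(fit, newdata = newd)

  plot(s1, xlab = "Days", ylab = "Survival")
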
Frank Harrell would argue, though, that his "sometimes misguided" default in
cph is better than the "almost always wrong" one in coxph, and there is
certainly some strength in that position.
Terry Therneau