thr3ads.net - R help - [R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term? [Jun 2012]

If this information is useful, please help other people find it:
Share via:

Jürgen Biedermann

2012-Jun-16 13:04 UTC

[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?

Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time...

I don't find a solution to use the "survfit" function (package: 
survival)  for a defined pattern of covariates with a Cox-Model 
including a time dependent interaction term. Somehow the definition of 
my "newdata" argument seems to be erroneous.
I already googled the problem, found many persons having the same or a 
similar problem, but still no solution.
I want to stress that my time-dependent covariate does not depend on the 
failure of an individual (in this case it doesn't seem sensible to 
predict a survivor function for an individual). Rather one of my effects 
declines with time (time-dependent coefficient).

For illustration, I use the example of John Fox's paper "Cox 
Proportional - Hazards Regression for Survival Data".
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf

Do you know any help? See code below

Thanks very much in advance
J?rgen Biedermann

#----------------------------------------
#Code

Rossi <- 
read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt",
header=T)

Rossi.2 <- fold(Rossi, time='week',
     event='arrest', cov=11:62, cov.names='employed')

# see below for the fold function from John Fox

# modeling an interaction with time (Page 14)

mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~
     fin + age + age:stop + prio,
     data=Rossi.2)
mod.allison.5

# Attempt to get the survivor function of a person with age=30, fin=0 
and prio=5

newdata.1 <- 
data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0)
fit <- survfit(mod.allison.5,newdata.1,id="Id")

Error message:

 >Fehler in model.frame.default(data = newdata.1, id = "Id",
formula =
Surv(start,  :
   Variablenl?ngen sind unterschiedlich (gefunden f?r '(id)')

--> failure, length of variables are different.

#-----------------------------------------------------------------
fold <- function(data, time, event, cov,
     cov.names=paste('covariate', '.', 1:ncovs,
sep=""),
     suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
     vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
     xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
     all.cov <- unlist(cov)
     if (!is.list(cov)) cov <- list(cov)
     ncovs <- length(cov)
     nrow <- nrow(data)
     ncol <- ncol(data)
     ncov <- length(cov[[1]])
     nobs <- nrow*ncov
     if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1)
         stop(paste(
             "all elements of cov must be of the same length and \n",
             "cov.times must have one more entry than each element of 
cov."))
     var.names <- names(data)
     subjects <- rownames(data)
     omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov
     keep.cols <- (1:ncol)[-omit.cols]
     nkeep <- length(keep.cols)
     if (is.numeric(event)) event <- var.names[event]
     times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
         else data[, cov.times]
     new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep)
     rownames <- rep("", nobs)
     colnames(new.data) <- c('start', 'stop', paste(event,
suffix, sep=""),
         var.names[-omit.cols], cov.names)
     end.row <- 0
     for (i in 1:nrow){
         start.row <- end.row + 1
         end.row <- end.row + ncov
         start <- times[i, 1:ncov]
         stop <- times[i, 2:(ncov+1)]
         event.time <- ifelse (stop == data[i, time] & data[i, event] == 
1, 1, 0)
         keep <- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T)
         select <- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs), 
1, all)
         rows <- start.row:end.row
         cov.mat <- xlag(matrix(unlist(data[i, all.cov]), 
nrow=length(rows)), lag)
         new.data[rows[select], ] <-
             cbind(start, stop, event.time, keep, cov.mat)[select,]
         rownames[rows] <- paste(subjects[i], '.', seq(along=rows),
sep="")
         }
     row.names(new.data) <- rownames
     as.data.frame(new.data[new.data[, 1] != Inf &
         apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ])
     }
#-----------------------------------------------------------------

John Fox

2012-Jun-16 13:55 UTC

head link

[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?

Dear J?rgen,

All the values of your Id variable are equal to 1; how can that make sense?
Removing the argument Id=id may get you what you want.

By the way, the document to which you refer was an appendix to a 2002 book that
has been superseded by Fox and Weisberg, An R Companion to Applied Regression,
Second Edition (Sage, 2011). Appendices for that book are available at
<http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix.html>.

I hope this helps,
 John

------------------------------------------------
John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
	
On Sat, 16 Jun 2012 15:04:48 +0200
 J?rgen Biedermann <juergen.biedermann at googlemail.com>
wrote:> Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time...
> 
> I don't find a solution to use the "survfit" function
(package: survival)  for a defined pattern of covariates with a Cox-Model
including a time dependent interaction term. Somehow the definition of my
"newdata" argument seems to be erroneous.
> I already googled the problem, found many persons having the same or a
similar problem, but still no solution.
> I want to stress that my time-dependent covariate does not depend on the
failure of an individual (in this case it doesn't seem sensible to predict a
survivor function for an individual). Rather one of my effects declines with
time (time-dependent coefficient).
> 
> For illustration, I use the example of John Fox's paper "Cox
Proportional - Hazards Regression for Survival Data".
>
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf
> 
> Do you know any help? See code below
> 
> Thanks very much in advance
> J?rgen Biedermann
> 
> #----------------------------------------
> #Code
> 
> Rossi <-
read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt",
header=T)
> 
> Rossi.2 <- fold(Rossi, time='week',
>      event='arrest', cov=11:62, cov.names='employed')
> 
> # see below for the fold function from John Fox
> 
> # modeling an interaction with time (Page 14)
> 
> mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~
>      fin + age + age:stop + prio,
>      data=Rossi.2)
> mod.allison.5
> 
> # Attempt to get the survivor function of a person with age=30, fin=0 and
prio=5
> 
> newdata.1 <-
data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0)
> fit <- survfit(mod.allison.5,newdata.1,id="Id")
> 
> Error message:
> 
>  >Fehler in model.frame.default(data = newdata.1, id = "Id",
formula =
> Surv(start,  :
>    Variablenl?ngen sind unterschiedlich (gefunden f?r '(id)')
> 
> --> failure, length of variables are different.
> 
> #-----------------------------------------------------------------
> fold <- function(data, time, event, cov,
>      cov.names=paste('covariate', '.', 1:ncovs,
sep=""),
>      suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
>      vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
>      xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
>      all.cov <- unlist(cov)
>      if (!is.list(cov)) cov <- list(cov)
>      ncovs <- length(cov)
>      nrow <- nrow(data)
>      ncol <- ncol(data)
>      ncov <- length(cov[[1]])
>      nobs <- nrow*ncov
>      if (length(unique(c(sapply(cov, length), length(cov.times)-1))) >
1)
>          stop(paste(
>              "all elements of cov must be of the same length and
\n",
>              "cov.times must have one more entry than each element of 
> cov."))
>      var.names <- names(data)
>      subjects <- rownames(data)
>      omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov
>      keep.cols <- (1:ncol)[-omit.cols]
>      nkeep <- length(keep.cols)
>      if (is.numeric(event)) event <- var.names[event]
>      times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
>          else data[, cov.times]
>      new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep)
>      rownames <- rep("", nobs)
>      colnames(new.data) <- c('start', 'stop',
paste(event, suffix, sep=""),
>          var.names[-omit.cols], cov.names)
>      end.row <- 0
>      for (i in 1:nrow){
>          start.row <- end.row + 1
>          end.row <- end.row + ncov
>          start <- times[i, 1:ncov]
>          stop <- times[i, 2:(ncov+1)]
>          event.time <- ifelse (stop == data[i, time] & data[i,
event] ==
> 1, 1, 0)
>          keep <- matrix(unlist(data[i, -omit.cols]), ncov, nkeep,
byrow=T)
>          select <- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs), 
> 1, all)
>          rows <- start.row:end.row
>          cov.mat <- xlag(matrix(unlist(data[i, all.cov]), 
> nrow=length(rows)), lag)
>          new.data[rows[select], ] <-
>              cbind(start, stop, event.time, keep, cov.mat)[select,]
>          rownames[rows] <- paste(subjects[i], '.',
seq(along=rows), sep="")
>          }
>      row.names(new.data) <- rownames
>      as.data.frame(new.data[new.data[, 1] != Inf &
>          apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ])
>      }
> #-----------------------------------------------------------------
>

Terry Therneau

2012-Jun-26 19:09 UTC

head link

[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?

I'm finally back from vacation and looking at your email.

1. The primary mistake is in your call, where you say
     fit <- survfit(mod.allison.5, newdata.1, id="Id")

This will use the character string "Id" as the value of the
identifier,
not the data.  The effect is exactly the same as the difference between 
print(x) and print('x').

2. In reply to John's comment that "all the id values are the
same".  It
is correct.  Normally the survfit routine is used to produce multiple 
curves, one curve per line of the input data, for time-independent 
variables.   The presence of an id argument is used to tell it that 
there are multiple lines per subject in the data, e.g. time-dependent 
covariates.  So even though there is only one curve being produced we 
need an id statement to trigger the behavior.
   If you only want one curve for one individual, then
"individual=TRUE"
is an alternate, as John pointed out.

3. "It's very important to specify the Surv object and the formula 
directly in the coxph function ..."
Yes, I agree.  I always use your suggested form because it gives better 
documentation -- variable names are directly visible in the coxph call.  
I don't understand the attraction of the other form, but lot's of people
use it.
    Why did it go wrong?  Because the survfit function was evaluating
         Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time) ~ fin + 
age + age:stop + pro, data=newdata.1

The length of the variables will be different.  The error message comes 
from the R internals, not my program.

Terry Therneau


On 06/16/2012 08:04 AM, Jürgen Biedermann wrote:>
> Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time...
>
> I don't find a solution to use the "survfit" function
(package:
> survival)  for a defined pattern of covariates with a Cox-Model
> including a time dependent interaction term. Somehow the definition of
> my "newdata" argument seems to be erroneous.
> I already googled the problem, found many persons having the same or a
> similar problem, but still no solution.
> I want to stress that my time-dependent covariate does not depend on the
> failure of an individual (in this case it doesn't seem sensible to
> predict a survivor function for an individual). Rather one of my effects
> declines with time (time-dependent coefficient).
>
> For illustration, I use the example of John Fox's paper "Cox
> Proportional - Hazards Regression for Survival Data".
>
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf
>
> Do you know any help? See code below
>
> Thanks very much in advance
> Jürgen Biedermann
>
> #----------------------------------------
> #Code
>
> Rossi <-
>
read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt",
> header=T)
>
> Rossi.2 <- fold(Rossi, time='week',
>      event='arrest', cov=11:62, cov.names='employed')
>
> # see below for the fold function from John Fox
>
> # modeling an interaction with time (Page 14)
>
> mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~
>      fin + age + age:stop + prio,
>      data=Rossi.2)
> mod.allison.5
>
> # Attempt to get the survivor function of a person with age=30, fin=0
> and prio=5
>
> newdata.1 <-
>
data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0)
> fit <- survfit(mod.allison.5,newdata.1,id="Id")
>
> Error message:
>
> >Fehler in model.frame.default(data = newdata.1, id = "Id",
formula > Surv(start,  :
>    Variablenlängen sind unterschiedlich (gefunden für '(id)')
>
> --> failure, length of variables are different.
>
> #-----------------------------------------------------------------
> fold <- function(data, time, event, cov,
>      cov.names=paste('covariate', '.', 1:ncovs,
sep=""),
>      suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
>      vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
>      xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
>      all.cov <- unlist(cov)
>      if (!is.list(cov)) cov <- list(cov)
>      ncovs <- length(cov)
>      nrow <- nrow(data)
>      ncol <- ncol(data)
>      ncov <- length(cov[[1]])
>      nobs <- nrow*ncov
>      if (length(unique(c(sapply(cov, length), length(cov.times)-1))) >
1)
>          stop(paste(
>              "all elements of cov must be of the same length and
\n",
>              "cov.times must have one more entry than each element of
> cov."))
>      var.names <- names(data)
>      subjects <- rownames(data)
>      omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov
>      keep.cols <- (1:ncol)[-omit.cols]
>      nkeep <- length(keep.cols)
>      if (is.numeric(event)) event <- var.names[event]
>      times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
>          else data[, cov.times]
>      new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep)
>      rownames <- rep("", nobs)
>      colnames(new.data) <- c('start', 'stop',
paste(event, suffix,
> sep=""),
>          var.names[-omit.cols], cov.names)
>      end.row <- 0
>      for (i in 1:nrow){
>          start.row <- end.row + 1
>          end.row <- end.row + ncov
>          start <- times[i, 1:ncov]
>          stop <- times[i, 2:(ncov+1)]
>          event.time <- ifelse (stop == data[i, time] & data[i,
event] => 1, 1, 0)
>          keep <- matrix(unlist(data[i, -omit.cols]), ncov, nkeep,
byrow=T)
>          select <- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs),
> 1, all)
>          rows <- start.row:end.row
>          cov.mat <- xlag(matrix(unlist(data[i, all.cov]),
> nrow=length(rows)), lag)
>          new.data[rows[select], ] <-
>              cbind(start, stop, event.time, keep, cov.mat)[select,]
>          rownames[rows] <- paste(subjects[i], '.',
seq(along=rows),
> sep="")
>          }
>      row.names(new.data) <- rownames
>      as.data.frame(new.data[new.data[, 1] != Inf &
>          apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ])
>      }
> #-----------------------------------------------------------------
>
	[[alternative HTML version deleted]]

Possibly Parallel Threads

Search for more apparently analagous threads

R help - Jun 2012 - How to specify "newdata" in a Cox-Modell with a time dependent interaction term?

[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?

[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?

[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?

Possibly Parallel Threads