Jürgen Biedermann
2012-Jun-16 13:04 UTC
[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?
Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time... I don't find a solution to use the "survfit" function (package: survival) for a defined pattern of covariates with a Cox-Model including a time dependent interaction term. Somehow the definition of my "newdata" argument seems to be erroneous. I already googled the problem, found many persons having the same or a similar problem, but still no solution. I want to stress that my time-dependent covariate does not depend on the failure of an individual (in this case it doesn't seem sensible to predict a survivor function for an individual). Rather one of my effects declines with time (time-dependent coefficient). For illustration, I use the example of John Fox's paper "Cox Proportional - Hazards Regression for Survival Data". http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf Do you know any help? See code below Thanks very much in advance J?rgen Biedermann #---------------------------------------- #Code Rossi <- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=T) Rossi.2 <- fold(Rossi, time='week', event='arrest', cov=11:62, cov.names='employed') # see below for the fold function from John Fox # modeling an interaction with time (Page 14) mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~ fin + age + age:stop + prio, data=Rossi.2) mod.allison.5 # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5 newdata.1 <- data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0) fit <- survfit(mod.allison.5,newdata.1,id="Id") Error message: >Fehler in model.frame.default(data = newdata.1, id = "Id", formula = Surv(start, : Variablenl?ngen sind unterschiedlich (gefunden f?r '(id)') --> failure, length of variables are different. #----------------------------------------------------------------- fold <- function(data, time, event, cov, cov.names=paste('covariate', '.', 1:ncovs, sep=""), suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){ vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)]) xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag) all.cov <- unlist(cov) if (!is.list(cov)) cov <- list(cov) ncovs <- length(cov) nrow <- nrow(data) ncol <- ncol(data) ncov <- length(cov[[1]]) nobs <- nrow*ncov if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1) stop(paste( "all elements of cov must be of the same length and \n", "cov.times must have one more entry than each element of cov.")) var.names <- names(data) subjects <- rownames(data) omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov keep.cols <- (1:ncol)[-omit.cols] nkeep <- length(keep.cols) if (is.numeric(event)) event <- var.names[event] times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T) else data[, cov.times] new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep) rownames <- rep("", nobs) colnames(new.data) <- c('start', 'stop', paste(event, suffix, sep=""), var.names[-omit.cols], cov.names) end.row <- 0 for (i in 1:nrow){ start.row <- end.row + 1 end.row <- end.row + ncov start <- times[i, 1:ncov] stop <- times[i, 2:(ncov+1)] event.time <- ifelse (stop == data[i, time] & data[i, event] == 1, 1, 0) keep <- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T) select <- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs), 1, all) rows <- start.row:end.row cov.mat <- xlag(matrix(unlist(data[i, all.cov]), nrow=length(rows)), lag) new.data[rows[select], ] <- cbind(start, stop, event.time, keep, cov.mat)[select,] rownames[rows] <- paste(subjects[i], '.', seq(along=rows), sep="") } row.names(new.data) <- rownames as.data.frame(new.data[new.data[, 1] != Inf & apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ]) } #-----------------------------------------------------------------
John Fox
2012-Jun-16 13:55 UTC
[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?
Dear J?rgen, All the values of your Id variable are equal to 1; how can that make sense? Removing the argument Id=id may get you what you want. By the way, the document to which you refer was an appendix to a 2002 book that has been superseded by Fox and Weisberg, An R Companion to Applied Regression, Second Edition (Sage, 2011). Appendices for that book are available at <http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix.html>. I hope this helps, John ------------------------------------------------ John Fox Sen. William McMaster Prof. of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/ On Sat, 16 Jun 2012 15:04:48 +0200 J?rgen Biedermann <juergen.biedermann at googlemail.com> wrote:> Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time... > > I don't find a solution to use the "survfit" function (package: survival) for a defined pattern of covariates with a Cox-Model including a time dependent interaction term. Somehow the definition of my "newdata" argument seems to be erroneous. > I already googled the problem, found many persons having the same or a similar problem, but still no solution. > I want to stress that my time-dependent covariate does not depend on the failure of an individual (in this case it doesn't seem sensible to predict a survivor function for an individual). Rather one of my effects declines with time (time-dependent coefficient). > > For illustration, I use the example of John Fox's paper "Cox Proportional - Hazards Regression for Survival Data". > http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf > > Do you know any help? See code below > > Thanks very much in advance > J?rgen Biedermann > > #---------------------------------------- > #Code > > Rossi <- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=T) > > Rossi.2 <- fold(Rossi, time='week', > event='arrest', cov=11:62, cov.names='employed') > > # see below for the fold function from John Fox > > # modeling an interaction with time (Page 14) > > mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~ > fin + age + age:stop + prio, > data=Rossi.2) > mod.allison.5 > > # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5 > > newdata.1 <- data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0) > fit <- survfit(mod.allison.5,newdata.1,id="Id") > > Error message: > > >Fehler in model.frame.default(data = newdata.1, id = "Id", formula = > Surv(start, : > Variablenl?ngen sind unterschiedlich (gefunden f?r '(id)') > > --> failure, length of variables are different. > > #----------------------------------------------------------------- > fold <- function(data, time, event, cov, > cov.names=paste('covariate', '.', 1:ncovs, sep=""), > suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){ > vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)]) > xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag) > all.cov <- unlist(cov) > if (!is.list(cov)) cov <- list(cov) > ncovs <- length(cov) > nrow <- nrow(data) > ncol <- ncol(data) > ncov <- length(cov[[1]]) > nobs <- nrow*ncov > if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1) > stop(paste( > "all elements of cov must be of the same length and \n", > "cov.times must have one more entry than each element of > cov.")) > var.names <- names(data) > subjects <- rownames(data) > omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov > keep.cols <- (1:ncol)[-omit.cols] > nkeep <- length(keep.cols) > if (is.numeric(event)) event <- var.names[event] > times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T) > else data[, cov.times] > new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep) > rownames <- rep("", nobs) > colnames(new.data) <- c('start', 'stop', paste(event, suffix, sep=""), > var.names[-omit.cols], cov.names) > end.row <- 0 > for (i in 1:nrow){ > start.row <- end.row + 1 > end.row <- end.row + ncov > start <- times[i, 1:ncov] > stop <- times[i, 2:(ncov+1)] > event.time <- ifelse (stop == data[i, time] & data[i, event] == > 1, 1, 0) > keep <- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T) > select <- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs), > 1, all) > rows <- start.row:end.row > cov.mat <- xlag(matrix(unlist(data[i, all.cov]), > nrow=length(rows)), lag) > new.data[rows[select], ] <- > cbind(start, stop, event.time, keep, cov.mat)[select,] > rownames[rows] <- paste(subjects[i], '.', seq(along=rows), sep="") > } > row.names(new.data) <- rownames > as.data.frame(new.data[new.data[, 1] != Inf & > apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ]) > } > #----------------------------------------------------------------- >
Terry Therneau
2012-Jun-26 19:09 UTC
[R] How to specify "newdata" in a Cox-Modell with a time dependent interaction term?
I'm finally back from vacation and looking at your email. 1. The primary mistake is in your call, where you say fit <- survfit(mod.allison.5, newdata.1, id="Id") This will use the character string "Id" as the value of the identifier, not the data. The effect is exactly the same as the difference between print(x) and print('x'). 2. In reply to John's comment that "all the id values are the same". It is correct. Normally the survfit routine is used to produce multiple curves, one curve per line of the input data, for time-independent variables. The presence of an id argument is used to tell it that there are multiple lines per subject in the data, e.g. time-dependent covariates. So even though there is only one curve being produced we need an id statement to trigger the behavior. If you only want one curve for one individual, then "individual=TRUE" is an alternate, as John pointed out. 3. "It's very important to specify the Surv object and the formula directly in the coxph function ..." Yes, I agree. I always use your suggested form because it gives better documentation -- variable names are directly visible in the coxph call. I don't understand the attraction of the other form, but lot's of people use it. Why did it go wrong? Because the survfit function was evaluating Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time) ~ fin + age + age:stop + pro, data=newdata.1 The length of the variables will be different. The error message comes from the R internals, not my program. Terry Therneau On 06/16/2012 08:04 AM, Jürgen Biedermann wrote:> > Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time... > > I don't find a solution to use the "survfit" function (package: > survival) for a defined pattern of covariates with a Cox-Model > including a time dependent interaction term. Somehow the definition of > my "newdata" argument seems to be erroneous. > I already googled the problem, found many persons having the same or a > similar problem, but still no solution. > I want to stress that my time-dependent covariate does not depend on the > failure of an individual (in this case it doesn't seem sensible to > predict a survivor function for an individual). Rather one of my effects > declines with time (time-dependent coefficient). > > For illustration, I use the example of John Fox's paper "Cox > Proportional - Hazards Regression for Survival Data". > http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf > > Do you know any help? See code below > > Thanks very much in advance > Jürgen Biedermann > > #---------------------------------------- > #Code > > Rossi <- > read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", > header=T) > > Rossi.2 <- fold(Rossi, time='week', > event='arrest', cov=11:62, cov.names='employed') > > # see below for the fold function from John Fox > > # modeling an interaction with time (Page 14) > > mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~ > fin + age + age:stop + prio, > data=Rossi.2) > mod.allison.5 > > # Attempt to get the survivor function of a person with age=30, fin=0 > and prio=5 > > newdata.1 <- > data.frame(unique(Rossi.2[c("start","stop")]),fin=0,age=30,prio=5,Id=1,arrest.time=0) > fit <- survfit(mod.allison.5,newdata.1,id="Id") > > Error message: > > >Fehler in model.frame.default(data = newdata.1, id = "Id", formula > Surv(start, : > Variablenlängen sind unterschiedlich (gefunden für '(id)') > > --> failure, length of variables are different. > > #----------------------------------------------------------------- > fold <- function(data, time, event, cov, > cov.names=paste('covariate', '.', 1:ncovs, sep=""), > suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){ > vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)]) > xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag) > all.cov <- unlist(cov) > if (!is.list(cov)) cov <- list(cov) > ncovs <- length(cov) > nrow <- nrow(data) > ncol <- ncol(data) > ncov <- length(cov[[1]]) > nobs <- nrow*ncov > if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1) > stop(paste( > "all elements of cov must be of the same length and \n", > "cov.times must have one more entry than each element of > cov.")) > var.names <- names(data) > subjects <- rownames(data) > omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov > keep.cols <- (1:ncol)[-omit.cols] > nkeep <- length(keep.cols) > if (is.numeric(event)) event <- var.names[event] > times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T) > else data[, cov.times] > new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep) > rownames <- rep("", nobs) > colnames(new.data) <- c('start', 'stop', paste(event, suffix, > sep=""), > var.names[-omit.cols], cov.names) > end.row <- 0 > for (i in 1:nrow){ > start.row <- end.row + 1 > end.row <- end.row + ncov > start <- times[i, 1:ncov] > stop <- times[i, 2:(ncov+1)] > event.time <- ifelse (stop == data[i, time] & data[i, event] => 1, 1, 0) > keep <- matrix(unlist(data[i, -omit.cols]), ncov, nkeep, byrow=T) > select <- apply(matrix(!is.na(data[i, all.cov]), ncol=ncovs), > 1, all) > rows <- start.row:end.row > cov.mat <- xlag(matrix(unlist(data[i, all.cov]), > nrow=length(rows)), lag) > new.data[rows[select], ] <- > cbind(start, stop, event.time, keep, cov.mat)[select,] > rownames[rows] <- paste(subjects[i], '.', seq(along=rows), > sep="") > } > row.names(new.data) <- rownames > as.data.frame(new.data[new.data[, 1] != Inf & > apply(as.matrix(!is.na(new.data[, cov.names])), 1, all), ]) > } > #----------------------------------------------------------------- >[[alternative HTML version deleted]]