aajit75
2011-Dec-19 08:34 UTC
[R] Calculating the probability of an event at time "t" from a Cox model fit
Dear R-users, I would like to determine the probability of event at specific time using cox model fit. On the development sample data I am able to get the probability of a event at time point(t). I need probability score of a event at specific time, using scoring scoring dataset which will have only covariates and not the response variables. Here is the sample code: n = 1000 beta1 = 2; beta2 = -1; lambdaT = .02 # baseline hazard lambdaC = .4 # hazard of censoring x1 = rnorm(n,0) x2 = rnorm(n,0) # true event time T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2)) C = rweibull(n, shape=1, scale=lambdaC) #censoring time time = pmin(T,C) #observed time is min of censored and true event = time==T # set to 1 if event is observed dataphr=data.frame(time,event,x1,x2) library(survival) fit_coxph <- coxph(Surv(time, event)~ x1 + x2 , method="breslow") library(peperr) predictProb.coxph(fit_coxph, Surv(dataphr$time, dataphr$event), dataphr, 0.003) # Using predictProb.coxph function, probability of event at time (t) is estimated for cox fit models, I want to estimate this probability on scoring dataset score_data as below with covariate x1 and x2. Is it possible/ is there any way to get these probabilities? since in predictProb.coxph function it requires response, which is not preseent on scoring sample. n = 10000 set.seed(1) x1 = rnorm(n,0) x2 = rnorm(n,0) score_data <- data.frame(x1,x2) Thanks in advance!! ~ Ajit -- View this message in context: http://r.789695.n4.nabble.com/Calculating-the-probability-of-an-event-at-time-t-from-a-Cox-model-fit-tp4213318p4213318.html Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2011-Dec-19 15:11 UTC
[R] Calculating the probability of an event at time "t" from a Cox model fit
On Dec 19, 2011, at 3:34 AM, aajit75 wrote:> Dear R-users, > > I would like to determine the probability of event at specific time > using > cox model fit. On the development sample data I am able to get the > probability of a event at time point(t). > I need probability score of a event at specific time, using scoring > scoring > dataset which will have only covariates and not the response > variables. > > Here is the sample code: > > n = 1000 > beta1 = 2; beta2 = -1; > lambdaT = .02 # baseline hazard > lambdaC = .4 # hazard of censoring > x1 = rnorm(n,0) > x2 = rnorm(n,0) > > # true event time > T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2)) > C = rweibull(n, shape=1, scale=lambdaC) #censoring time > time = pmin(T,C) #observed time is min of censored and true > event = time==T # set to 1 if event is observed > dataphr=data.frame(time,event,x1,x2) > > library(survival) > fit_coxph <- coxph(Surv(time, event)~ x1 + x2 , method="breslow") > > library(peperr) > predictProb.coxph(fit_coxph, Surv(dataphr$time, dataphr$event), > dataphr, > 0.003) > > # Using predictProb.coxph function, probability of event at time (t) > is > estimated for cox fit models,No , it should not be the probability at time T (which would be infinitesimally small at any time point), but rather the probability prior to time T. I am puzzled why one would not use predict.coxph with type="expected". ?predict.coxph> I want to estimate this probability on scoring > dataset score_data as below with covariate x1 and x2. > > Is it possible/ is there any way to get these probabilities? since in > predictProb.coxph function it requires response, which is not > preseent on > scoring sample.Raising what would seem to be the obvious question: Why didn't you make one? Or use the derivation Surv object?> > n = 1000 > set.seed(1) > x1 = rnorm(n,0) > x2 = rnorm(n,0) > score_data <- data.frame(x1,x2)> n = 100 > set.seed(1) > x1 = rnorm(n,0) > x2 = rnorm(n,0) > score_data <- data.frame(x1,x2) > s.scr <- survival::Surv(rep(2, n),rep(1, n) ) > predictProb.coxph(fit_coxph, s.scr, score_data, 0.003) Error in model.frame.default(formula = Surv(time, event) ~ x1 + x2) : variable lengths differ (found for 'x1') After getting that model.frame error with the 10000 element test set: "variable lengths differ (found for 'x1')", I changed it to the same length as in the derivation set and ran: predictProb.coxph(fit_coxph, Surv(time, event), score_data, 0.003) [,1] [1,] 9.977413e-01 [2,] 9.879081e-01 [3,] 9.893564e-01 [4,] 5.857123e-01 [5,] 9.550606e-01 [6,] 9.761398e-01 [7,] 9.699297e-01 snipped So apparently that function will accept a dataframe even though its help page says it should be a matrix and it will run if the same number of records aare present as in the derivation set. This next version also worked and gave identical results, so it appears the Surv object is really just a place holder. s.scr <- survival::Surv(rep(2, n),rep(0, n) ) > predictProb.coxph(fit_coxph, s.scr, score_data, 0.003) [,1] [1,] 9.977413e-01 [2,] 9.879081e-01 [3,] 9.893564e-01 [4,] 5.857123e-01 [5,] 9.550606e-01 snipped Since I am not a regular user I cannot really answer the question about whether a Surv() object and a dataframe of different dimensions than the fit object should be valid input to that function. (Copied the peperr author.)>David Winsemius, MD West Hartford, CT
Maybe Matching Threads
- Scoring using cox model: probability of survival before time t
- simulate time data
- reference classes, LAZY_DUPLICATE_OK, and external pointers
- problem with BootCV for coxph in pec after feature selection with glmnet (lasso)
- CRAN (and crantastic) updates this week