Daniel Meddings
2015-Jul-22 08:20 UTC
[R] How to simulate informative censoring in a Cox PH model?
I wish to simulate event times where the censoring is informative, and to compare parameter estimator quality from a Cox PH model with estimates obtained from event times generated with non-informative censoring. However I am struggling to do this, and I conclude rather than a technical flaw in my code I instead do not understand what is meant by informative and un-informative censoring. My approach is to simulate an event time T dependent on a vector of covariates x having hazard function h(t|x)=lambda*exp(beta'*x)v*t^{v-1}. This corresponds to T~ Weibull(lambda(x),v), where the scale parameter lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}), lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I assume the regression coefficients are p-dimensional. I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C), lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i) and a censored flag delta_inf_i=1 if Y_inf_i <= C_i (an observed event), and delta_inf_i=0 if Y_inf_i > C_i (informatively censored: event not observed). I am convinced this is informative censoring because as long as beta_T~=0 and beta_C~=0 then for each subject the data generating process for T and C both depend on x. In contrast I generate non-informative censoring times D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i) and a censored flag delta_ninf_i=1 if Y_ninf_i <= D_i (an observed event), and delta_ninf_i=0 if Y_ninf_i > D_i (non-informatively censored: event not observed). Here beta_D is a scalar. I "scale" the simulation by choosing the lambda_T, lambda_C and lambda_D parameters such that on average T_i<C_i and T_i<D_i to achieve X% of censored subjects for both Y_inf_i and Y_ninf_i. The problem is that even for say 30% censoring (which I think is high), the Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when I expected the estimates using Y_inf to be biased, and I think I see why: however different beta_C is from beta_T, a censored subject can presumably influence the estimation of beta_T only by affecting the set of subjects at risk at any time t, but this does not change the fact that every single Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. Thus I do not see how my simulation can possibly produce biased estimates for beta_T using Y_inf. But then what is informative censoring if not based on this approach? Any help would be greatly appreciated. [[alternative HTML version deleted]]
Greg Snow
2015-Jul-22 23:33 UTC
[R] How to simulate informative censoring in a Cox PH model?
I think that the Cox model still works well when the only information in the censoring is conditional on variables in the model. What you describe could be called non-informative conditional on x. To really see the difference you need informative censoring that depends on something not included in the model. One option would be to use copulas to generate dependent data and then transform the values using your Weibul. Or you could generate your event times and censoring times based on x1 and x2, but then only include x1 in the model. On Wed, Jul 22, 2015 at 2:20 AM, Daniel Meddings <dpmeddings at gmail.com> wrote:> I wish to simulate event times where the censoring is informative, and to > compare parameter estimator quality from a Cox PH model with estimates > obtained from event times generated with non-informative censoring. However > I am struggling to do this, and I conclude rather than a technical flaw in > my code I instead do not understand what is meant by informative and > un-informative censoring. > > My approach is to simulate an event time T dependent on a vector of > covariates x having hazard function h(t|x)=lambda*exp(beta'*x)v*t^{v-1}. > This corresponds to T~ Weibull(lambda(x),v), where the scale parameter > lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is > fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}), > lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I assume > the regression coefficients are p-dimensional. > > I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C), > lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i) and > a censored flag delta_inf_i=1 if Y_inf_i <= C_i (an observed event), and > delta_inf_i=0 if Y_inf_i > C_i (informatively censored: event not > observed). I am convinced this is informative censoring because as long as > beta_T~=0 and beta_C~=0 then for each subject the data generating process > for T and C both depend on x. > > In contrast I generate non-informative censoring times > D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i) > and a censored flag delta_ninf_i=1 if Y_ninf_i <= D_i (an observed event), > and delta_ninf_i=0 if Y_ninf_i > D_i (non-informatively censored: event not > observed). Here beta_D is a scalar. I "scale" the simulation by choosing > the lambda_T, lambda_C and lambda_D parameters such that on average T_i<C_i > and T_i<D_i to achieve X% of censored subjects for both Y_inf_i and > Y_ninf_i. > > The problem is that even for say 30% censoring (which I think is high), the > Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when I > expected the estimates using Y_inf to be biased, and I think I see why: > however different beta_C is from beta_T, a censored subject can presumably > influence the estimation of beta_T only by affecting the set of subjects at > risk at any time t, but this does not change the fact that every single > Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. Thus > I do not see how my simulation can possibly produce biased estimates for > beta_T using Y_inf. > > But then what is informative censoring if not based on this approach? > > Any help would be greatly appreciated. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Gregory (Greg) L. Snow Ph.D. 538280 at gmail.com
Daniel Meddings
2015-Jul-24 16:14 UTC
[R] How to simulate informative censoring in a Cox PH model?
Hi Greg Many thanks for taking the time to respond to my query. You are right about pointing out the distinction between what variables are and are not included in the event times process and in the censoring process. I clearly forgot this important aspect. I amended my code to do as you advise and now I am indeed getting biased estimates when using the informatively censored responses. The problem is now that the estimates from the independently censored responses are the same - i.e. they are just as biased. Thus the bias seems to be due entirely to model mis-specification and not the informative censoring. To give a concrete example I simulate event times T_i and censoring times C_i from the following models; T_i~ Weibull(lambda_t(x),v_t), lambda_t(x)=lambda_t*exp( beta_t_0 + (beta_t_1*Treat_i) + (beta_t_2*Z_i) + (beta_t_3*Treat_i*Z_i) ) C_i~ Weibull(lambda_c(x),v_c), lambda_c(x)=lambda_c*exp( beta_c_0 + (beta_c_1*Treat_i) + (beta_c_2*Z_i) + (beta_c_3*Treat_i*Z_i) ) D_i~Weibull(lambda_d(x),v_D), lambda_d(x)=lamda_d*exp( beta_d_0) where ; beta_t_0 = 1, beta_t_1 = -1, beta_t_2 = 1, beta_t_3 = -2, lambda_t=0.5 beta_c_0 = 0.2, beta_c_1 = -2, beta_c_2 = 2, beta_c_3 = -2, lambda_c=0.5 beta_d_0 = -0.7, lambda_d=0.1 When I fit the cox model to both the informatively censored responses and the independent censored responses I include only the Treatment covariate in the model. I simulate Treatment from a Bernoulli distribution with p=0.5 and Z_i from a beta distribution so that Z ranges from 0 to 1 (I like to think of Z as a "poor" prognosis probability where Z_i=1 means a subject is 100% certain to have a poor prognosis and Z_i=0 means zero chance). These parameter choices give approximately 27% and 25% censoring for the informatively censored responses (using C_i) and the independent censored responses (using D_i) respectively. I use N=2000 subjects and 2000 simulation replications. The above simulation I get estimates of beta_t_2 of -1.526 and -1.537 for the informatively censored responses and the independent censored responses respectively. Furthermore when I fit a cox model to the full responses (no censoring at all) I get an estimate of beta_t_2 of -1.542. This represents the best that can possibly be done given that Z and Treat*Z are not in the model. Clearly censoring is not making much of a difference here - model mis-specification dominates. I still must be doing something wrong but I cannot figure this one out. Thanks Dan On Thu, Jul 23, 2015 at 12:33 AM, Greg Snow <538280 at gmail.com> wrote:> I think that the Cox model still works well when the only information > in the censoring is conditional on variables in the model. What you > describe could be called non-informative conditional on x. > > To really see the difference you need informative censoring that > depends on something not included in the model. One option would be > to use copulas to generate dependent data and then transform the > values using your Weibul. Or you could generate your event times and > censoring times based on x1 and x2, but then only include x1 in the > model. > > On Wed, Jul 22, 2015 at 2:20 AM, Daniel Meddings <dpmeddings at gmail.com> > wrote: > > I wish to simulate event times where the censoring is informative, and to > > compare parameter estimator quality from a Cox PH model with estimates > > obtained from event times generated with non-informative censoring. > However > > I am struggling to do this, and I conclude rather than a technical flaw > in > > my code I instead do not understand what is meant by informative and > > un-informative censoring. > > > > My approach is to simulate an event time T dependent on a vector of > > covariates x having hazard function h(t|x)=lambda*exp(beta'*x)v*t^{v-1}. > > This corresponds to T~ Weibull(lambda(x),v), where the scale parameter > > lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is > > fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}), > > lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I > assume > > the regression coefficients are p-dimensional. > > > > I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C), > > lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i) > and > > a censored flag delta_inf_i=1 if Y_inf_i <= C_i (an observed event), and > > delta_inf_i=0 if Y_inf_i > C_i (informatively censored: event not > > observed). I am convinced this is informative censoring because as long > as > > beta_T~=0 and beta_C~=0 then for each subject the data generating process > > for T and C both depend on x. > > > > In contrast I generate non-informative censoring times > > D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i) > > and a censored flag delta_ninf_i=1 if Y_ninf_i <= D_i (an observed > event), > > and delta_ninf_i=0 if Y_ninf_i > D_i (non-informatively censored: event > not > > observed). Here beta_D is a scalar. I "scale" the simulation by choosing > > the lambda_T, lambda_C and lambda_D parameters such that on average > T_i<C_i > > and T_i<D_i to achieve X% of censored subjects for both Y_inf_i and > > Y_ninf_i. > > > > The problem is that even for say 30% censoring (which I think is high), > the > > Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when > I > > expected the estimates using Y_inf to be biased, and I think I see why: > > however different beta_C is from beta_T, a censored subject can > presumably > > influence the estimation of beta_T only by affecting the set of subjects > at > > risk at any time t, but this does not change the fact that every single > > Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. > Thus > > I do not see how my simulation can possibly produce biased estimates for > > beta_T using Y_inf. > > > > But then what is informative censoring if not based on this approach? > > > > Any help would be greatly appreciated. > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Gregory (Greg) L. Snow Ph.D. > 538280 at gmail.com >[[alternative HTML version deleted]]