Philipp Rappold
2009-May-05 12:08 UTC
[R] Cox Proportional Hazard with missing covariate data
Dear friends,

I have used R for some time now and have a tricky question about the coxph function. In short, I am not sure whether I can use coxph with missing covariate data in a model with time-varying covariates. The point is: I know how "old" every part I observe is, but I do not have the full history of the corresponding covariates. Maybe you have some advice for me, although this problem might be only 70% R and 30% statistics. Here's a detailed explanation:

SITUATION & OBJECTIVE:
I want to analyze the effect of environmental factors (i.e. temperature and humidity) on the lifetime of some wear parts. The study is conducted on a yearly basis, meaning that I have collected empirical data on every wear part at the end of every year.

DATA:
I have collected the following data:
- Status of the wear part: equals "0" if the part is still alive, "1" if the part has "died" (my event variable)
- Environmental data: temperature and humidity have been measured at each wear part on a yearly basis (because each wear part is at a different location, I have different data for each wear part)

PROBLEM:
My data cover the years 2001 to 2007. In 2001, a large number of wear parts were already in use. I DO KNOW for every part how long it has been used (even if it was put into service before 2001), but I DO NOT have any information about environmental conditions like temperature or humidity before 2001 (I call this semi-left-censored). Of course, one could argue that I should simply exclude these parts from my analysis, but I don't want to lose valuable information, also because the number of "new parts" put into service between 2001 and 2007 is rather small.

Additionally, I cannot make any assumption about the underlying lifetime distribution. Therefore I have to use a model that leaves the baseline hazard unspecified (most likely Cox).

QUESTION:
From an econometric perspective, is it possible to use the Cox proportional hazards model in this setting? As mentioned before, I have time-varying covariates for each wear part, as well as what I call "semi-left-censored" data that I want to use. If not, what kind of analysis would you suggest?

Thanks a lot for your great help, I really appreciate it.

All the best
Philipp
Arthur Allignol
2009-May-05 12:51 UTC
Hi,

In fact, you have left-truncated observations.

Which time scale do you use: is time 0 the study entry, or the time when the wear part was first used?

If it is the latter, you can specify the "age" of the wear part at study entry in Surv(). For example, if a wear part had been used for 5 years before study entry and "dies" 2 years later, the data will look like this:

start stop status
    5    7      1

Hope this helps,
Arthur Allignol
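The following sketch shows why the (start, stop] layout handles left truncation. It is written in plain Python rather than R so that it runs standalone, and it is not the survival package's code. A row counts toward the risk set at age t only if start < t <= stop. A part that entered the study at age 5 therefore contributes nothing at earlier ages, and its unobserved years are never needed. The `Interval` class and `at_risk` function are hypothetical names, and part "B" is made-up data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One (start, stop] row in counting-process form, on the age time scale."""
    part_id: str
    start: float  # age of the part when this row begins (age at study entry for the first row)
    stop: float   # age of the part when this row ends
    status: int   # 1 if the part failed at `stop`, 0 otherwise


def at_risk(rows, t):
    """IDs of parts under observation just before age t, i.e. start < t <= stop."""
    return {r.part_id for r in rows if r.start < t <= r.stop}


# Hypothetical data: "A" is the example above (entered at age 5, failed at age 7);
# "B" is a part installed during the study that failed at age 3.
rows = [
    Interval("A", 5, 7, 1),
    Interval("B", 0, 3, 1),
]

print(sorted(at_risk(rows, 3)))  # ['B']: A was not yet under observation at age 3
print(sorted(at_risk(rows, 7)))  # ['A']
```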
Philipp Rappold
2009-May-05 13:21 UTC
Hi Arthur,

Thanks a lot for your super-fast reply! I am in fact using the time when the part was first used, so your example should work in my case. Moreover, since I have time-varying covariates, in my specific case the example should look like this:

start stop status temp humid
    5    6      0   32    43
    6    7      1   34    42

Just two more things:
(1) I am quite new to Cox regression, so what do you think of the approach I described above? Don't worry, I won't nail you down to this; I just want to make sure I am not totally "off track"!
(2) I don't think you'd call these "left-truncated" observations, because I DO know when the part was first used. I just don't have covariate values for its whole lifetime, e.g. only for the last two years in the example above. Left truncation, in my eyes, would mean that I did not observe a specific part at all, e.g. because it died before the study started.

Again, thanks a lot. I'll be happy to provide valuable help on this list as soon as my R skills advance.

All the best
Philipp
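The yearly layout with time-varying covariates can be generated mechanically from per-part records. This is a minimal Python sketch under two assumptions: each observed year is one full unit on the age scale, and a failure is recorded at the end of the year in which it happened. The function name `split_by_year` and its arguments are hypothetical.

```python
def split_by_year(part_id, age_at_entry, yearly_covariates, failed):
    """Expand one part into yearly (start, stop] rows on the age time scale.

    age_at_entry      -- age in years when observation began (e.g. in 2001)
    yearly_covariates -- list of (temp, humid) pairs, one per observed year, in order
    failed            -- True if the part failed during its last observed year
    """
    rows = []
    n = len(yearly_covariates)
    for k, (temp, humid) in enumerate(yearly_covariates):
        start = age_at_entry + k
        # Only the final interval can carry the failure event; earlier ones are censored.
        status = 1 if (failed and k == n - 1) else 0
        rows.append({"id": part_id, "start": start, "stop": start + 1,
                     "status": status, "temp": temp, "humid": humid})
    return rows


# Reproduces the two-row example: entered at age 5, failed during the second observed year.
for row in split_by_year("A", 5, [(32, 43), (34, 42)], failed=True):
    print(row)
```

In R, a data frame with these columns (here hypothetically named `parts`) would then be fitted with `coxph(Surv(start, stop, status) ~ temp + humid, data = parts)` from the survival package. The years before study entry simply have no rows.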
Terry Therneau
2009-May-06 12:03 UTC
I concur with the other reply: you have left-truncated data.

Left-truncated: any parts that failed before 2001 (the start of the study) are not included in the study.
Left-censored: parts that failed before the study start are included, but with an indeterminate failure date ("< 2001").

The Cox model deals with left-truncated data easily, but not with left-censored data. In the Cox model, all that matters are the covariate values at the times that events occur, so an unknown past is not an issue.

Terry Therneau
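The point that only covariate values at event times matter can be checked directly on the Cox partial likelihood. The following plain-Python sketch computes it for (start, stop] rows with a single covariate. It is a simplified illustration, not the survival package's implementation, and it assumes there are no tied event times. The data and the helper names `cox_partial_loglik` and `row` are hypothetical. Part A's years before entry (ages 0 to 5) have no rows at all. Changing a covariate value in an interval that contains no event leaves the likelihood unchanged.

```python
import copy
import math


def cox_partial_loglik(rows, beta):
    """Cox partial log-likelihood for (start, stop] rows with a single covariate x.

    Assumes no tied event times (ties would need Breslow or Efron handling).
    Each row is a dict with keys "start", "stop", "status" and "x".
    """
    loglik = 0.0
    for ev in rows:
        if ev["status"] != 1:
            continue
        t = ev["stop"]
        # Risk set at t: every row under observation just before t.
        risk = [r for r in rows if r["start"] < t <= r["stop"]]
        loglik += beta * ev["x"] - math.log(sum(math.exp(beta * r["x"]) for r in risk))
    return loglik


def row(start, stop, status, x):
    return {"start": start, "stop": stop, "status": status, "x": x}


# Hypothetical yearly data on the age scale; x could be, say, temperature.
rows = [
    row(5, 6, 0, 32), row(6, 7, 1, 34),                                      # part A: entered at 5, failed at 7
    row(4, 5, 0, 30), row(5, 6, 0, 31), row(6, 7, 0, 33), row(7, 8, 0, 35),  # part B: entered at 4, censored at 8
    row(5, 6, 1, 36),                                                        # part C: entered at 5, failed at 6
]

# With beta = 0 each event contributes -log(size of its risk set): -log(3) - log(2).
print(cox_partial_loglik(rows, 0.0))

# Changing B's covariate in (4, 5] has no effect: no event occurs while that row is at risk.
altered = copy.deepcopy(rows)
altered[2]["x"] = 999
print(cox_partial_loglik(rows, 0.1) == cox_partial_loglik(altered, 0.1))  # True
```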