Mattia Prosperi
2010-Nov-02 19:57 UTC
[R] multi-level cox ph with time-dependent covariates
Dear all,

I would like to know whether it is possible to fit in R a Cox PH model with time-dependent covariates while also accounting for hierarchical effects. I would also like to know whether feature selection can be performed on such a model fit.

I have a data set composed of multiple marker measurements (and hundreds of covariates) taken at different time points from different tissue samples of different patients. Suppose that the data come from an animal model with very few subjects (n=6) that were followed up after a pathogen exposure and measured several times, sampling different tissues on the same days, until a certain outcome was reached (or the outcome was censored). Suppose also that the pathogen can vary over time (it might be a bacterium that selects for drug resistance) and across different tissue reservoirs within the same patient. In other words:

    names(data) = patient_id, start_time, stop_time, tissue_id, pathogen_type, marker1, ..., marker100, ..., outcome

If I had multiple observations per patient at different time intervals, I would model it like this (I hope it is correct):

    model <- coxph(Surv(start_time, stop_time, outcome) ~ all_covariates + cluster(patient_id))

But now I have both the patient and the tissue, and hundreds of different variables. I thought I could use the coxme library, since it also has a ridge regression feature. Should I then model nested random effects by considering both patient_id and tissue_id? Like this:

    model <- coxme(Surv(start_time, stop_time, outcome) ~ covariates + (1 | patient_id/tissue_id))

Then, how could I shrink the coefficients in order to select a subset of them with non-negligible effects? Could I also consider running an AIC-based forward-backward selection?

Thanks, and apologies if I am completely off track,
M.P.
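For readers who want to try the two model calls above, here is a minimal sketch on simulated data. Everything about the data is an assumption made for illustration: 30 patients rather than 6, three tissues, four unit-length intervals, two markers, and a patient-level frailty. None of it comes from the original data set, and the sketch says nothing about whether either model is appropriate for it.

```r
library(survival)
library(coxme)

set.seed(1)

# Hypothetical design: 30 patients x 3 tissues x 4 intervals (not the poster's data).
n_pat <- 30; n_tis <- 3; n_int <- 4
d <- expand.grid(interval = 1:n_int, tissue_id = 1:n_tis, patient_id = 1:n_pat)
d$start_time <- d$interval - 1
d$stop_time  <- d$interval

# Time-dependent covariates: one value per (start, stop] interval.
d$marker1 <- rnorm(nrow(d))
d$marker2 <- rnorm(nrow(d))

# Assumed patient-level frailty, so that there is a random effect to estimate.
frailty <- rnorm(n_pat, sd = 0.7)
lp <- -1.5 + 0.5 * d$marker1 + frailty[d$patient_id]
d$outcome <- rbinom(nrow(d), 1, plogis(lp))

# Follow each patient/tissue pair only until its first event.
d <- do.call(rbind, lapply(split(d, list(d$patient_id, d$tissue_id)), function(g) {
  first <- which(g$outcome == 1)[1]
  if (is.na(first)) g else g[seq_len(first), ]
}))

# Marginal model: robust (sandwich) variance clustered on patient.
fit_cluster <- coxph(Surv(start_time, stop_time, outcome) ~ marker1 + marker2 +
                       cluster(patient_id), data = d)

# Mixed model: random intercepts for patient and for tissue within patient.
fit_nested <- coxme(Surv(start_time, stop_time, outcome) ~ marker1 + marker2 +
                      (1 | patient_id/tissue_id), data = d)

print(fit_cluster)
print(fit_nested)
```

The cluster() term leaves the coefficients unchanged and only adjusts their standard errors, whereas the coxme fit estimates a variance for each level of the nesting.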
Terry Therneau
2010-Nov-04 13:26 UTC
[R] multi-level cox ph with time-dependent covariates
Your question has two levels:
  1. What is the right model for this data
  2. Can model __ be fit

Wrt 2 and coxme: For a reliable fit you need to have more events than random effects. Thus for patient/tissue I would want to see multiple events per patient/tissue pair. This is a statistical issue: when there are too few events, the confidence intervals for the random effects end up being a mile wide. (Exception: if the number of events is very large, >10^5 say, as sometimes occurs in economics studies, the estimates can work.) coxme works fine with start,stop data.

Wrt question 1: Your models assume that marker1, marker2, ... each have the same effect across tissue types. Adding a random effect gives per-subject or per-subject/tissue intercepts. Do you instead want to do shrinkage of the marker1, ... coefficients?

Terry Therneau
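The events-versus-random-effects check described above can be sketched in base R. The toy data frame and the helper name count_events are hypothetical. The count of random effects assumes that a nested (1 | patient_id/tissue_id) term contributes one intercept per patient plus one per patient/tissue pair.

```r
# Hypothetical helper: total events per patient/tissue pair.
count_events <- function(data) {
  aggregate(outcome ~ patient_id + tissue_id, data = data, FUN = sum)
}

# Toy data for illustration only (not from the original post).
toy <- data.frame(
  patient_id = c(1, 1, 1, 2, 2, 2),
  tissue_id  = c("a", "a", "b", "a", "b", "b"),
  outcome    = c(0, 1, 0, 0, 0, 1)
)

ev <- count_events(toy)
print(ev)

# Assumed count of random effects for a nested patient/tissue intercept model:
# one per patient plus one per patient/tissue pair.
n_random <- length(unique(toy$patient_id)) + nrow(ev)
n_events <- sum(ev$outcome)

if (n_events <= n_random) {
  message("Only ", n_events, " events for ", n_random,
          " random effects: expect very wide intervals.")
}
```

Pairs that show zero or one event in this table are the ones where the per-pair random effects will be poorly determined.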