Therneau, Terry M., Ph.D.
2014-Aug-13 14:19 UTC
[R] Cox regression model for matched data with replacement
On 08/13/2014 08:38 AM, John Pura wrote:
> Thank you for the reply. However, I think I may not have clarified what my cases are. I'm studying the effect of radiation treatment (vs. none) on survival. My cases are patients who received radiation and controls are those who did not. I used a propensity score model to match cases to controls in a 1:2 fashion. However, because the matching was done with replacement, some controls were matched to more than one case. How can I go about analyzing this - would frequency weighting work?
>
> Thanks,
> John

We went down the wrong path. When people use the word "case" it almost always refers to "subjects who had the outcome". If I read the above correctly you have the simpler situation of subset selection. Subjects were chosen to be in the model without reference to their outcome status, with the goal of balancing treatment with respect to other predictive factors. Correct?

If so, here are my preferred modeling strategies, in order.

1. coxph(Surv(time, status) ~ treatment, data=one)

   Here data set "one" has one copy of each subject selected to be in the study. If they were nominated twice they still appear once. Optional: give each control a case weight equal to the number of times they were selected. This will better balance the data set with respect to the matching factors.

2. Same model, with covariates.

   The argument about whether covariates on which you have balanced should be included in the model is as old as the hills ("belt AND suspenders?"), with proponents on both sides. Meh. Unless there are too many, of course. I still like the rule of 10-20 events per covariate to choose the maximum number of predictors.

3. coxph(Surv(time, status) ~ treatment + strata(group), data=two)

   I view this as model 2 with paranoia: "The covariate effects are so odd that we'll never model them correctly, so treat each combination as unique." Data set "two" needs to have each treated subject plus their controls in a separate stratum. Even though some controls are in the data set twice, they don't need case weights since they are in any given stratum only once.

For any of the above you can add a robust variance. It is required if case weights are used.

Terry T
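The three strategies above can be sketched in R roughly as follows. This is a minimal illustration on simulated data, not code from the original thread: the subject ids, the `age` covariate, the weight column `wt`, and the sample sizes are all hypothetical, and clustering on `id` in model 3 is just one assumed way to request a robust variance when a control appears in several strata.

```r
library(survival)

set.seed(1)

# Hypothetical sample: 50 treated subjects, a pool of 60 potential controls.
n_trt  <- 50
trt_id <- 1:n_trt
pool   <- 1001:1060

# 1:2 matching with replacement: two distinct controls per treated subject,
# but the same control may be picked again for another treated subject.
matches <- do.call(rbind, lapply(trt_id, function(g)
  data.frame(group = g, id = sample(pool, 2))))

# Simulated subject-level data (one row per person).
n <- n_trt + length(pool)
subjects <- data.frame(
  id        = c(trt_id, pool),
  treatment = rep(c(1, 0), c(n_trt, length(pool))),
  age       = round(rnorm(n, 60, 10))
)
subjects$time   <- rexp(n, rate = 0.1 * exp(-0.5 * subjects$treatment))
subjects$status <- rbinom(n, 1, 0.8)

# Data set "one": each selected subject appears once; a control's case
# weight is the number of times it was selected.
n_sel <- table(matches$id)
one <- subjects[subjects$treatment == 1 | subjects$id %in% names(n_sel), ]
one$wt <- 1
ctl <- one$treatment == 0
one$wt[ctl] <- as.numeric(n_sel[as.character(one$id[ctl])])

# Model 1: treatment only, unweighted and weighted.
# The weighted fit asks for a robust variance.
fit1  <- coxph(Surv(time, status) ~ treatment, data = one)
fit1w <- coxph(Surv(time, status) ~ treatment, data = one,
               weights = wt, robust = TRUE)

# Model 2: same model with a covariate added.
fit2 <- coxph(Surv(time, status) ~ treatment + age, data = one,
              weights = wt, robust = TRUE)

# Data set "two": one row per (matched set, subject), so a control chosen
# for several treated subjects has one row in each of those strata.
two <- rbind(data.frame(group = trt_id, id = trt_id), matches)
two <- merge(two, subjects, by = "id")

# Model 3: stratified on the matched set; cluster(id) gives a robust
# variance that treats repeated rows of one control as correlated.
fit3 <- coxph(Surv(time, status) ~ treatment + strata(group) + cluster(id),
              data = two)

summary(fit1w)
summary(fit3)
```

In the weighted fits the summary reports a robust standard error alongside the model-based one; comparing the two gives a rough sense of how much the repeated selection of controls matters.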