Joe Ceradini
2016-Apr-26 18:04 UTC
[R] survival::clogit, how to extract residuals for GOF assessment
Hi Folks, Hopefully this question has enough R and not too much stats to be appropriate for this list. Based on,* Hosmer et al. 2013. Logistic regression for matched case-control studies. Applied Logistic Regression *(eqtn. 7.8)*, *I am assessing GOF of conditional (or matched) logistic regression models with the *standardized Pearson residuals*. The authors define ?large? as delta chi-squared values > 4. For a 1:1 study, I can fit a conditional logistic model via an unconditional routine by making the response variable all 1?s, taking the difference of the covariate values for each pair, and removing the intercept. I can then extract the standardized residuals from the model (see code at bottom for example). However, if I want to fit a 1:many model, I need to use survival::clogit, which is where my question comes in: How can I apply this same "test" to a clogit model? Which residuals should I be extracting? Or, is this not an option for a clogit model? ## The default residuals of coxph in R are the martingale residuals. ## resid(fit1,type=c("martingale", "deviance", "score", "schoenfeld", ## "dfbeta", "dfbetas", "scaledsch","partial")) R code below shows equivalence between clogit and binomial GLM fit on the differences (note: these would not be equivalent if used a "cluster" argument in clogit), and GOF "test" for binomial GLM fit on the differences. I would like to assess GOF of the fit.clogit model but do not know how. require(survival) require(plyr) # Dataframe dat.full <- structure(list(resp = c(1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L), x1 = c(3.92, 2.04, 2.27, 9.25, 6.13, 10.44, 5.09, 1.27, 5.9, 2.88, 3.79, 1.46, 3.35, 3.82, 4.28, 6.52, 3.54, 3.46, 0.46, 1.68, 3.22, 1.64, -0.12, 2.78, 2.58, 2, 2.83, 3.58, 1.45, 1.64, 2.89, 3.12, 5.6, 8.29, 3.42, 4.8, 3.04, 4.33, 5.31, 1.78, 8.18, 4.56, 4.85, 7.99, 7.52, 6.85, 7.64, 3.33, 5.17, 4.62, 1.24, 2.54, 3.08, 8.2, 1.81, 2.78, 2.16, 2.76, 3.45, 3.43), ID = c(10L, 10L, 11L, 11L, 13L, 13L, 17L, 17L, 18L, 18L, 23L, 23L, 25L, 25L, 29L, 29L, 31L, 31L, 33L, 33L, 35L, 35L, 38L, 38L, 39L, 39L, 4L, 4L, 41L, 41L, 43L, 43L, 45L, 45L, 46L, 46L, 49L, 49L, 50L, 50L, 54L, 54L, 55L, 55L, 56L, 56L, 57L, 57L, 59L, 59L, 6L, 6L, 60L, 60L, 7L, 7L, 8L, 8L, 9L, 9L)), .Names = c("resp", "x1", "ID"), row.names = c(1L, 2L, 3L, 4L, 7L, 8L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 21L, 22L, 23L, 24L, 25L, 26L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 49L, 50L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 71L, 72L, 73L, 74L, 75L, 76L), class = "data.frame") # fit clogit fit.clogit <- clogit(resp ~ x1 + strata(ID), method = "efron", data dat.full) summary(fit.clogit) # Make dataframe so can fit on differences with unconditional routine. # x1 where resp = 1 minus x1 where resp = 0, grouped by ID dat.diff <- ddply(dat.full, .(ID), summarise, x1.diff = x1[resp == 1] - x1[resp == 0]) dat.diff$resp <- 1 # response variable is all 1's # Fit on differences and 1's, remove intercept fit.diff <- glm(resp ~ -1 + x1.diff, family = binomial, data = dat.diff) summary(fit.diff) summary(fit.clogit)$coefficients # GOF: delta chi-squared plot(fit.diff$fitted.values, rstandard(fit.diff)^2) round(sort(rstandard(fit.diff)^2), 4) Thanks! Joe [[alternative HTML version deleted]]