Therneau, Terry M., Ph.D.
2016-Jan-21 15:01 UTC
[R] Survival::coxph (clogit), survConcordance vs. summary(fit) concordance
I read the digest form, which puts me behind, plus the last 2 days have been solid meetings with an external advisory group, so I missed the initial query. Three responses.

1. The clogit routine sets the data up properly and then calls a stratified Cox model. If you want the survConcordance routine to give the same answer, it also needs to know about the strata:

    survConcordance(Surv(rep(1, 76L), resp) ~ predict(fit) + strata(ID), data=dat)

I'm not surprised that you get a very different answer with/without strata.

2. I've never thought of using a robust variance for the matched case/control model. I'm having a hard time wrapping my head around what you would expect that to accomplish (statistically). Subjects are already matched on someone from the same site, so where does a per-site effect creep in? Assuming there is a good reason and I just don't see it (not an unwarranted assumption), I'm not aware of any work on what an appropriate variance would be for the concordance in that case.

3. I need to think about the large variance issue.

Terry Therneau

On 01/20/2016 08:09 PM, r-help-request at r-project.org wrote:
> Hi,
>
> I'm running conditional logistic regression with survival::clogit. I have
> "1-1 case-control" data, i.e., there is 1 case and 1 control in each stratum.
>
> Model:
>     fit <- clogit(resp ~ x1 + x2 + strata(ID) + cluster(site),
>                   method = "efron", data = dat)
> where resp is 1's and 0's, and x1 and x2 are both continuous.
>
> Predictors are both significant. A snippet of summary(fit):
>     Concordance= 0.763  (se = 0.5 )
>     Rsquare= 0.304   (max possible= 0.5 )
>     Likelihood ratio test= 27.54  on 2 df,   p=1.047e-06
>     Wald test            = 17.19  on 2 df,   p=0.0001853
>     Score (logrank) test = 17.43  on 2 df,   p=0.0001644,   Robust = 6.66  p=0.03574
>
> The concordance estimate seems good but the SE is HUGE.
>
> I get a very different estimate from the survConcordance function, which I
> know says it computes concordance for a "single continuous covariate", but it
> runs on my model with 2 continuous covariates:
>
>     survConcordance(Surv(rep(1, 76L), resp) ~ predict(fit), dat)
>       n= 76
>     Concordance= 0.9106648 se= 0.09365047
>     concordant discordant  tied.risk  tied.time   std(c-d)
>      1315.0000   129.0000     0.0000   703.0000   270.4626
>
> Are both of these concordance estimates valid but providing different
> information?
> Is one more appropriate for measuring "performance" (in the AUC sense) of
> conditional logistic models?
> Is it possible that the HUGE SE estimate represents a convergence problem
> (no warnings were thrown when fitting the model), or is this model just useless?
>
> Thanks!
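A note on why the strata term changes the answer so much. Without strata, concordance compares every case with every control; with strata(ID), each case is compared only with the control in its own stratum. The counts in the quoted unstratified output fit this reading: 1315 + 129 = 1444 = 38 x 38, i.e. each of 38 cases against each of 38 controls, while 1-1 matching leaves only 38 within-pair comparisons. The base-R sketch below illustrates the two counting schemes on made-up data. The data frame, the column name `score` (standing in for predict(fit)), and both helper functions are hypothetical and are not part of the survival package.

```r
# Hypothetical toy data: 4 matched pairs (strata), one case and one control each.
# 'score' stands in for predict(fit); all values are invented for illustration.
dat <- data.frame(
  ID    = rep(1:4, each = 2),
  resp  = rep(c(1, 0), times = 4),
  score = c(3.0,  3.2,   # pair 1: control scores higher (discordant)
            2.0, -1.0,   # pair 2: concordant
            1.5, -2.0,   # pair 3: concordant
            1.0,  1.1)   # pair 4: control scores higher (discordant)
)

# Share of (case, control) differences that are positive; ties count one half.
share_concordant <- function(diffs) {
  (sum(diffs > 0) + 0.5 * sum(diffs == 0)) / length(diffs)
}

# Unstratified: compare every case with every control.
pooled_concordance <- function(resp, score) {
  share_concordant(outer(score[resp == 1], score[resp == 0], "-"))
}

# Stratified: compare each case only with the control(s) in its own stratum.
stratified_concordance <- function(resp, score, strata) {
  per_stratum <- lapply(split(seq_along(resp), strata), function(i) {
    outer(score[i][resp[i] == 1], score[i][resp[i] == 0], "-")
  })
  share_concordant(unlist(per_stratum))
}

pooled     <- pooled_concordance(dat$resp, dat$score)
stratified <- stratified_concordance(dat$resp, dat$score, dat$ID)
print(c(pooled = pooled, stratified = stratified))
```

On this toy data the pooled figure is based on 16 comparisons and the stratified one on only 4, so the two need not agree, and the stratified estimate rests on far fewer pairs.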
Joe Ceradini
2016-Jan-21 15:29 UTC
[R] Survival::coxph (clogit), survConcordance vs. summary(fit) concordance
Thanks Terry!

I thought that since I was providing survConcordance with the model object, the same formula would be applied, but I was obviously wrong. I just ran survConcordance with the addition of the strata term, as you suggested, and got the same answer as summary(fit), with the same scary SE.

This is a wildlife habitat selection analysis. Each individual animal has habitat features that it used (1) and habitat that was available but that it did not use (0). The habitat that is available is different for each individual, hence the need for strata(ID of individual). However, all the habitat data are collected from multiple discrete sites, and each site has multiple individuals on it. For all analyses of these data, I've assumed that individuals within a site may be more correlated than individuals between sites, hence the addition of cluster(site).

I was able to recalculate the same concordance estimate as summary(fit) by estimating predicted probabilities using:

    risk <- predict(fit, type='risk')
    risk / (1+risk)

I then used a probability cut-off of 0.5 to decide whether an observed point was correctly classified, which returned the same 0.76 as the concordance estimate.

So, can I just think of this concordance as a classification table (or confusion matrix) with a 0.5 threshold, so that the classification error would be 1 - 0.76? Was I mistaken in thinking concordance was more akin to AUC in unconditional logistic regression?

Thanks.
Joe

--
Cooperative Fish and Wildlife Research Unit
Zoology and Physiology Dept.
University of Wyoming
JoeCeradini at gmail.com / 914.707.8506
wyocoopunit.org
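A possible explanation for why the 0.5 cut-off reproduces the stratified concordance with 1-1 matching. This rests on an assumption: that the predicted risk scores are centered within each stratum (this appears to be the default reference for predict() on a stratified coxph fit, but it is worth confirming in ?predict.coxph). If so, the two centered linear predictors in a pair are equal and opposite, so risk / (1 + risk) exceeds 0.5 for the case exactly when the case outscores its own control. Both points in a pair are then classified correctly or both incorrectly, and the classification accuracy equals the share of concordant pairs. The reported 0.763 is what 29 concordant pairs out of 38 would give. The base-R sketch below checks the identity on invented data; the data frame and the column name `lp` are hypothetical.

```r
# Hypothetical toy data: 4 matched pairs, one case (resp = 1) and one control
# (resp = 0) per stratum. 'lp' stands in for the model's linear predictor.
dat <- data.frame(
  ID   = rep(1:4, each = 2),
  resp = rep(c(1, 0), times = 4),
  lp   = c(3.0, 3.2,  2.0, -1.0,  1.5, -2.0,  1.0, 1.1)
)

# ASSUMPTION: predictions are centered within each stratum before exponentiating.
lp_centered <- dat$lp - ave(dat$lp, dat$ID)   # subtract the per-stratum mean
risk <- exp(lp_centered)
prob <- risk / (1 + risk)

# Classify each point at the 0.5 cut-off and compare with the observed response.
accuracy <- mean((prob > 0.5) == (dat$resp == 1))

# Within-pair concordance: share of pairs where the case outscores its control.
# Rows are ordered case, control within each ID, so the vectors line up by pair.
case_lp    <- dat$lp[dat$resp == 1]
control_lp <- dat$lp[dat$resp == 0]
pair_concordance <- mean(case_lp > control_lp)

print(c(accuracy = accuracy, pair_concordance = pair_concordance))
```

The identity depends on each stratum holding exactly one case and one control with no tied scores. With larger strata, concordance counts case-control pairs within each stratum and no longer reduces to thresholding at 0.5.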