Christopher David Desjardins
2010-Aug-04 17:22 UTC
[R] Question regarding significance of a covariate in a coxme survival model
Hi, I am running a Cox Mixed Effects Hazard model using the library coxme. I am trying to model time to onset (age_sym1) of thought problems (e.g. hearing voices) (sym1). As I have siblings in my dataset, I have decided to account for this by including a random effect for family (famid). My covariate of interest is Mother's diagnosis where a 0 is bipolar, 1 is control, and 2 is major depression. I am trying to fit the following model. thorp1 <- coxme(Surv(age_sym1, sym1) ~ lifedxm + (1 | famid), data = bip.surv) Which provides the following output: ------------------------------------------------- > thorp1 Cox mixed-effects model fit by maximum likelihood Data: bip.surv events, n = 99, 189 (3 observations deleted due to missingness) Iterations= 10 54 NULL Integrated Penalized Log-likelihood -479.0372 -467.3549 -435.2096 Chisq df p AIC BIC Integrated loglik 23.36 3.00 3.3897e-05 17.36 9.58 Penalized loglik 87.66 47.27 3.2374e-04 -6.88 -129.54 Model: Surv(age_sym1, sym1) ~ lifedxm + (1 | famid) Fixed coefficients coef exp(coef) se(coef) z p lifedxmCONTROL -1.0576257 0.3472794 0.3872527 -2.73 0.0063 lifedxmMAJOR -0.6329957 0.5309987 0.3460847 -1.83 0.0670 Random effects Group Variable Std Dev Variance famid Intercept 0.9877770 0.9757033 -------------------------------------------------- So it appears that there is a difference in hazard rates associated with Mother's diagnosis between BIPOLAR and CONTROL. Just to be safe, I fit a model without this covariate present and decided to compare the models with AIC and see if fit decreased with this covariate not in the model. ---------------------------------------------------------- thorp1.n <- coxme(Surv(age_sym1, sym1) ~ (1 | famid), data = bip.surv) > thorp1.n Cox mixed-effects model fit by maximum likelihood Data: bip.surv events, n = 99, 189 (3 observations deleted due to missingness) Iterations= 10 54 NULL Integrated Penalized Log-likelihood -479.0372 -471.3333 -436.0478 Chisq df p AIC BIC Integrated loglik 15.41 1.00 0.00008663 13.41 10.81 Penalized loglik 85.98 49.48 0.00099915 -12.97 -141.37 Model: Surv(age_sym1, sym1) ~ (1 | famid) Random effects Group Variable Std Dev Variance famid Intercept 1.050949 1.104495 ------------------------------------------------------------ The problem is that the AIC for the model w/o lifedxm is 13.41 and the model w/ lifedxm is 17.36. So fit actually improved without lifedxm in the model but really the fit is indistinguishable if I use Burnham & Anderson, 2002's criteria. So my conundrum is this - The AIC and the p-values appear to disagree. The p-value would suggest that lifedxm should be included in the model and the AIC says it doesn't improve fit. In general, I tend to favor the AIC (I usually work with large sample sizes and multilevel models) but I am just beginning with survival models and I am curious if a survival model guru might suggest whether lifedxm needs to be in the model or not based on my results here? Would people generally use the AIC in this situation? Also, I can't run anova() on this models because of the random effect. I am happy to provide the data if necessary. Please cc me on a reply. Thanks, Chris
Teresa Iglesias
2010-Aug-27 20:32 UTC
[R] Question regarding significance of a covariate in a coxme survival model
Christopher David Desjardins <desja004 <at> umn.edu> writes:> > Hi, > I am running a Cox Mixed Effects Hazard model using the library coxme. I > am trying to model time to onset (age_sym1) of thought problems (e.g. > hearing voices) (sym1). As I have siblings in my dataset, I have > decided to account for this by including a random effect for family > (famid). My covariate of interest is Mother's diagnosis where a 0 is > bipolar, 1 is control, and 2 is major depression. I am trying to fit > the following model. > > thorp1 <- coxme(Surv(age_sym1, sym1) ~ lifedxm + (1 | famid), data = > bip.surv) > > Which provides the following output: > > ------------------------------------------------- > > thorp1 > Cox mixed-effects model fit by maximum likelihood > Data: bip.surv > events, n = 99, 189 (3 observations deleted due to missingness) > Iterations= 10 54 > NULL Integrated Penalized > Log-likelihood -479.0372 -467.3549 -435.2096 > Chisq df p AIC BIC > Integrated loglik 23.36 3.00 3.3897e-05 17.36 9.58 > Penalized loglik 87.66 47.27 3.2374e-04 -6.88 -129.54 > Model: Surv(age_sym1, sym1) ~ lifedxm + (1 | famid) > Fixed coefficients > coef exp(coef) se(coef) z p > lifedxmCONTROL -1.0576257 0.3472794 0.3872527 -2.73 0.0063 > lifedxmMAJOR -0.6329957 0.5309987 0.3460847 -1.83 0.0670 > Random effects > Group Variable Std Dev Variance > famid Intercept 0.9877770 0.9757033 > > -------------------------------------------------- > > So it appears that there is a difference in hazard rates associated with > Mother's diagnosis between BIPOLAR and CONTROL. Just to be safe, I fit a > model without this covariate present and decided to compare the models > with AIC and see if fit decreased with this covariate not in the model. > > ---------------------------------------------------------- > thorp1.n <- coxme(Surv(age_sym1, sym1) ~ (1 | famid), data = bip.surv) > > thorp1.n > Cox mixed-effects model fit by maximum likelihood > Data: bip.surv > events, n = 99, 189 (3 observations deleted due to missingness) > Iterations= 10 54 > NULL Integrated Penalized > Log-likelihood -479.0372 -471.3333 -436.0478 > Chisq df p AIC BIC > Integrated loglik 15.41 1.00 0.00008663 13.41 10.81 > Penalized loglik 85.98 49.48 0.00099915 -12.97 -141.37 > Model: Surv(age_sym1, sym1) ~ (1 | famid) > Random effects > Group Variable Std Dev Variance > famid Intercept 1.050949 1.104495 > ------------------------------------------------------------ > > The problem is that the AIC for the model w/o lifedxm is 13.41 and the > model w/ lifedxm is 17.36. So fit actually improved without lifedxm in > the model but really the fit is indistinguishable if I use Burnham & > Anderson, 2002's criteria. > > So my conundrum is this - The AIC and the p-values appear to disagree. > The p-value would suggest that lifedxm should be included in the model > and the AIC says it doesn't improve fit. In general, I tend to favor the > AIC (I usually work with large sample sizes and multilevel models) but I > am just beginning with survival models and I am curious if a survival > model guru might suggest whether lifedxm needs to be in the model or not > based on my results here? Would people generally use the AIC in this > situation? Also, I can't run anova() on this models because of the > random effect. > > I am happy to provide the data if necessary. > > Please cc me on a reply. > > Thanks, > Chris > >Hi Chris, Did you get an answer to why the p-value and AIC score disagreed? I am getting the same results with my own data. They're not small disagreements either. The AIC score is telling me the opposite of what the p-value and the parameter coef are saying. The p-value and the coef for the predictor variable are in agreement. I've also noticed in some published papers with tables containing p-values and AIC scores that the chisq p-value and AIC score are in opposition too. This makes me think I'm missing something and that this all actually makes sense. However given that AIC = ? 2 (log L) + 2K (where K is the number of parameters) and seeing as how the log-likelihood scores below are in the hundreds, shouldn't the AIC score be in the hundreds as well?? -------------------------------------- model0 (intercept only model) NULL Integrated Penalized Log-likelihood -119.8470 -117.9749 -113.1215 Chisq df p AIC BIC Integrated loglik 3.74 1.00 0.052989 1.74 0.08 Penalized loglik 13.45 7.06 0.063794 -0.68 -12.43 Random effects Group Variable Std Dev Variance Site Intercept 0.6399200 0.4094977 _____ model1 (before vs after) NULL Integrated Penalized Log-likelihood -119.8470 -112.1598 -108.1663 Chisq df p AIC BIC Integrated loglik 15.37 2.00 0.00045863 11.37 8.05 Penalized loglik 23.36 7.06 0.00153710 9.25 -2.49 Fixed coefficients Coef exp(coef) se(coef) z p Time -1.329997 0.2644779 0.4009229 -3.32 0.00091 Random effects Group Variable Std Dev Variance Site Intercept 0.5625206 0.3164294 ______ model2 (stim1 vs stim2) NULL Integrated Penalized Log-likelihood -119.8470 -117.8677 -113.167 Chisq df p AIC BIC Integrated loglik 3.96 2.00 0.138160 -0.04 -3.37 Penalized loglik 13.36 7.83 0.093314 -2.31 -15.34 Fixed coefficients coef exp(coef) se(coef) z p stim 0.1621367 1.176021 0.3534788 0.46 0.65 Random effects Group Variable Std Dev Variance Site Intercept 0.6262600 0.3922016 ---------------------------------------------- If you didn't get an answer, hopefully, someone that understands all this will reply for both of us. Thanks, -Teresa
Cheng Peng
2010-Aug-29 13:59 UTC
[R] Question regarding significance of a covariate in a coxme survival model
The likelihood ratio test is more reliable when one model is nested in the other. This true for your case. AIC/SBC are usually used when two models are in a hiearchical structure. Please also note that any decision made made based on AIC/SBC scores are very subjective since no sampling distribution can be used to make a "rigorous" decision. regarding the magnitutes between the loglikelihood and AIC/SBC, I would say the author must used a modified version in coxme() since several different modified AIC/SBC scores are running in practice. My suggestion would be to use LR test for your case: For the integrated likelihhod: LL.small.model = - 467.3549 (including lifedxm) LL.large.model = - 471.3333 (excluding lifedxm) DF.diff = 3 - 1 = 2 LR: -2*(- 471.3333 + 467.3549) = 7.9568 p-value: 1-pchisq(7.9568,2) = 0.01871556 For the penalized likelihhod: LPL.small.model = -435.2096 (including lifedxm) LPL.large.model = -436.0478 (excluding lifedxm) DF.diff = 3 - 1 = 2 PLR: -2*(- 436.0478 + 435.2096 ) = 1.6764 p-value: 1-pchisq(1.6764,2) = 0.4324883 Two different likehood methods produce different results, which one you should use depends on which likelihood makes more sense to you (or which likehood is better). HTH -- View this message in context: http://r.789695.n4.nabble.com/Question-regarding-significance-of-a-covariate-in-a-coxme-survival-model-tp2313880p2399090.html Sent from the R help mailing list archive at Nabble.com.
Cheng Peng
2010-Aug-29 14:05 UTC
[R] Question regarding significance of a covariate in a coxme survival model
My suggestion: If compare model 1 and model 2 with model 0 respectively, the (penalized) likelihood ratio test is valid. IF you compare model 2 with model 3, the (penalized) likelihood ratio test is invalid. You may want to use AIC/SBC to make a subjective decision. -- View this message in context: http://r.789695.n4.nabble.com/Question-regarding-significance-of-a-covariate-in-a-coxme-survival-model-tp2313880p2399095.html Sent from the R help mailing list archive at Nabble.com.
C. Peng
2010-Aug-29 14:44 UTC
[R] Question regarding significance of a covariate in a coxme survival model
The likelihood ratio test is more reliable when one model is nested in the other. This true for your case. AIC/SBC are usually used when two models are in a hiearchical structure. Please also note that any decision made made based on AIC/SBC scores are very subjective since no sampling distribution can be used to make a "rigorous" decision. Regarding the magnitutes between the loglikelihood and AIC/SBC, I would say the author must used a modified version in coxme() since several different modified AIC/SBC scores are running in practice. My suggestion would be to use LR test for your case: For the integrated likelihhod: LL.small.model = - 467.3549 (including lifedxm) LL.large.model = - 471.3333 (excluding lifedxm) DF.diff = 3 - 1 = 2 LR: -2*(- 471.3333 + 467.3549) = 7.9568 p-value: 1-pchisq(7.9568,2) = 0.01871556 For the penalized likelihhod: LPL.small.model = -435.2096 (including lifedxm) LPL.large.model = -436.0478 (excluding lifedxm) DF.diff = 3 - 1 = 2 PLR: -2*(- 436.0478 + 435.2096 ) = 1.6764 p-value: 1-pchisq(1.6764,2) = 0.4324883 Two different likehood methods produce different results, which one you should use depends on which likelihood makes more sense to you (or which likehood is better). HTH -- View this message in context: http://r.789695.n4.nabble.com/Question-regarding-significance-of-a-covariate-in-a-coxme-survival-model-tp2313880p2399114.html Sent from the R help mailing list archive at Nabble.com.
C. Peng
2010-Aug-29 14:44 UTC
[R] Question regarding significance of a covariate in a coxme survival model
My suggestion for Teresa: If compare model 1 and model 2 with model 0 respectively, the (penalized) likelihood ratio test is valid. IF you compare model 2 with model 3, the (penalized) likelihood ratio test is invalid. You may want to use AIC/SBC to make a subjective decision. -- View this message in context: http://r.789695.n4.nabble.com/Question-regarding-significance-of-a-covariate-in-a-coxme-survival-model-tp2313880p2399116.html Sent from the R help mailing list archive at Nabble.com.