Bonnett, Laura
2018-May-24 15:56 UTC
[R] Predictions from a Cox model - understanding centering of binary/categorical variables
Dear all, I am using R 3.4.3 on Windows 10. I am preparing some teaching materials and I'm having trouble matching the by-hand version with the R code. I have fitted a Cox model - let's use the ovarian data as an example: library(survival) data(ovarian) ova_mod <- coxph(Surv(futime,fustat)~age+rx,data=ovarian) If I want to make predict survival for a new set of individuals at 100 days then that is trivial using predict.coxph e.g.: newdata <- data.frame(futime=rep(100,5),fustat=rep(1,5),age=c(45,50,55,60,65),rx=c(1,2,1,2,1)) preds <- predict(ova_mod,newdata,type="expected") survpreds <- exp(-1*preds) survpreds [1] 0.9967635 0.9969739 0.9859543 0.9868629 0.9401437 However, due to centering I believe, I am finding this a bit difficult to replicate by hand. To replicate the analysis I need the baseline survival at 100 days, the regression coefficients, and the mean/proportions. Baseline survival at 100 days: summary(survfit(ova_mod),time=100)$surv = 0.9888 Regression coefficient: ova_mod$coef = 0.1473 (age) & -0.8040 Mean age: mean(ovarian$age) = 56.1654 Proportions with rx: prop.table(table(ovarian$rx)) = 0.5 0.5 So, to recreate the predicted survival probability I would work out the linear predictor for my first individual: LP1 = ((45-56.1654)*0.1437)+((1*0.5)*-0.8040) = -2.0470 However, predict(ova_mod,newdata,type="lp") suggests that it should be -1.2430. Therefore have I misinterpreted the centering? I.e that we take off the mean value from continuous variables, and multiply by the proportion with that response for binary/categorical variables? Then, assuming the I have the correct LP of -1.2430, I need to raise the baseline survival estimate at 100 days to the exp(LP1). This gives 0.9968 which is almost the predicted value of 0.9968 of above. However, I need to get the linear predictors to agree with the output first! Many thanks for your help with this. Kind regards, Laura [[alternative HTML version deleted]]
Bert Gunter
2018-May-24 19:15 UTC
[R] Predictions from a Cox model - understanding centering of binary/categorical variables
Generally speaking, primarily statistical issues, which this appears to be, are OT on R-help, which is concerned with R programming issues (although they do someties intersect). You might therefore do better on a statistical forum such as stats.stackexchange.com (especially if you do not get a satisfactory response here). Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Thu, May 24, 2018 at 8:56 AM, Bonnett, Laura <L.J.Bonnett at liverpool.ac.uk> wrote:> Dear all, > > I am using R 3.4.3 on Windows 10. I am preparing some teaching materials > and I'm having trouble matching the by-hand version with the R code. > > I have fitted a Cox model - let's use the ovarian data as an example: > library(survival) > data(ovarian) > ova_mod <- coxph(Surv(futime,fustat)~age+rx,data=ovarian) > > If I want to make predict survival for a new set of individuals at 100 > days then that is trivial using predict.coxph e.g.: > newdata <- data.frame(futime=rep(100,5),fustat=rep(1,5),age=c(45,50, > 55,60,65),rx=c(1,2,1,2,1)) > preds <- predict(ova_mod,newdata,type="expected") > survpreds <- exp(-1*preds) > survpreds > [1] 0.9967635 0.9969739 0.9859543 0.9868629 0.9401437 > > However, due to centering I believe, I am finding this a bit difficult to > replicate by hand. > > To replicate the analysis I need the baseline survival at 100 days, the > regression coefficients, and the mean/proportions. > Baseline survival at 100 days: summary(survfit(ova_mod),time=100)$surv > 0.9888 > Regression coefficient: ova_mod$coef = 0.1473 (age) & -0.8040 > Mean age: mean(ovarian$age) = 56.1654 > Proportions with rx: prop.table(table(ovarian$rx)) = 0.5 0.5 > > So, to recreate the predicted survival probability I would work out the > linear predictor for my first individual: > LP1 = ((45-56.1654)*0.1437)+((1*0.5)*-0.8040) = -2.0470 > However, predict(ova_mod,newdata,type="lp") suggests that it should be > -1.2430. Therefore have I misinterpreted the centering? I.e that we take > off the mean value from continuous variables, and multiply by the > proportion with that response for binary/categorical variables? > > Then, assuming the I have the correct LP of -1.2430, I need to raise the > baseline survival estimate at 100 days to the exp(LP1). This gives 0.9968 > which is almost the predicted value of 0.9968 of above. However, I need to > get the linear predictors to agree with the output first! > > Many thanks for your help with this. > > Kind regards, > Laura > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]