vikkiyft
2011-Feb-21 04:46 UTC
[R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph
Dear R-help, I am having a problem with the interpretation of result from validate.cph in the Design package. My purpose is to fit a cox model and validate the Somer's Dxy. I used the hypothetical data given in the help manual with modification to the cox model fit. My research problem is very similar to this example. This is the model without stratification:> library(Design) > f1 <- cph(S ~ age, x=TRUE, y=TRUE) > coef(f)age 0.0440095> set.seed(1) > validate(f1, B=10, dxy=T)index.orig training test optimism index.corrected n Dxy -0.3376784858 -0.3287537760 -0.3376784858 0.0089247099 -0.34660320 10 R2 0.0627722521 0.0636136044 0.0627722521 0.0008413523 0.06193090 10 Slope 1.0000000000 1.0000000000 0.9896987441 0.0103012559 0.98969874 10 D 0.0237993965 0.0239476118 0.0237993965 0.0001482153 0.02365118 10 U -0.0008208378 -0.0008141441 0.0007104737 -0.0015246178 0.00070378 10 Q 0.0246202342 0.0247617559 0.0230889228 0.0016728331 0.02294740 10 But if I fit a stratified cox model to the same data, the result becomes:> f2<- cph(S ~ age + strat(sex), x=TRUE, y=TRUE, surv=TRUE, time.inc=2) > coef(f)age 0.04271953> set.seed(1) > validate(f2, dxy=TRUE, u=2, B=10)index.orig training test optimism index.corrected n Dxy 0.3514778665 0.3259011492 0.3044982080 0.0214029412 0.3300749254 10 R2 0.0622369082 0.0651967502 0.0622369082 0.0029598419 0.0592770663 10 Slope 1.0000000000 1.0000000000 0.9621830568 0.0378169432 0.9621830568 10 D 0.0257519780 0.0267239073 0.0257519780 0.0009719293 0.0247800486 10 U -0.0009257142 -0.0009125388 0.0009102968 -0.0018228356 0.0008971213 10 Q 0.0266776922 0.0276364461 0.0248416812 0.0027947649 0.0238829273 10 The coefficients are similar between the models, so I expect the results would be somewhat similar, yet the two models give totally contrasting Dxy. I reckon a negative Dxy value is normal in the sense that the survival time and the prediction are concordant, but why does the result become discordant when stratification is used? Is there something wrong or is there a sensible interpretation for this? This problem is very critical to me because the Design package is the only one I can use for my purpose. Any advice is greatly appreciated. Best regards, Vikki -- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3316820.html Sent from the R help mailing list archive at Nabble.com.
Frank Harrell
2011-Feb-21 12:46 UTC
[R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph
Vikky, You'll notice that the model containing sex in addition to age has a higher apparent Dxy as you would expect [R^2 is not higher because it captures only the age effect]. The validated Dxy's may be as they are because of the very low number of bootstrap resamples (10) that you used. Please following the posting guide, i.e., include a simple self-reproducing script that I can run - one that simulates its own dataset. Please switch to the rms package is this is the one that is being fully supported. See http://biostat.mc.vanderbilt.edu/Rrms. Frank ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3317265.html Sent from the R help mailing list archive at Nabble.com.
vikkiyft
2011-Feb-24 17:29 UTC
[R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph
Dear Prof Frank, I tried to simulate an example data set as close as possible to my own real data with the codes below. There are only two covariates, tumor(3 levels) and ecog(3 levels). "rx" is treatment (4 levels). Validation with the stratified model (by rx) had a negative R2.. and the R2 under the training column was so high...? n<-1000 set.seed(110222) data<-matrix(rep(0,5000),ncol=5) data[,1]<-sample(1:3,n,rep=TRUE,prob=c(.32, .30, .38)) for (i in 1:1000){ if (data[i,1]==1) data[i,2]<-sample(1:3,1,prob=c(.76, .18, .06)) if (data[i,1]==2) data[i,2]<-sample(1:3,1,prob=c(.67, .24, .09)) if (data[i,1]==3) data[i,2]<-sample(1:3,1,prob=c(.47, .37, .16))} for (i in 1:1000){ if (data[i,1]==1) data[i,3]<-sample(1:4,1,prob=c(.70, .19, .03, .08)) if (data[i,1]==2) data[i,3]<-sample(1:4,1,prob=c(.42, .28, .12, .18)) if (data[i,1]==3) data[i,3]<-sample(1:4,1,prob=c(.11, .29, .30, .30))} for (i in 1:1000){ if (data[i,3]==1) data[i,4]<-12*rgamma(1000,rate=0.4,shape=1.7)[c(sample(26:975,1,prob=c(rep(1/950,950))))] if (data[i,3]==2) data[i,4]<-12*rgamma(1000,rate=0.9,shape=1.7)[c(sample(26:975,1,prob=c(rep(1/950,950))))] if (data[i,3]==3) data[i,4]<-12*rgamma(1000,rate=1.2,shape=0.6)[c(sample(26:975,1,prob=c(rep(1/950,950))))] if (data[i,3]==4) data[i,4]<-12*rgamma(1000,rate=1.5,shape=0.7)[c(sample(26:975,1,prob=c(rep(1/950,950))))]} for (i in 1:1000){ if (data[i,3]==1) data[i,5]<-sample(c(0,1),1,prob=c(.53, .47)) if (data[i,3]==2) data[i,5]<-sample(c(0,1),1,prob=c(.17, .83)) if (data[i,3]==3) data[i,5]<-sample(c(0,1),1,prob=c(.05, .95)) if (data[i,3]==4) data[i,5]<-sample(c(0,1),1,prob=c(.06, .94))} colnames(data)<-c("tumor","ecog","rx","os","censor") data<-data.frame(data) attach(data) library(rms) surv.obj<-Surv(os,censor) validate(cph(surv.obj~factor(tumor)+factor(ecog),x=T,y=T,surv=T),method="b",B=100,dxy=T) validate(cph(surv.obj~factor(tumor)+factor(ecog)+strat(rx), x=T,y=T,surv=T,time.inc=60),method="b",B=100,dxy=T,u=60) Best regards, Vikki -- View this message in context: http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3323016.html Sent from the R help mailing list archive at Nabble.com.