Koffijberg, H.
2006-Sep-19 12:04 UTC
[R] How to interpret these results from a simple gamma-frailty model
Dear R users, I'm trying to fit a gamma-frailty model on a simulated dataset, with 6 covariates, and I'm running into some results I do not understand. I constructed an example from my simulation code, where I fit a coxph model without frailty (M1) and with frailty (M2) on a number of data samples with a varying degree of heterogeneity (I'm running R 2.3.1, running takes ~1 min). library(survival); set.seed(10000) lambda <- 0.01 # Exp. hazard rate # Beta coefficients for Age,TC,HDLC,SBP,Diab,Smok beta <- c(0.0483,0.0064,-0.0270,0.0037,0.4284,0.5234) n <- 1000; Ngrp <- 2; # Nr patients, Nr frailty groups # Thetas for gamma-frailty thetaset <- c(1,2,10,100); Ntheta <- length(thetaset); # Define the simulated population age <-rnorm(n,48.6,11.7);tc<-rnorm(n,200,30) hdlc<-rnorm(n,47,6);sbp<-rnorm(n,135,6) rtmp <- runif(0,1,n=n); diab <- rep(0, n); diab[rtmp < 0.05223] <- 1; rtmp <- runif(0,1,n=n); smok <- rep(0, n); smok[rtmp < 0.40458] <- 1; # Storage for the estimated coefficients in both models cf <- 0; length(cf) <- 2*Ntheta*6; dim(cf) <- c(Ntheta,2,6); for (i in 1:Ntheta) { l = thetaset[i] v=rep(rgamma(n/2,shape=l,scale=1/l),2) # Shared frailty values print(paste("Theta = ",l, " variance in introduced frailty v = ",var(v))) id=rep(seq(1:(n/2)),2); # Frailty id's, c <- rep(1,n); # censoring flags r <- runif(n,0,1); # For random variate generation t <- (-1*log(r)) / (lambda*v*exp(age*beta[1]+tc*beta[2]+hdlc*beta[3]+sbp*beta[4]+diab*beta[5]+smok*beta[6])) fitM1=coxph(Surv(t,c)~age+tc+hdlc+sbp+diab+smok); # Model 1, no frailty cf[i,1,] <- round(fitM1$coef,6) # Model 1 coefficients fitM2=coxph(Surv(t,c)~age+tc+hdlc+sbp+diab+smok+ frailty(id,dist="gamma",sparse=T,method="em")) # Model 2, frailty cf[i,2,] <- round(fitM2$coef,6) # Model 2, coefficients # store estimated Variance of random effect varre=fitM2$history$frailty$history[length(fitM2$history$frailty$history[,1])] print(paste("Theta = ",l, " variance in estimated random effect= ",varre)) print(paste("Theta = ",l, " variance in estimated ind. frailty = ", var(exp(fitM2$frail)))) print(paste("Theta=",l," org beta ",beta," M1: ",cf[i,1,], " M2: ", cf[i,2,])) # Calculate the 'actual' 10-year risk and the risk estimated using M1 and M2 estv <- exp(rep(fitM2$frail,2)) # Individual frailty values from M2 # Original risk score rs<- age*beta[1]+tc*beta[2]+hdlc*beta[3]+sbp*beta[4]+diab*beta[5]+smok*beta[6] pevent <- exp(-10*lambda)^(exp(rs)) # Original 10-year risk rsM1<- (cf[i,1,1]*age+cf[i,1,2]*tc+cf[i,1,3]*hdlc+cf[i,1,4]*sbp+cf[i,1,5]*diab+cf[i,1,6]*smok) peventM1 <- exp(-10*lambda)^(exp(rsM1)) rsM2<- (cf[i,1,1]*age+cf[i,1,2]*tc+cf[i,1,3]*hdlc+cf[i,1,4]*sbp+cf[i,1,5]*diab+cf[i,1,6]*smok) peventM2 <- exp(-10*lambda)^(estv*exp(rsM1)) # Proportion of more accurate predictions pred <- sum(abs(pevent-peventM2) < abs(pevent-peventM1))/n print(paste("Theta = ",l," proportion pred. M2 more accurate than M1 = ",pred)); } corresponding output: [1] "Theta = 1 variance in introduced frailty v = 0.948105787326522" [1] "Theta = 1 variance in estimated random effect= 1.04435029128552" [1] "Theta = 1 variance in estimated ind. frailty = 0.520078219281493" [1] "Theta= 1 org beta 0.0483 M1: 0.025015 M2: 0.045987" [2] "Theta= 1 org beta 0.0064 M1: 0.002274 M2: 0.005178" [3] "Theta= 1 org beta -0.027 M1: -0.014239 M2: -0.028824" [4] "Theta= 1 org beta 0.0037 M1: 0.00071 M2: 0.010826" [5] "Theta= 1 org beta 0.4284 M1: 0.421602 M2: 0.648227" [6] "Theta= 1 org beta 0.5234 M1: 0.360069 M2: 0.593551" [1] "Theta = 1 proportion pred. M2 more accurate than M1 = 0.448" [1] "Theta = 2 variance in introduced frailty v = 0.515806773271924" [1] "Theta = 2 variance in estimated random effect= 0.644954876113229" [1] "Theta = 2 variance in estimated ind. frailty = 0.277209957022078" [1] "Theta= 2 org beta 0.0483 M1: 0.033213 M2: 0.046216" [2] "Theta= 2 org beta 0.0064 M1: 0.005633 M2: 0.009491" [3] "Theta= 2 org beta -0.027 M1: -0.010048 M2: -0.016658" [4] "Theta= 2 org beta 0.0037 M1: -0.006032 M2: -0.003639" [5] "Theta= 2 org beta 0.4284 M1: 0.402992 M2: 0.403226" [6] "Theta= 2 org beta 0.5234 M1: 0.457665 M2: 0.691871" [1] "Theta = 2 proportion pred. M2 more accurate than M1 = 0.482" [1] "Theta = 10 variance in introduced frailty v = 0.104197974543906" [1] "Theta = 10 variance in estimated random effect= 0.00331174766884151" [1] "Theta = 10 variance in estimated ind. frailty = 2.30691702541282e-05" [1] "Theta= 10 org beta 0.0483 M1: 0.048761 M2: 0.048885" [2] "Theta= 10 org beta 0.0064 M1: 0.002817 M2: 0.002833" [3] "Theta= 10 org beta -0.027 M1: -0.024016 M2: -0.02411" [4] "Theta= 10 org beta 0.0037 M1: 0.002227 M2: 0.002206" [5] "Theta= 10 org beta 0.4284 M1: 0.36471 M2: 0.36624" [6] "Theta= 10 org beta 0.5234 M1: 0.531466 M2: 0.532811" [1] "Theta = 10 proportion pred. M2 more accurate than M1 = 0.604" [1] "Theta = 100 variance in introduced frailty v = 0.0101553285954112" [1] "Theta = 100 variance in estimated random effect= 0.0245876188637748" [1] "Theta = 100 variance in estimated ind. frailty = 0.00113624325597910" [1] "Theta= 100 org beta 0.0483 M1: 0.051439 M2: 0.052775" [2] "Theta= 100 org beta 0.0064 M1: 0.002654 M2: 0.002834" [3] "Theta= 100 org beta -0.027 M1: -0.025281 M2: -0.025606" [4] "Theta= 100 org beta 0.0037 M1: 0.005468 M2: 0.005903" [5] "Theta= 100 org beta 0.4284 M1: 0.467735 M2: 0.48221" [6] "Theta= 100 org beta 0.5234 M1: 0.673852 M2: 0.683147" [1] "Theta = 100 proportion pred. M2 more accurate than M1 = 0.564" Warning message: Inner loop failed to coverge for iterations 2 3 in: coxpenal.fit(X, Y, strats, offset, init = init, control, weights = weights,>I don't understand the following: [1] The variance of the estimated individual frailty values is always much lower than both the original variance and the estimated variance of the random effect. Why is this ? Obviously the variance of the random effect is not calculated directly from the individual frailties after exponentiating them. [2] Random effects that are not very large are not even picked up by M2, e.g. theta = 10, yields a variance of ~1/10 in the frailty distribution used to set up the data, however, the estimated variance of that random effect equals only 0.003311. Why is frailty not picked up in this case ? Is it just that the variance of the heterogeneity inserted into the data is too small ? [3] When I compare the predictive abilities of M1 and M2, by calculating the differences, e.g. between predicted 10-year risks, and determine the proportion of more accurate predictions over the complete sample, when using M2 instead of M1, M2 does not perform relevantly better than M1. For theta = 1 or 2 (substantial heterogeneity) M2 performs even worst than M1, while it should be able to take this heterogeneity into account and model M1 cannot do that. Can anyone explain why using M2 does not lead to better predictions in this situation ? [4] Intuitively it seems to me that in absence of heterogeneity (or with very little heterogeneity, e.g. theta=100) both models should be able to estimate the regression coefficients for the 6 covariates accurately, given enough events. With n=1000, the estimated coefficients are still very inaccurate (see e.g. theta=100, beta 2, 4 and 6) even though we have over 150 events/parameter. Even with lots of events (e.g. 50000) the estimates get more accurately but still are way from perfect. Is there an underlying reason for this slow convergence ? Is there anyone who can clarify some of these results ? Many thanks in advance for your help. Erik. ====================================================Erik Koffijberg, MSc Julius Center for Health Sciences and Primary Care University Medical Center Utrecht Str. 6.131 P.O.Box 85500, 3508 GA Utrecht, The Netherlands Tel: +31 30 250 3013, Fax: +31 30 250 5480 E-mail: H.Koffijberg at umcutrecht.nl