Hello R-users,

I believe that the way basehaz (in the survival package) computes the baseline hazard function is wrong.

I came to question this function when it gave me hazard probabilities greater than 1.

Looking at the code, I think I have located the error. The hazard probability is computed as:

    H <- -log(surv)

but it seems to me that the hazard probability is rather an instantaneous survival rate, which could be computed this way:

    H[i] <- 1 - surv[i] / surv[i-1]

Using this rule I obtain satisfactory results with the two following functions:

    # Convert a survival curve into per-interval hazard probabilities.
    surv2haz <- function(surv) {
      haz <- surv
      haz[1] <- 1 - surv[1]
      # seq_along(surv)[-1] is empty when surv has length 1, so the loop is skipped
      for (i in seq_along(surv)[-1]) {
        haz[i] <- 1 - surv[i] / surv[i - 1]
      }
      return(haz)
    }

    # Inverse operation: rebuild the survival curve from the hazard probabilities.
    haz2surv <- function(haz) {
      surv <- haz
      surv[1] <- 1 - haz[1]
      for (i in seq_along(haz)[-1]) {
        surv[i] <- (1 - haz[i]) * surv[i - 1]
      }
      return(surv)
    }

If I am right, wouldn't it be a good idea to change the basehaz function, to avoid misleading the overconfident user (as I happen to be)?

I hope this will help contribute to a wonderful tool that speeds up my understanding of statistical analysis and my research.

David

-- 
David Mas
ERMES-FRE 2887-CNRS
Université Pantheon-Assas Paris II
12, place du Pantheon
F-75230 Paris Cedex 05
Tel: +33 (0)1 44 41 89 91
Mob: +33 (0)6 84 15 77 67
Fax: +33 (0)1 40 51 81 30
http://www.u-paris2.fr/ermes/

Hi,

David Mas wrote:
> I believe that the way basehaz (in the survival package) compute the
> baseline hazard function is false.
>
> I come to question this function when it gives me hazard probabilities
> greater than 1.
>
> Looking at the code I think I've localised the error :
>
> hazard probability is computed as :
>
> H <- -log(surv)

Maybe the documentation is not clear enough about that, but what you obtain in the previous code line is not the "hazard probability" but the "Cumulative Hazard Rate".

The hazard rate is typically defined as h(t) = f(t)/S(t), where f(t) is the density at time t and S(t) is the value of the survival function at time t. Please note that this is not a probability: the hazard can take on values larger than 1.

Besides the literature given in ?basehaz, I can recommend

    Klein, J. P. and Moeschberger, M. L. Survival Analysis: Techniques
    for Censored and Truncated Data. Springer, 2003.

Starting at page 27, the hazard function and the cumulative hazard are introduced.

Hope this helps,
Roland
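
To make the distinction in the reply concrete, here is a minimal base R sketch. It assumes a made-up survival curve (the values in `surv` are purely illustrative and do not come from any fitted model), and the variable names `cumhaz`, `haz_increment` and `cond_prob` are hypothetical. It only illustrates how the cumulative hazard -log(S) relates to the per-interval quantity 1 - surv[i] / surv[i-1] from the original post; it is not a description of how basehaz is implemented.

```r
# Hypothetical survival curve at successive event times (illustrative values only).
surv <- c(0.90, 0.72, 0.45, 0.20, 0.05)

# Cumulative hazard, as in the quoted line: H(t) = -log S(t).
# It is not a probability and grows without bound as S(t) approaches 0.
cumhaz <- -log(surv)

# Increment of the cumulative hazard over each interval (taking H = 0 at the start).
haz_increment <- diff(c(0, cumhaz))

# The per-interval quantity from the original post, 1 - surv[i] / surv[i-1],
# taking the survival before the first event time to be 1.
cond_prob <- 1 - surv / c(1, head(surv, -1))

# The two are linked by cond_prob = 1 - exp(-haz_increment).
stopifnot(isTRUE(all.equal(cond_prob, 1 - exp(-haz_increment))))

print(round(cumhaz, 3))
print(round(cond_prob, 3))
```

With these values the last entries of `cumhaz` exceed 1 while every entry of `cond_prob` stays between 0 and 1, so a cumulative hazard above 1 is not by itself a sign of an error.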