Hi,

I am fitting a proportional hazards model to predict the probability of default for mortgage loans, and I have a question about the survfit function.

My historical data set is a pool of loans with default status observed monthly over a 24-month window. The data are left truncated (a loan enters the observation window some time after it is opened) and right censored. I would like to fit the model with time-varying covariates such as the unemployment rate, plus time-constant variables recorded at loan application, and then use survfit to predict the probability of default over the next 24 months for our current pool of loans.

When the loans to be predicted fall outside the observed time window, is it reliable to use survfit for the prediction? If not, how should I deal with this problem? Is there another way to set up the model?

Any thoughts are appreciated. Thanks so much in advance.

Ying (Cindy)
Two thoughts.

First, prediction with time-dependent covariates is always an issue. If you had unemployment as a month-by-month time-dependent covariate in the first model, then for prediction you will need to provide a month-by-month future unemployment scenario. Doing this is easy in the code, but choosing which scenario is "relevant" and/or "interesting" is hard. See section 10.2.4 of Therneau and Grambsch for more discussion.

Second, I think your time intervals will be ok. Given what you know now, the question is "will there be a failure in the next 24 months?" I'd think of "the next 24 months" as the time scale, and not a particular slice of calendar time such as "1/1/2003 - 1/1/2005".

Terry Therneau
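
To make the first point concrete, here is a minimal sketch of how a future unemployment scenario can be handed to survfit. It is an illustration under stated assumptions, not the poster's actual setup: the data are simulated, and the names loans, unemp, fico, default and scenario are hypothetical. Time is measured in months since the start of the 24-month window, following the second point above.

```r
library(survival)

set.seed(1)
n <- 300

# Simulated loan-month data in counting-process form (start, stop].
# Every variable name and number here is hypothetical.
loans <- do.call(rbind, lapply(seq_len(n), function(i) {
  fico  <- rnorm(1)                          # time-constant covariate (standardized)
  unemp <- 5 + cumsum(rnorm(24, 0, 0.3))     # monthly time-varying covariate
  haz   <- 0.01 * exp(0.3 * (unemp - 5) - 0.5 * fico)
  event <- rbinom(24, 1, pmin(haz, 1))
  # Follow the loan until its first default, or censor at month 24.
  last  <- if (any(event == 1)) which(event == 1)[1] else 24
  data.frame(id = i, start = 0:(last - 1), stop = 1:last,
             default = event[1:last], unemp = unemp[1:last], fico = fico)
}))

fit <- coxph(Surv(start, stop, default) ~ unemp + fico, data = loans)

# One assumed future path: unemployment rising from 5% to 7% over 24 months.
# newdata must carry the start/stop/status columns as well as the covariates;
# the status values are placeholders and do not affect the curve.
scenario <- data.frame(id = 1, start = 0:23, stop = 1:24, default = 0,
                       unemp = seq(5, 7, length.out = 24), fico = 0)

# Rows sharing an id are treated as one subject's covariate path.
sf <- survfit(fit, newdata = scenario, id = id)

# Predicted probability of default within 24 months under this scenario.
pd24 <- 1 - summary(sf, times = 24, extend = TRUE)$surv
```

Repeating the last two steps with different unemp paths gives one predicted default probability per scenario. The code does not say which path is the relevant one; that choice remains the hard part.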