Dimitrie Siriopol

2017-Jun-13 13:02 UTC

### [R] Classification and Regression Tree for Survival Analysis

I am trying to use CART in a survival analysis. I have three variables of interest (x, y and z, all ordinal, each with 5 categories) from which I want to form smaller groups based on their association with, say, mortality. For example, one group might combine the 1st category of x with the 2nd and 3rd categories of y and the 2nd, 3rd and 4th categories of z.

I would also like this analysis to be adjusted for a number of variables. I don't want these to enter the decision tree, only to be taken into account in the relationship between the three variables and the outcome. These confounders have missing values; how should this be dealt with? Finally, the survival analysis should be stratified and should use clusters.

I have tried the party and rpart packages, but I can't work out how to do what I want.

Thank you

Bert Gunter

2017-Jun-13 16:59 UTC

### [R] Classification and Regression Tree for Survival Analysis

1. Please read and follow the posting guide (http://www.R-project.org/posting-guide.html). Your post does not meet the guidelines.
2. Search before posting! For example, searching rseek.org for "Regression trees survival analysis" turns up https://cran.r-project.org/web/views/MachineLearning.html

-- Bert

Achim Zeileis

2017-Jun-13 20:28 UTC

### [R] Classification and Regression Tree for Survival Analysis

I don't think that such an analysis is available "out of the box". In principle, you can iterate between (a) estimating a survival regression with the confounders, given the groups from the tree, and (b) estimating the tree, given an offset in the survival regression for the confounders. Such a strategy is implemented in the palmtree() function from the "partykit" package, but only for lm() and glm() models, not for survreg(). The same idea could be applied in that case as well, e.g., using a Weibull distribution.

To incorporate stratification/clustering, one could either use clustered inference in the variable selection or add a random effect. For lm/glm this is provided in the package "glmertree", but I don't think there is readily available code to do the same for a survival response.

As for the missing values in the confounders: I can't think of a good strategy for this. One could try generic imputation strategies, but the imputation is likely to affect the subsequent regression and tree selection process.

References for palmtree and glmertree:

http://arxiv.org/abs/1612.07498
http://EconPapers.RePEc.org/RePEc:inn:wpaper:2015-10
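To make the alternating idea of steps (a) and (b) concrete, here is a deliberately simplified sketch using only the "survival" package. It is not palmtree() and not a full tree: step (b) is reduced to choosing a single binary split (a "stump") among x, y and z by Cox partial likelihood, and a Cox model is swapped in for the Weibull regression mentioned above. The simulated data, the confounder `age`, the helper `best_split()` and all effect sizes are assumptions made up for illustration.

```r
library(survival)

set.seed(42)
n <- 500

# Simulated data: hypothetical names and effect sizes, for illustration only
d <- data.frame(
  x   = sample(1:5, n, replace = TRUE),  # ordinal variables coded 1..5
  y   = sample(1:5, n, replace = TRUE),
  z   = sample(1:5, n, replace = TRUE),
  age = rnorm(n, mean = 60, sd = 10)     # confounder, kept out of the split search
)
haz      <- 0.05 * exp(1.0 * (d$x >= 4) + 0.03 * (d$age - 60))
t_event  <- rexp(n, rate = haz)
t_cens   <- rexp(n, rate = 0.02)
d$time   <- pmin(t_event, t_cens)
d$status <- as.integer(t_event <= t_cens)

# Step (b), simplified: with the confounder effect held fixed as an offset,
# find the single binary split of x, y or z with the highest partial likelihood
best_split <- function(data) {
  cand <- expand.grid(var = c("x", "y", "z"), cut = 1:4,
                      stringsAsFactors = FALSE)
  cand$loglik <- vapply(seq_len(nrow(cand)), function(i) {
    data$g <- as.integer(data[[cand$var[i]]] > cand$cut[i])
    fit <- coxph(Surv(time, status) ~ g + offset(off), data = data)
    fit$loglik[2]  # log partial likelihood at the fitted coefficients
  }, numeric(1))
  cand[which.max(cand$loglik), ]
}

d$off <- 0  # start with no confounder adjustment
for (iter in 1:10) {
  sp  <- best_split(d)                              # (b) groups given the offset
  d$g <- as.integer(d[[sp$var]] > sp$cut)
  fit <- coxph(Surv(time, status) ~ g + age, data = d)  # (a) regression given groups
  new_off <- unname(coef(fit)["age"]) * d$age
  if (max(abs(new_off - d$off)) < 1e-8) break       # offset stable: stop iterating
  d$off <- new_off
}

print(sp)         # selected variable and cut point
print(coef(fit))  # group effect and confounder coefficient
```

In step (a), terms such as strata() or cluster() from the "survival" package could be added to the coxph() formula to address stratification and clustering. Whether the resulting inference is appropriate for the split selection in step (b) is not something this sketch addresses, and it does not handle missing values in the confounders.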