Jose Bustos Melo
2011-Aug-29 18:53 UTC
[R] rpart: modelling a decision tree and getting probabilities
Hello everyone,

I am working on a public health project, and we have built a decision tree for categorical variables using the rpart package. Our goal is to develop a model (evaluated with ROC curves) that predicts the presence or absence of diabetes, and to better understand which factors are important in a particular Chilean population. We have found some important variables.

Now we want to apply this model to a large dataset in order to estimate a possible outcome (the probability of getting the disease). For each person in that dataset, however, we only have their combination of predictor variables. We have written this code:

library(rpart)
library(ROCR)  # provides prediction()

# classification tree fitted on the data where the outcome is known
fit1 <- rpart(sickness ~ aetinghabit + gse + age + sex, method = "class", data = data)
# class probabilities for every person in the big dataset
prediccion <- predict(fit1, bigdatabase, type = "prob")
predictionsyes <- prediccion[, 2]  # column 2 = second level of sickness
# ROCR needs the true outcome for the same rows as predictionsyes
pred <- prediction(predictionsyes, datos$sickness) # but this is

My question is: how do we put each person's conditions into this model to get that person's probability of getting the disease? Is it possible to draw a ROC curve using only this big database, given that we do not know whether these people actually got the disease?

It would be very helpful if someone could shed some light on this. Any web resources on how to do it would be much appreciated.

Thanks in advance.

Best regards,
José Bustos
Escuela de Enfermeria
Pontificia Universidad Católica de Chile
Proyecto FONIS 2010
Mobile: 95939144
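A minimal sketch of the workflow the question describes, not the poster's own code. It assumes rpart's bundled `kyphosis` data (outcome `Kyphosis`, levels "absent"/"present") as a stand-in for the labelled diabetes data, and `new_people` is a hypothetical stand-in for `bigdatabase`. It illustrates two points: `predict(..., type = "prob")` only needs the predictor columns, so scoring unlabelled people is straightforward; an ROC curve or AUC, by contrast, needs the true outcome, so it has to be computed on labelled held-out rows rather than on the unlabelled dataset.

```r
library(rpart)  # recommended package shipped with R; provides rpart() and the kyphosis data

# Stand-in data (assumption): kyphosis plays the role of the labelled diabetes data.
data(kyphosis, package = "rpart")
set.seed(1)

# Hold out part of the labelled data: ROC/AUC can only be computed where the outcome is known.
idx   <- sample(nrow(kyphosis), size = round(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
test  <- kyphosis[-idx, ]

fit <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class")

# Probability of the positive class, selected by level name rather than column position.
p_test <- predict(fit, newdata = test, type = "prob")[, "present"]

# AUC via the Mann-Whitney formulation: the probability that a random positive
# case gets a higher score than a random negative case (ties count as 1/2).
auc <- function(scores, labels) {
  pos <- scores[labels]
  neg <- scores[!labels]
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}
test_auc <- auc(p_test, test$Kyphosis == "present")

# Scoring people with no known outcome (hypothetical 'new_people'): only the
# predictor columns are supplied; the result is one probability per person.
new_people <- kyphosis[1:5, c("Age", "Number", "Start")]
p_new <- predict(fit, newdata = new_people, type = "prob")[, "present"]

print(round(test_auc, 3))
print(p_new)
```

If ROCR is installed, `performance(prediction(p_test, test$Kyphosis), "auc")@y.values[[1]]` should give the same AUC, and `plot(performance(prediction(p_test, test$Kyphosis), "tpr", "fpr"))` draws the curve; the hand-written `auc()` is used here only to keep the sketch free of packages outside the standard R distribution. The probabilities for the unlabelled people (`p_new`) can be reported or thresholded, but no ROC curve can be built from them alone.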