Greetings,

I checked the Pima Indians diabetes data again and obtained one tree for the data with reordered columns and another tree for the original data. Comparing the two trees, the split points are exactly the same, but the fitted classes differ for some cases, and the misclassification errors differ too. I know how CART deals with ties: even with the same data, the subjects sent to the left and right need not be the same if we merely rearrange the order of the covariates.

But the problem is that the fitted trees are exactly the same at the split points. Shouldn't we get the same fitted values if the decisions are the same at each step? Why do trees with the same structure have different observations in their nodes?

The source code for running the diabetes data example and the output of the trees are attached. Your professional opinion is very much appreciated.

library(mlbench)
data(PimaIndiansDiabetes2)
mydata <- PimaIndiansDiabetes2
library(rpart)
fit2 <- rpart(diabetes ~ ., data = mydata, method = "class")
plot(fit2, uniform = TRUE, main = "CART for original data")
text(fit2, use.n = TRUE, cex = 0.6)
printcp(fit2)
table(predict(fit2, type = "class"), mydata$diabetes)
## misclassification table: rows are fitted class
       neg pos
  neg  437  68
  pos   63 200

pmydata <- data.frame(mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)])
fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")
plot(fit3, uniform = TRUE, main = "CART after exchanging mass & glucose")
text(fit3, use.n = TRUE, cex = 0.6)
printcp(fit3)
table(predict(fit3, type = "class"), pmydata$diabetes)
## after exchanging the order of body mass and plasma glucose
       neg pos
  neg  436  64
  pos   64 204

Best,

--
Yuanyuan Huang
Email: sunnyuan.h at gmail.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ReorderedTree.pdf
Type: application/pdf
Size: 5984 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090522/e4a941c2/attachment-0004.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OriginalTree.pdf
Type: application/pdf
Size: 5922 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090522/e4a941c2/attachment-0005.pdf>

Yuanyuan wrote:
> Greetings,
>
> I checked the Pima Indians diabetes data again and obtained one tree for the
> data with reordered columns and another tree for the original data. Comparing
> the two trees, the split points are exactly the same, but the fitted classes
> differ for some cases, and the misclassification errors differ too. I know
> how CART deals with ties: even with the same data, the subjects sent to the
> left and right need not be the same if we merely rearrange the order of the
> covariates.
>
> But the problem is that the fitted trees are exactly the same at the split
> points. Shouldn't we get the same fitted values if the decisions are the
> same at each step? Why do trees with the same structure have different
> observations in their nodes?

Because they may use different surrogate variables. Note that your data contain missing values that are handled by surrogates.

Best,
Uwe Ligges

> The source code for running the diabetes data example and the output of the
> trees are attached. Your professional opinion is very much appreciated.
>
> library(mlbench)
> data(PimaIndiansDiabetes2)
> mydata <- PimaIndiansDiabetes2
> library(rpart)
> fit2 <- rpart(diabetes ~ ., data = mydata, method = "class")
> plot(fit2, uniform = TRUE, main = "CART for original data")
> text(fit2, use.n = TRUE, cex = 0.6)
> printcp(fit2)
> table(predict(fit2, type = "class"), mydata$diabetes)
> ## misclassification table: rows are fitted class
>        neg pos
>   neg  437  68
>   pos   63 200
>
> pmydata <- data.frame(mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)])
> fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")
> plot(fit3, uniform = TRUE, main = "CART after exchanging mass & glucose")
> text(fit3, use.n = TRUE, cex = 0.6)
> printcp(fit3)
> table(predict(fit3, type = "class"), pmydata$diabetes)
> ## after exchanging the order of body mass and plasma glucose
>        neg pos
>   neg  436  64
>   pos   64 204
>
> Best,
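
One way to check whether surrogates account for the difference is sketched below. It is only a rough diagnostic, not a definitive analysis. It reuses the poster's objects (mydata, pmydata, fit2, fit3); the names moved, cc, fit2cc and fit3cc are introduced here purely for illustration. The comparison of fit$where across the two fits assumes the trees really do have the same structure, as reported, so that leaf numbers line up. The complete-case refit is a different model from the original one and is used only as a sanity check: if surrogates are the cause, one would expect the two complete-case fits to agree, although ties among primary splits could still depend on column order.

```r
library(mlbench)   # provides PimaIndiansDiabetes2
library(rpart)

data(PimaIndiansDiabetes2)
mydata  <- PimaIndiansDiabetes2
pmydata <- mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)]  # glucose and mass swapped

# Missing values per column: nonzero counts mean that some observations
# cannot be routed by the primary split and need a surrogate instead.
colSums(is.na(mydata))

fit2 <- rpart(diabetes ~ ., data = mydata,  method = "class")
fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")

# summary() lists the primary split and the surrogate splits for each node,
# so the surrogate lists of the two fits can be compared node by node.
summary(fit2)
summary(fit3)

# fit$where gives, for each observation, the row of fit$frame (the leaf)
# it ends up in. Assumes both trees have the same structure.
moved <- which(fit2$where != fit3$where)
length(moved)
mydata[moved, ]    # inspect which of these rows contain NAs

# Sanity check on complete cases only, where no surrogates are needed.
cc     <- complete.cases(mydata)
fit2cc <- rpart(diabetes ~ ., data = mydata[cc, ],  method = "class")
fit3cc <- rpart(diabetes ~ ., data = pmydata[cc, ], method = "class")
table(predict(fit2cc, type = "class") == predict(fit3cc, type = "class"))
```

If the rows listed in mydata[moved, ] all have a missing value in a variable used for a primary split, that would be consistent with the surrogate explanation.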