Yuanyuan
2009-May-12 16:19 UTC
[R] questions on rpart (tree changes when rearrange the order of covariates?!)
Greetings,

I am using rpart for classification with method = "class". The test data is the Pima Indians diabetes data (PimaIndiansDiabetes2) from package mlbench.

I first fitted a classification tree to the original data, and then swapped the column positions of body mass and plasma glucose, which are the most important variables in the growing phase. The second tree is a little different from the first one, and the misclassification tables are different too. I did not change the data itself, so why are the results different?

Does anyone know how rpart deals with ties?

Here is the code for fitting the two trees.

library(mlbench)
data(PimaIndiansDiabetes2)
mydata <- PimaIndiansDiabetes2
library(rpart)

fit2 <- rpart(diabetes ~ ., data = mydata, method = "class")
plot(fit2, uniform = TRUE, main = "CART for original data")
text(fit2, use.n = TRUE, cex = 0.6)
printcp(fit2)
table(predict(fit2, type = "class"), mydata$diabetes)
## misclassification table: rows are fitted class
##       neg pos
##   neg 437  68
##   pos  63 200
#Klimt(fit2,mydata)

## swap columns 2 (glucose) and 6 (mass); the values are unchanged
pmydata <- data.frame(mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)])
fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")
plot(fit3, uniform = TRUE, main = "CART after exchanging mass & glucose")
text(fit3, use.n = TRUE, cex = 0.6)
printcp(fit3)
table(predict(fit3, type = "class"), pmydata$diabetes)
## after exchanging the order of body mass and plasma glucose
##       neg pos
##   neg 436  64
##   pos  64 204
#Klimt(fit3,pmydata)

Thanks,

Yuanyuan Huang
Uwe Ligges
2009-May-13 09:30 UTC
[R] questions on rpart (tree changes when rearrange the order of covariates?!)
Yuanyuan wrote:
> Greetings,
>
> I am using rpart for classification with "class" method. The test data is
> the Indian diabetes data from package mlbench.
>
> I fitted a classification tree firstly using the original data, and then
> exchanged the order of Body mass and Plasma glucose which are the
> strongest/important variables in the growing phase. The second tree is a
> little different from the first one. The misclassification tables are
> different too. I did not change the data, but why the results are so
> different?
>
> Does anyone know how rpart deal with ties?

Well, at some splits two variables may yield exactly the same reduction of the splitting criterion. In that case the variable that comes first might be used, hence a different result once the columns are reordered.

Uwe Ligges
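The tie behaviour described above can be illustrated with a small constructed example. This is only a sketch, not a statement about the diabetes data: the data frame, the column names a and b, and the object names are made up for illustration, and b is an exact copy of a so that both variables are guaranteed to give the same improvement at every split. Which variable ends up at the root is left for the reader to inspect, since it depends on how the installed rpart version resolves the tie.

```r
library(rpart)

set.seed(1)
n <- 200
a <- rnorm(n)
# Hypothetical response: driven by 'a' plus some noise
y <- factor(ifelse(a + rnorm(n, sd = 0.5) > 0, "pos", "neg"))

# 'b' is an exact copy of 'a', so every split on 'a' ties with one on 'b'
d_ab <- data.frame(y = y, a = a, b = a)
d_ba <- d_ab[, c("y", "b", "a")]   # same data, columns reordered

fit_ab <- rpart(y ~ ., data = d_ab, method = "class")
fit_ba <- rpart(y ~ ., data = d_ba, method = "class")

# Variable used at the root node of each tree
root_ab <- as.character(fit_ab$frame$var[1])
root_ba <- as.character(fit_ba$frame$var[1])
print(c(root_ab = root_ab, root_ba = root_ba))

# Improvement of the primary split and its competitors at the root;
# equal values here indicate a tie between candidate variables
n_primary <- 1 + fit_ab$frame$ncompete[1]
print(fit_ab$splits[seq_len(n_primary), "improve"])

# Because a and b are identical, both trees partition the data the same way
pred_ab <- predict(fit_ab, type = "class")
pred_ba <- predict(fit_ba, type = "class")
print(all(pred_ab == pred_ba))
```

In this toy case the predictions agree because the tied variables are identical. With real data, two tied variables usually partition the observations differently, so everything below that node can change, which is consistent with the slightly different tables reported above. On the real fits, summary(fit2) and summary(fit3) list the primary splits with their improve values for each node, which is one way to look for such ties.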