So I have two sets of data: a training data set and a test data set. I've been doing the analysis on the training data and then using predict to feed the test data through the fitted model. There are 114 rows in the training data, 117 in the test data, and 1024 columns in both. It's actually the same data set split in two. Each row belongs to one of 5 classes, coded as numbers. They do represent something, but it would take too long to explain.

I want to find a classification rule for the 5 classes based on the columns, so I created a classification tree, plotted it, and then pruned it. My first question: how do you print the misclassification rate at each node on the actual diagram of the classification tree? I can't seem to get it up there. My notes use gmistext(), but I have a feeling that it's for S-PLUS rather than R, as gmistext() doesn't work for me either.

Second question: when I use predict.tree to put the test data into the tree and then plot the result, I get a really weird and wrong-looking plot. Here is the code I'm using:

  tree1 <- tree(row~.,data=train)
  pruned.tree <- prune.tree(tree1, best = 5, method = "misclass")
  predict.tree1 <- predict.tree(prune.tree, data = main)
  plot(predict.tree);text(predict.tree)

I don't get a classification tree. I get the x axis labelled 1, the y axis labelled 2, and then about 4 small black rectangles scattered across the plot.

Thanks in advance.

--
View this message in context: http://www.nabble.com/Couple-of-Questions-about-Classification-trees-tp22461673p22461673.html
Sent from the R help mailing list archive at Nabble.com.
Frank E Harrell Jr
2009-Mar-11 20:12 UTC
[R] Couple of Questions about Classification trees
Jen_mp3 wrote:
> So I have 2 sets of data - a training data set and a test data set. I've been
> doing the analysis on the training data set and then using predict and
> feeding the test data through that. There are 114 rows in the training data
> and 117 in the test data and 1024 columns in both. It's actually the same
> set of data split into two. The rows are made of 5 different numbers. They
> do represent something but it would take too long to explain.

Your sample size is too small by a factor of perhaps 100 for simple data splitting to provide stable results. Then you have the problem of an improper scoring rule, i.e., one that when optimized gives the wrong answer.

Frank Harrell

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
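The instability of a single data split can be illustrated with a small sketch. It is only an illustration under stated assumptions: the built-in iris data stands in for the poster's data (which is not available here), rpart stands in for the tree package, and the names n_splits and test_error are made up for the example. It repeats a random half-and-half split many times and records the test misclassification rate each time, so the spread across splits becomes visible.

```r
library(rpart)

# Assumption: iris is a stand-in data set; it is far easier than a
# 231 x 1024 problem, so the spread seen here is likely optimistic.
set.seed(1)
n <- nrow(iris)
n_splits <- 50  # hypothetical number of repeated random splits

test_error <- replicate(n_splits, {
  # Draw a fresh half-and-half split each time
  train_idx <- sample(n, size = floor(n / 2))
  fit <- rpart(Species ~ ., data = iris[train_idx, ], method = "class")
  # Predict classes for the held-out half
  pred <- predict(fit, newdata = iris[-train_idx, ], type = "class")
  # Proportion of held-out rows that were misclassified
  mean(as.character(pred) != as.character(iris$Species[-train_idx]))
})

# The estimate depends on which rows happened to land in the test half
print(summary(test_error))
print(range(test_error))
```

If the range is wide relative to the mean, a single split tells you little; resampling schemes such as cross-validation or the bootstrap use all rows for both fitting and assessment.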
The issue with the sample size is that there are so many measurements compared with the number of meat samples: 1024 measurements on only 231 samples.

Aside from that, you should check out the rpart package. Its commands are similar to those of the tree package, but there are more options for the plots. I don't know offhand how to display misclassification rates, but the text.rpart command can display the numbers of incorrectly and correctly classified observations in each node.

Ed

--
Ed Merkle, PhD
Assistant Professor
Dept. of Psychology
Wichita State University
Wichita, KS, USA 67260

> Date: Wed, 11 Mar 2009 13:53:46 -0700 (PDT)
> From: Jen_mp3 <Jen_mp3 at msn.com>
> Subject: Re: [R] Couple of Questions about Classification trees
> To: r-help at r-project.org
>
> Okay, perhaps I should've been more clear about the data. I'm actually working on spectroscopic measurements from food authenticity testing. I have five different types of meat: 55 of chicken, 55 of turkey, 55 of pork, 34 of beef and 32 of lamb, 231 in total. On each of these 231 meats, 1024 spectroscopic measurements were taken, giving a 231 by 1024 matrix. The questions I want answered are which of the 1024 measurements are important for predicting meat type, and which of the different types of meat are incorrectly classified, i.e. can we tell the difference between chicken and turkey.
>
> To carry out a multivariate analysis on the data I've split it into two, a training data set and a test data set, half and half, although I think the larger half (55 goes into 27 and 28) went into the test data set, which explains the unequal row numbers. By the way, 1024 is standard and can't be changed. I can't change the 231 either.
>
> So I created a new column with the meat type for each row.
>
> I end up with the following R code:
>
>   library(tree)
>   meat.tree <- tree(meat.type~., data=train)
>
> Using cv.tree, the lowest misclassification rate is at 5 nodes, so I cut the number of nodes down to 5 using prune.tree:
>
>   prunedtree <- prune.tree(meat.tree, best = 5, method = "misclass")
>
> Then I want to use predict.tree and the test data set:
>
>   predicttree <- predict.tree(prunedtree, data = test)
>
> I already said what it produces.
>
> Again, how would I display the misclassification rate at each node on the diagram? I know about misclass.tree(prunedtree, detail = TRUE), but that doesn't actually display them on the classification tree. It just gives a bunch of numbers on the worksheet, and it wouldn't look very neat if I had to add them later.
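Ed's rpart suggestion, together with the prediction step from the quoted message, might look roughly like the sketch below. It rests on assumptions: iris stands in for the meat spectra (Species plays the role of meat.type), and the names train, test, node_rates and confusion are invented for the example. One likely cause of the odd plot is worth noting: the predict methods for both tree and rpart take the new observations through newdata, not data, and without type = "class" they return a matrix of class probabilities, which plot() draws as a scatter of its first two columns rather than as a tree.

```r
library(rpart)

# Assumption: iris replaces the meat data; Species replaces meat.type.
set.seed(42)
train_idx <- sample(nrow(iris), size = floor(nrow(iris) / 2))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

fit <- rpart(Species ~ ., data = train, method = "class")

# Draw the tree. use.n = TRUE prints the per-class counts under each
# node label; all = TRUE labels interior nodes as well as leaves.
plot(fit, margin = 0.1)
text(fit, use.n = TRUE, all = TRUE, cex = 0.8)

# For a classification fit with default priors and losses, frame$dev is
# the number of misclassified training rows in each node, so dev / n is
# the node's misclassification rate.
node_rates <- data.frame(
  node          = rownames(fit$frame),
  n             = fit$frame$n,
  misclassified = fit$frame$dev,
  rate          = fit$frame$dev / fit$frame$n
)
print(node_rates)

# Predict on the held-out rows: note newdata (not data) and type = "class".
pred <- predict(fit, newdata = test, type = "class")

# The confusion table shows which classes get mistaken for which.
confusion <- table(actual = test$Species, predicted = pred)
print(confusion)

test_rate <- 1 - sum(diag(confusion)) / sum(confusion)
print(test_rate)
```

The confusion table addresses the "can we tell chicken from turkey" question directly, since off-diagonal cells count each kind of mix-up. The counts drawn by use.n = TRUE are class counts rather than rates; the rates themselves are in node_rates.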