Hi,

I am trying to use CART to find an ideal cut-off value for a simple diagnostic test (ie when the test score is above x, diagnose the condition). When I put in the model

    fit=rpart(outcome ~ predictor1(TB144), method="class", data=data8)

sometimes it gives me a tree with multiple nodes for the same predictor (see below for example of tree with 1 or multiple nodes). Is there a way to tell it to make only 1 node? Or is it safe to assume that the cut-off value on the primary node is the ideal cut-off?

Thanks!
Katie

http://n4.nabble.com/file/n964970/smartDNA%2BCART%2B-%2BTB144n.jpg
http://n4.nabble.com/file/n964970/smartDNA%2BCART%2B-%2BTB122n.jpg

--
View this message in context: http://n4.nabble.com/rcart-classification-and-regression-trees-CART-tp964970p964970.html
Sent from the R help mailing list archive at Nabble.com.
Frank E Harrell Jr
2009-Dec-16 13:44 UTC
[R] rcart - classification and regression trees (CART)
Katie N wrote:
> Hi,
> I am trying to use CART to find an ideal cut-off value for a simple
> diagnostic test (ie when the test score is above x, diagnose the condition).
> When I put in the model
>
> fit=rpart(outcome ~ predictor1(TB144), method="class", data=data8)
>
> sometimes it gives me a tree with multiple nodes for the same predictor (see
> below for example of tree with 1 or multiple nodes). Is there a way to tell
> it to make only 1 node? Or is it safe to assume that the cut-off value on
> the primary node is the ideal cut-off?
>
> Thanks!
> Katie
>
> http://n4.nabble.com/file/n964970/smartDNA%2BCART%2B-%2BTB144n.jpg
>
> http://n4.nabble.com/file/n964970/smartDNA%2BCART%2B-%2BTB122n.jpg

Katie,

Do note that the strategy you are using is inconsistent with decision theory. Optimal decisions have to condition on everything you know about a single patient, and do not ask the question "to what group does this patient belong?". For example, we estimate something given that the patient's age is 20, instead of given that her age is less than 60. That's why logistic regression is used so frequently to estimate probabilities of disease.

Any cutoff that must be used has to be on the predicted probability scale in order to get an optimum decision, and that cutoff must be specified by the provider of the utility function. Even then the cutoff is not fully trusted, e.g., a physician may order another test at the last minute when the probability of disease is in a gray zone.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
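[Editor's sketch] The approach described in the reply above (estimate a probability of disease for each patient, then apply any cutoff on the probability scale) might look roughly like the following. This is an illustration under stated assumptions, not code from the thread: the data are simulated, the names data8, outcome and TB144 are borrowed from the original post, base R's glm() stands in for whatever logistic regression tool one prefers, and the 0.5 threshold is a placeholder for a value that would have to come from the utility function.

```r
set.seed(1)

# Simulated stand-in for data8 (values are made up for illustration only)
n <- 200
TB144 <- runif(n, 0, 100)
outcome <- rbinom(n, 1, plogis((TB144 - 50) / 10))
data8 <- data.frame(outcome, TB144)

# Logistic regression on the continuous test score
fit <- glm(outcome ~ TB144, family = binomial, data = data8)

# Predicted probability of disease for individual (hypothetical) scores
new_patients <- data.frame(TB144 = c(30, 50, 70))
p_hat <- predict(fit, newdata = new_patients, type = "response")

# Placeholder threshold on the probability scale (an assumption, not a
# recommendation); in practice it comes from the utility function
threshold <- 0.5
data.frame(new_patients, p_hat, above_threshold = p_hat > threshold)
```

Each patient gets a probability conditioned on the exact score, rather than on membership in a group such as "score above x".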
Terry Therneau
2009-Dec-16 14:06 UTC
[R] rcart - classification and regression trees (CART)
"Is there a way to tell it to make a tree with only one node?"
  - see the maxdepth parameter in ?rpart.control

"Is it safe to assume that the cut-off value on the primary node is the ideal cut-off?"
  - trees are built sequentially; the first split will be the same for a tree with only one split or one that continues further.

Terry T.
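[Editor's sketch] Both points above can be tried out with a few lines of R. This is a minimal illustration, assuming simulated data in place of the poster's data8 (the values are made up; only the names outcome and TB144 come from the original post). It fits one tree restricted with maxdepth = 1 and one with the default depth, then reads the root cut-point from the first row of each fit's splits matrix.

```r
library(rpart)

set.seed(1)

# Simulated stand-in for data8 (values are made up for illustration only)
n <- 200
TB144 <- runif(n, 0, 100)
outcome <- factor(ifelse(TB144 + rnorm(n, sd = 15) > 50, "disease", "healthy"))
data8 <- data.frame(outcome, TB144)

# maxdepth = 1 allows the root split only (the root counts as depth 0)
stump <- rpart(outcome ~ TB144, method = "class", data = data8,
               control = rpart.control(maxdepth = 1))

# Default-depth tree for comparison; it may split on TB144 repeatedly
full <- rpart(outcome ~ TB144, method = "class", data = data8)

# With a single predictor, the first row of $splits is the root split,
# and the "index" column holds its cut-point
stump$splits[1, "index"]
full$splits[1, "index"]
```

If the two printed cut-points agree, that matches the statement that the first split does not depend on whether the tree continues further.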