Florian Koller-Meinfelder
2007-Apr-19 06:03 UTC
[R] "tree-ID" in any segmentation package available?
Dear R-helpers,

I am looking for a segmentation package that returns a "tree identifier" for every observation in the data set (my response variable is binary).

I have skimmed through "rpart", "ada" and "adabag": the output "trees" gives you the formula, but I have to run several thousand segmentations on different data sets, and it is tricky to use this information within a macro. The only thing I could think of is to use some string manipulation on the tree formula and apply the result to the data, but I hope there is an easier way. For example, if the algorithm created 12 different "trees", a vector that links every observation to one of these 12 segments would be ideal.

Cheers,
Florian

Florian Koller-Meinfelder
Research Consulting & Development
GfK Fernsehforschung GmbH
Torsten Hothorn
2007-Apr-19 12:45 UTC
[R] "tree-ID" in any segmentation package available?
On Thu, 19 Apr 2007, Florian Koller-Meinfelder wrote:

> Dear R-helpers,
>
> I am looking for a segmentation package that gives some "tree identifier"
> as output for every observation in the data set (my response variable is
> binary). I have skimmed through "rpart", "ada" and "adabag": The output
> "trees" gives you the formula, but I have to run several thousand
> segmentations on different data sets and it is tricky to use this
> information within a macro (the only thing I could think of is to use some
> string manipulation on the tree formula and apply it to the data, but I
> hope there is an easier way - e.g. if the algorithm created 12 different
> trees a vector that links every observation to one of these 12 segments
> would be ideal).

Is this

    > library("party")
    > airq <- subset(airquality, !is.na(Ozone))
    > airct <- ctree(Ozone ~ ., data = airq,
    +                controls = ctree_control(maxsurrogate = 3))
    > where(airct)
      [1] 5 5 5 5 5 5 5 5 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 3 5 6 9 9 6 5 5 5 5 5 8 9
     [38] 6 8 9 8 8 8 8 5 6 6 3 6 8 8 9 3 8 8 6 9 8 8 8 6 3 6 6 8 8 8 8 9 8 9 6 6 5
     [75] 3 5 6 6 5 5 6 3 8 9 8 8 8 8 8 8 8 8 9 6 6 5 5 6 5 3 5 5 3 5 5 5 6 5 5 6 5
    [112] 5 3 5 5 5

what you want? `where' gives you, for each observation in the learning sample, the number of the terminal node that the observation falls into.

Best wishes,
Torsten
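
For "rpart", which the question mentions, a comparable vector can be read from the `where` component of the fitted object. It holds, for each training observation, the row of `fit$frame` that describes its terminal node. The following is only a minimal sketch under stated assumptions: the `kyphosis` data (shipped with rpart) stands in for a real data set with a binary response, the bootstrap resamples stand in for the "several thousand" data sets, and `segment_ids` is a hypothetical helper name, not part of any package.

```r
library(rpart)  # recommended package, distributed with R

# kyphosis ships with rpart; Kyphosis is a binary factor (absent/present)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

# fit$where: for each training observation, the row of fit$frame
# that describes the terminal node (leaf) it ends up in
leaf_row <- fit$where

# Translate frame rows into rpart's node numbers (row names of fit$frame)
segment_id <- as.integer(rownames(fit$frame))[leaf_row]

# Segment sizes: one segment label per observation
table(segment_id)

# Hypothetical helper for batch use: fit a tree, return only the ID vector
segment_ids <- function(formula, data) {
  fit <- rpart(formula, data = data, method = "class")
  as.integer(rownames(fit$frame))[fit$where]
}

# Stand-in for many data sets: a few bootstrap resamples of kyphosis
set.seed(1)
datasets <- replicate(
  3,
  kyphosis[sample(nrow(kyphosis), replace = TRUE), ],
  simplify = FALSE
)
ids <- lapply(datasets, function(d) {
  segment_ids(Kyphosis ~ Age + Number + Start, d)
})
```

Each element of `ids` is then an integer vector with one node number per row of the corresponding data set, which can be attached to the data as an extra column without any string manipulation of the printed tree.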