Hi guys,

If I apply ctree() to a dataset (specifying some control parameters such as maxdepth), is there a way to programmatically access the (smaller) datasets corresponding to the terminal nodes of the tree? Say there are 7 terminal nodes; I need those 7 datasets. Of course, I could look at the respective node-splitting attributes and write a filtering function, but that is clearly impractical when there are many terminal nodes. The intention is to perform a regression on each of these terminal datasets.

Regards,
Preetam

--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.
On Mon, 2 May 2016, Preetam Pal wrote:

> Is there a way I can programmatically access the (smaller) datasets
> corresponding to the terminal nodes in the tree?

If you use the "partykit" implementation you can do:

  library("partykit")
  ct <- ctree(Species ~ ., data = iris)
  data_party(ct, id = 6)

to obtain the data associated with node 6, for example. You can also use ct[6] to obtain the subtree and ct[6]$data for its associated data.

To set up a factor with the terminal node IDs, you can also use predict(ct, type = "node") and then use that in lm() etc.

Finally, note that there are also lmtree() and glmtree() for trees with (generalized) linear models in their nodes.

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
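A minimal sketch of the predict(ct, type = "node") route for the per-node regression use case, assuming partykit is installed. The tree formula, maxdepth = 2, the per-node model Sepal.Length ~ Sepal.Width, and the names node and fits are illustrative choices, not taken from the thread.

```r
library("partykit")

# Illustrative tree with a numeric response, limited to depth 2 via ctree_control()
ct <- ctree(Sepal.Length ~ Petal.Length + Petal.Width + Species,
            data = iris, control = ctree_control(maxdepth = 2))

# Terminal node ID for every observation in the learning sample
node <- factor(predict(ct, type = "node"))

# Split the data by terminal node and fit one (illustrative) linear model per node
fits <- lapply(split(iris, node),
               function(d) lm(Sepal.Length ~ Sepal.Width, data = d))

# Coefficients of each per-node model, one column per terminal node
sapply(fits, coef)
```

Fitting a single model such as lm(Sepal.Length ~ node / Sepal.Width, data = iris) instead gives the same per-node intercepts and slopes, but with one pooled error variance rather than one per node.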
Great, thank you so much, Achim. One issue, though: what do I do if I do not know in advance how many terminal nodes there will be? Note that I do not need the datasets corresponding to the intermediate nodes; I only need the terminal datasets.

Regards,
Preetam

On Tue, May 3, 2016 at 3:08 AM, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:
> If you use the "partykit" implementation you can do:
>
>   library("partykit")
>   ct <- ctree(Species ~ ., data = iris)
>   data_party(ct, id = 6)
>
> to obtain the data associated with node 6 for example.
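One way to handle an unknown number of terminal nodes, sketched under the assumption that partykit's nodeids() accepts terminal = TRUE and then returns only the IDs of the terminal nodes. The names ids and terminal_data are hypothetical.

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris)

# IDs of the terminal nodes only; their number need not be known in advance
ids <- nodeids(ct, terminal = TRUE)

# One data frame per terminal node, extracted with data_party() and named by node ID
terminal_data <- lapply(ids, function(i) data_party(ct, id = i))
names(terminal_data) <- ids

# Number of observations in each terminal dataset
sapply(terminal_data, nrow)
```

The same partition should also be obtainable as split(iris, predict(ct, type = "node")), which avoids listing node IDs at all. Note that data_party() may append bookkeeping columns (for example "(fitted)") to each data frame, so drop those before modelling if they get in the way.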