Great, thank you so much, Achim.

One issue, though: what do I do if I do not know in advance how many terminal nodes there will be? Note that I do not need the datasets for the intermediate nodes, only the terminal ones.

Regards,
Preetam

On Tue, May 3, 2016 at 3:08 AM, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:

> On Mon, 2 May 2016, Preetam Pal wrote:
>
>> Hi guys,
>>
>> If I am applying ctree() to a dataset (specifying some control parameters
>> like maxdepth), is there a way I can programmatically access the (smaller)
>> datasets corresponding to the terminal nodes in the tree? Say, if there are
>> 7 terminal nodes, I need those 7 datasets. (Of course, I can look at the
>> respective node-splitting attributes and write a filtering function,
>> but that is clearly too much to ask if I have a large number of terminal
>> nodes.) The intention is to perform a regression on each of these terminal
>> datasets.
>
> If you use the "partykit" implementation you can do:
>
>   library("partykit")
>   ct <- ctree(Species ~ ., data = iris)
>   data_party(ct, id = 6)
>
> to obtain the data associated with node 6, for example. You can also use
> ct[6] to obtain the subtree and ct[6]$data for its associated data.
>
> For setting up a factor with the terminal node IDs, you can also use
> predict(ct, type = "node") and then use that in lm() etc.
>
> Finally, note that there is also lmtree() and glmtree() for trees with
> (generalized) linear models in their nodes.
>
>> Regards,
>> Preetam
>>
>> --
>> Preetam Pal
>> (+91)-9432212774
>> M-Stat 2nd Year, Room No. N-114
>> Statistics Division, C.V.Raman Hall
>> Indian Statistical Institute, B.H.O.S.
>> Kolkata.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
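For the data_party() call quoted above, here is a minimal sketch of one way to collect the data for every terminal node when their number is not known in advance. It assumes partykit's nodeids() with terminal = TRUE to enumerate the terminal node IDs; the names terminal_ids and node_data are illustrative and do not come from the thread.

```r
library("partykit")

# Same example tree as in the thread
ct <- ctree(Species ~ ., data = iris)

# IDs of the terminal nodes only; their number does not need to be known
terminal_ids <- nodeids(ct, terminal = TRUE)

# One data frame per terminal node, named by node ID
node_data <- lapply(terminal_ids, function(id) data_party(ct, id = id))
names(node_data) <- terminal_ids

# Number of observations that ended up in each terminal node
sapply(node_data, nrow)
```

The data frames returned by data_party() may carry extra bookkeeping columns (for example "(fitted)") alongside the original variables, so it is worth checking names(node_data[[1]]) before fitting models on them.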
On Mon, 2 May 2016, Preetam Pal wrote:

> Great, thank you so much, Achim.
>
> One issue, though: what do I do if I do not know in advance how many
> terminal nodes there will be? Note that I do not need the datasets for
> the intermediate nodes, only the terminal ones.

With predict(ct, type = "node") you can set up a new variable, e.g.,

  iris$node <- factor(predict(ct, type = "node"))

and then use this to obtain the subset corresponding to each of the terminal nodes.
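The suggestion above can be sketched as follows: a minimal example, assuming base R's split() to cut the data by the node factor and an arbitrary illustrative regression formula (Sepal.Length ~ Sepal.Width) that is not from the thread. The names iris_nodes, terminal_data and fits are likewise hypothetical.

```r
library("partykit")

# Example tree from the thread
ct <- ctree(Species ~ ., data = iris)

# Work on a copy so the built-in iris data set stays untouched
iris_nodes <- iris

# Terminal node ID of every observation, stored as a factor
iris_nodes$node <- factor(predict(ct, type = "node"))

# One data frame per terminal node, however many nodes the tree has
terminal_data <- split(iris_nodes, iris_nodes$node)

# Illustrative regression within each terminal node
# (the formula is an assumption made for this sketch)
fits <- lapply(terminal_data, function(d) lm(Sepal.Length ~ Sepal.Width, data = d))

# Coefficients of each per-node model
lapply(fits, coef)
```

Because split() creates one list element per factor level, the code never needs to know the number of terminal nodes. If a node contains very few observations, the per-node regression may be poorly determined, so checking sapply(terminal_data, nrow) first is a reasonable precaution.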
Again, really appreciate your help on this. Thanks, Achim.

-Preetam

On Tue, May 3, 2016 at 3:22 AM, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:

> With predict(ct, type = "node") you can set up a new variable, e.g.,
>
>   iris$node <- factor(predict(ct, type = "node"))
>
> and then use this to obtain the subset corresponding to each of the
> terminal nodes.