I've recently had a research manuscript rejected by an editor. The manuscript showed that for a real life data set, random forest outperformed multiple linear regression with respect to predicting the target variable. The editor's objection was that random forest is a black box where the random assignment of features to trees was intractable. I need to find an alternative method to random forest that does not suffer from the black box label. Any suggestions? Would caret::treebag be free of random assignment of features? Your assistance is appreciated.
Though off-topic for this list, your question (complaint?) comes up a lot in discussions of analytical methods, and has generated hundreds of papers (Google is your friend here). You can start with

https://www.quora.com/What-are-the-pros-and-cons-of-GLM-vs-Random-forest-vs-SVM

for some of the controversies. It looks to me as if your editor stated (poorly) the problem that some models that are good at pattern-matching (RF) are less useful for predicting new observations.

Others on the list who are more erudite than I may choose to comment, amplify, or refute...

> On May 30, 2017, at 11:54 AM, Barry King <barry.king at qlx.com> wrote:
>
> I've recently had a research manuscript rejected by an editor. The manuscript showed
> that for a real life data set, random forest outperformed multiple linear regression
> with respect to predicting the target variable. The editor's objection was that
> random forest is a black box where the random assignment of features to trees was
> intractable. I need to find an alternative method to random forest that does not
> suffer from the black box label. Any suggestions? Would caret::treebag be free of
> random assignment of features? Your assistance is appreciated.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Simmering, Jacob E
2017-May-30 19:27 UTC
[R] seek non-black box alternative to randomForest
Barry,

This is mostly a mailing list about R - you may have more luck with statistical questions on www.stat.stackexchange.com.

That said - the editor is wrong. The limitation of trees that random forests "solve" is overfitting. The mechanism by which a random forest classifier is built is not a black box - some number of features and some number of rows are selected to produce a split. The reasons why this approach avoids the issues associated with trees are also clear. These are theory-based claims. The random selection is critical to the function of the process. I'd suggest resubmitting the paper to a different journal instead of trying to find some way to fit a random forest without the random part.
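The mechanism described above can be written out in a few lines of R. What follows is a minimal sketch, not the randomForest implementation: the helper names fit_toy_forest and predict_toy_forest, the use of rpart for the individual trees, and the mtcars example data are all assumptions made for illustration. One deliberate simplification is that the features are drawn once per tree, whereas randomForest draws a fresh candidate set at every split. The sketch records each tree's random draws so they can be inspected afterwards, and a fixed seed makes the whole fit repeatable.

```r
library(rpart)  # recommended package that ships with standard R installations

# Hypothetical helper: bagged regression trees, each grown on a bootstrap
# sample of rows and a random subset of features.
# Simplification: features are drawn once per tree, not at every split.
fit_toy_forest <- function(data, target, n_trees = 50, n_features = 3, seed = 1) {
  set.seed(seed)  # fixes every random draw below (and resets the global RNG)
  predictors <- setdiff(names(data), target)
  lapply(seq_len(n_trees), function(i) {
    rows  <- sample(nrow(data), replace = TRUE)  # bootstrap sample of rows
    feats <- sample(predictors, n_features)      # random subset of features
    tree  <- rpart(reformulate(feats, response = target), data = data[rows, ])
    list(tree = tree, rows = rows, feats = feats)  # keep the draws for inspection
  })
}

# Hypothetical helper: average the per-tree predictions.
predict_toy_forest <- function(forest, newdata) {
  per_tree <- lapply(forest, function(m) predict(m$tree, newdata = newdata))
  rowMeans(do.call(cbind, per_tree))  # one column per tree, one row per observation
}

forest_a <- fit_toy_forest(mtcars, "mpg")
forest_b <- fit_toy_forest(mtcars, "mpg")

# Same seed, same draws: the two fits give the same predictions.
print(all.equal(predict_toy_forest(forest_a, mtcars),
                predict_toy_forest(forest_b, mtcars)))

# The random draws behind any tree can be looked up directly.
print(forest_a[[1]]$feats)

# Using every predictor in every tree reduces the sketch to plain bagging.
bagged <- fit_toy_forest(mtcars, "mpg", n_features = ncol(mtcars) - 1)
```

With n_features equal to the number of predictors, the sketch reduces to plain bagging, which is the idea behind caret's "treebag" method (bagged CART): every feature is available to every tree, but the bootstrap draw of rows is still random.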
I'm interested in the subject. If you send the question to another platform, please share the link here so we can follow up. I would also like to see the manuscript, the rejected parts, and the detailed reasons. Most of the time, scientists want to reveal and discuss the underlying physical process in an event, and it's not enough to show that method A is better than method B. Perhaps the discussion of why random forest is better than multiple linear regression was not enough for him. That may also be what "black box" means here.
On Tue, May 30, 2017 at 12:11 PM, Don McKenzie <dmck at u.washington.edu> wrote:
> Though off-topic for this list, your question (complaint?) comes up a lot in discussions of analytical methods, and has generated hundreds of papers (Google is your friend here).
> You can start with
>
> https://www.quora.com/What-are-the-pros-and-cons-of-GLM-vs-Random-forest-vs-SVM
>
> for some of the controversies. It looks to me as if your editor stated (poorly) the problem that some models that are good at pattern-matching (RF) are less useful for predicting new observations.
>
> Others on the list who are more erudite than I may choose to comment, amplify, or refute...

... But hopefully will not, as this could quickly devolve into endless opining that would clog up this list, as you have yourself noted.

-- Bert