The current "classwt" option in the randomForest package has been
there since the beginning, and it differs from how the official Fortran code
(version 4 and later) implements class weights. It simply accounts for the class
weights in the Gini index calculation when splitting nodes, exactly as a
single CART tree does when given class weights. Prof. Breiman came up with
the newer class weighting scheme, implemented in the newer version of his Fortran
code, after we found that simply using the weights in the Gini index didn't
seem to help much with extremely unbalanced data (say 1:100 or worse). If using
weighted Gini helps in your situation, by all means do it. I can only say that
in the past it didn't give us the results we were expecting.
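
As a rough illustration of what "using the weights in the Gini index" means, here is a minimal sketch in Python. It assumes the textbook CART formulation, where each class count is multiplied by its weight before the class proportions are formed. It is not the randomForest package's actual code, and the function names weighted_gini and split_gain are made up for this example.

```python
from collections import Counter


def weighted_gini(labels, class_weights):
    """Gini impurity of a node, with each class count scaled by its weight."""
    counts = Counter(labels)
    # Weighted "mass" of each class present in the node.
    mass = {k: class_weights[k] * n for k, n in counts.items()}
    total = sum(mass.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def split_gain(left, right, class_weights):
    """Decrease in weighted Gini impurity from splitting a node in two.

    Each child's impurity is weighted by its share of the parent's
    weighted mass (assumed here; an illustrative choice).
    """
    def node_mass(labels):
        return sum(class_weights[y] for y in labels)

    parent = list(left) + list(right)
    m_left, m_right = node_mass(left), node_mass(right)
    m_parent = m_left + m_right
    return (
        weighted_gini(parent, class_weights)
        - (m_left / m_parent) * weighted_gini(left, class_weights)
        - (m_right / m_parent) * weighted_gini(right, class_weights)
    )


# Hypothetical node: three cases of class "a", one case of class "b".
node = ["a", "a", "a", "b"]
equal = {"a": 1.0, "b": 1.0}
upweighted = {"a": 1.0, "b": 3.0}  # rare class counts three times as much

gini_equal = weighted_gini(node, equal)
gini_upweighted = weighted_gini(node, upweighted)
gain = split_gain(["a", "a", "a"], ["b"], upweighted)
```

With equal weights the node looks fairly pure, while up-weighting the rare class makes the same node look maximally mixed, so a split that isolates the rare class scores a larger gain. That is the only place the weights enter under this scheme.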
Best,
Andy
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of James Long
> Sent: Tuesday, September 13, 2011 2:10 AM
> To: r-help at r-project.org
> Subject: [R] class weights with Random Forest
>
> Hi All,
>
> I am looking for a reference that explains how the randomForest function in
> the randomForest package uses the classwt parameter. Here:
>
> http://tolstoy.newcastle.edu.au/R/e4/help/08/05/12088.html
>
> Andy Liaw suggests not using classwt. And according to:
>
> http://r.789695.n4.nabble.com/R-help-with-RandomForest-classwt-option-td817149.html
>
> it has "not been implemented" as of 2007. However it improved classification
> performance for a problem I am working on, more than adjusting the sampsize
> parameter. So I'm wondering if it has been implemented recently (since 2007)
> or if there is a detailed explanation of what this unimplemented version is
> doing.
>
> Thanks!
> James