Weiwei Shi
2006-Sep-28 17:52 UTC
[R] mx2 contingency tables or (2^(m-1)-1)'s 2x2 contingency tables in the context of feature selection for random forest
Dear Listers:

I have a categorical feature selection problem for random forest. Suppose I have a multi-level categorical variable A with m=3 levels (red, green, and blue), and the final target is a binary class label. I want to evaluate how well A discriminates between the 2 classes.

We know rf splits a multi-level categorical variable by considering all combinations of its levels. Now suppose I have 1000 such variables and need to do some feature selection. I would like to try chi-square tests (or information gain) for this.

To match the splitting method used in rf, I am wondering whether I should simply use one mx2 contingency table, or the (2^(m-1)-1) possible 2x2 contingency tables, picking the best p-value among them to evaluate A's power. I am sure the latter is very similar to what rf does. But is the former good enough?

Thanks.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?" "No, I did not. But I believed..." ---Matrix III
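
To make the two options concrete, here is a minimal sketch in plain Python (standard library only) that computes the Pearson chi-square statistic for the single mx2 table and for each of the (2^(m-1)-1) two-group splits of the levels. The function names (`chi2_stat`, `binary_partitions`, `best_2x2`) and the class counts are made up for illustration and are not taken from any package. The sketch also assumes every row and column total is positive. It is not a reproduction of how the randomForest code chooses its splits.

```python
from itertools import combinations
from math import erfc, sqrt


def chi2_stat(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # assumes positive margins
            stat += (observed - expected) ** 2 / expected
    return stat


def binary_partitions(levels):
    """Yield each split of the levels into two non-empty groups exactly once."""
    first, rest = levels[0], levels[1:]
    # Keeping the first level on the left side avoids counting mirror images.
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            left = (first,) + extra
            if len(left) < len(levels):
                right = tuple(lv for lv in levels if lv not in left)
                yield left, right


def best_2x2(counts):
    """counts maps level -> (n_class0, n_class1).

    Returns (p_value, statistic, left_group) for the split with the
    smallest p-value among all two-group splits of the levels.
    """
    best = None
    for left, right in binary_partitions(list(counts)):
        table = [
            [sum(counts[lv][c] for lv in left) for c in (0, 1)],
            [sum(counts[lv][c] for lv in right) for c in (0, 1)],
        ]
        stat = chi2_stat(table)
        p = erfc(sqrt(stat / 2))  # chi-square upper tail, 1 degree of freedom
        if best is None or p < best[0]:
            best = (p, stat, left)
    return best


# Hypothetical class counts per level of A, for illustration only.
counts = {"red": (30, 10), "green": (20, 20), "blue": (10, 30)}

# Option 1: one m x 2 table, m - 1 degrees of freedom.
mx2_stat = chi2_stat([list(v) for v in counts.values()])

# Option 2: best of the 2^(m-1) - 1 collapsed 2x2 tables, 1 degree of freedom each.
best_p, best_stat, best_left = best_2x2(counts)

n_splits = sum(1 for _ in binary_partitions(list(counts)))
print(mx2_stat, best_stat, best_left, n_splits)
```

The two statistics are not directly comparable, because the mx2 test has m-1 degrees of freedom while each 2x2 test has 1. A p-value for the mx2 statistic needs the chi-square upper tail with m-1 degrees of freedom, which this sketch leaves out. Also note that taking the minimum p-value over all splits is a selection over several tests, so it will tend to look more significant than a single test would, and more so for variables with many levels.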