thr3ads.net - R help - [R] Categorical Variables and Machine Learning [Feb 2011]


Lorenzo Isella

2011-Feb-17 14:13 UTC

[R] Categorical Variables and Machine Learning

Dear All,
Please consider a data frame like the one below (I am showing only a few rows).

role degree strength weight count disparity intermittency
   P     10       82  18017     2  2.317073  5.550314e-05
   P      7      529   4345    60  5.178466  6.904488e-03
   P      8      609   4382    10  6.204535  1.141031e-03
   D     42      230   6910    88  1.791153  6.367583e-03

You have a categorical variable (role) that can take only a few values
("P", "D", "C", "N", "A"), referring to different individuals for whom you
collect some extra properties (namely degree, strength, weight, disparity
and intermittency, as in the table above).
My goal is to find the most suitable property (or combination of
properties) to guess the role of an individual. It looks like a typical
machine learning problem, except that the variable I want to predict is
categorical.
I am drowning in the wealth of R packages for machine learning, but I
would really like something simple and easy to use (consider that the
dataset covers only 120 individuals, so performance is not a problem).
Any suggestions are appreciated.
Cheers

Lorenzo

Andrew Ziem

2011-Feb-17 16:18 UTC


[R] Categorical Variables and Machine Learning

Try the function ctree() in the package party, or earth() in the package
earth. You can use the factor variable as is, or you can transform the factor
into binary variables (e.g., is_P is 0 or 1, is_D is 0 or 1). In the second
case you can use any algorithm; earth() transforms factors into binary
features automatically.
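A minimal base-R sketch of the second option, assuming the data sit in a data frame named df (here rebuilt from the four sample rows in the original message; the name df and the is_ prefix are illustrative choices, not part of party or earth). model.matrix() with the intercept removed gives one 0/1 column per level of role, including levels that do not occur in these four rows:

```r
# Illustrative df: the four sample rows from the original message,
# with role declared as a factor over all five possible levels
df <- data.frame(
  role = factor(c("P", "P", "P", "D"), levels = c("P", "D", "C", "N", "A")),
  degree = c(10, 7, 8, 42),
  strength = c(82, 529, 609, 230),
  weight = c(18017, 4345, 4382, 6910),
  count = c(2, 60, 10, 88),
  disparity = c(2.317073, 5.178466, 6.204535, 1.791153),
  intermittency = c(5.550314e-05, 6.904488e-03, 1.141031e-03, 6.367583e-03)
)

# "- 1" drops the intercept, so every level of role gets its own
# 0/1 indicator column instead of one level being absorbed as baseline
ind <- model.matrix(~ role - 1, data = df)

# Rename roleP, roleD, ... to the hypothetical is_P, is_D, ... style
colnames(ind) <- sub("^role", "is_", colnames(ind))

# Numeric properties plus the binary targets, ready for any algorithm
df_bin <- cbind(df[, -1], ind)
print(df_bin)
```

For the first option the factor is passed directly, e.g. ctree(role ~ ., data = df) once party is loaded, so no expansion is needed.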

However, you may find that 120 individuals is not much data.


Andrew



