Please see the R Machine Learning Task View
(cran.r-project.org/web/views/MachineLearning.html) for a
starting point on decision trees.
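
As a rough illustration of what is possible, here is a minimal sketch using
rpart, a recommended package that ships with R. Everything about the data is
an assumption made up for this example: the data frame `countries`, the
column names (`gdp`, `region`, `treaty`, `life_exp`) and the simulated values
merely stand in for your indexes {s_i} and property {x_i}. It fits a
classification tree for case (2), prints a confusion matrix, fits a
regression tree for case (1), and then predicts how the response changes as
one index is varied with the other held fixed.

```r
# Sketch only: simulated data stand in for the real country indexes.
library(rpart)  # recommended package, included in standard R installations

set.seed(1)
n <- 200
countries <- data.frame(
  gdp    = runif(n, 1, 50),                        # hypothetical numeric index
  region = factor(sample(c("north", "south", "east"),
                         n, replace = TRUE))       # hypothetical categorical index
)

# Hypothetical categorical response (case 2)
countries$treaty <- factor(ifelse(countries$gdp > 25, "A",
                           ifelse(countries$region == "south", "B", "C")))

# Hypothetical numeric response (case 1)
countries$life_exp <- 60 + 0.4 * countries$gdp + rnorm(n)

# Case (2): classification tree and in-sample confusion matrix
fit_class <- rpart(treaty ~ gdp + region, data = countries, method = "class")
pred      <- predict(fit_class, countries, type = "class")
conf_mat  <- table(observed = countries$treaty, predicted = pred)
print(conf_mat)

# Case (1): regression tree
fit_reg <- rpart(life_exp ~ gdp + region, data = countries, method = "anova")

# Relative importance of each index according to the fitted tree
print(fit_class$variable.importance)

# Vary one index (gdp) while holding the other (region) fixed
grid <- data.frame(
  gdp    = seq(5, 45, by = 10),
  region = factor("north", levels = levels(countries$region))
)
grid$predicted_treaty   <- predict(fit_class, grid, type = "class")
grid$predicted_life_exp <- predict(fit_reg, grid)
print(grid)
```

Note that the confusion matrix above is computed on the same data used to fit
the tree, so it is optimistic; for an honest estimate you would want to hold
out a test set or cross-validate.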

On 9/14/2011 7:11 PM, Lorenzo Isella wrote:
> Dear All,
> I am recycling a previous email of mine where I asked some questions
> about clustering mixed numerical/categorical data. This time I am more
> into data mining. I am given a set of known statistical indexes {s_i},
> i=1,2...N, for N countries. In general, these indexes are a mix of
> numerical and categorical variables. For each country, I also have a
> property x_i whose value is known, but that I also would like to be
> able to predict correctly using a model. This is needed in order to
> assess the importance of the various indexes in determining {x_i}.
> There are two cases of interest:
>
> (1) all the {x_i} are numerical variables, e.g. the average life
> expectancy
>
> (2) all the {x_i} are categorical variables (e.g. the fact that the
> country joins treaty A, B or C). This reminds me of discrete choice
> models.
>
> Any suggestions about how to tackle these problems? In the past I used
> mclust, but it is limited to all the {s_i} being numerical variables.
>
> I saw an example of the use of glm for predicting binary variables
>
> ats.ucla.edu/stat/R/dae/probit.htm
>
> which may be relevant for (2). In general I know that some people use
> Weka for this sort of task, but I wonder if I can use R to get a
> decision tree and a confusion matrix and to be able to predict how the
> {x_i} would change by varying the value of one statistical index.
> Many thanks for your suggestions
>
> Lorenzo
>
> ______________________________________________
> R-help@r-project.org mailing list
> stat.ethz.ch/mailman/listinfo/r-help