mmv.listservs
2009-Aug-11 19:56 UTC
[R] Data Mining Packages in R for categorical and numerical values
Hello, I've looked around and I can't seem to find a package to do data
mining in R for a mixture of categorical and numerical attributes.
If you have this data set:
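## dummy data
set.seed(123)
## NOTE: the name of the "loads" column was garbled in the original post;
## "L" is only a placeholder
dummy <- data.frame(A = sample(paste("tasks", 1:100), 10000,
                               replace = TRUE),
                    L = sample(paste("loads", 1:100), 10000,
                               replace = TRUE),
                    B = rnorm(10000),
                    stringsAsFactors = TRUE)  # A must be a factor for levels() below
## We can then try this:
op <- par(mar = c(5,6,4,2) + 0.1)  # widen the left margin for the labels
boxplot(B ~ A, data = dummy, horizontal = TRUE, axes = FALSE)
axis(side = 1)
axis(side = 2, at = seq_along(levels(dummy$A)),
     labels = levels(dummy$A), cex.axis = 0.5,
     las = 1)
box()
par(op)  # restore the previous graphics settings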
This gives 10,000 rows x 4 columns, where one column is categorical
("tasks, loads") and the other 2 columns are numeric.
Is it possible to do data mining, such as clustering, on a mixture of
categorical and numeric variables? If so, what package should I be studying
or using? Is random forest the only algorithm that can handle a mixture of
attributes?
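To make the question concrete, here is a minimal sketch of one possible direction, not a tested answer for the real data. It assumes the recommended cluster package is installed and uses its daisy() function with Gower dissimilarity, followed by pam(). The data frame toy, its column names and the choice of k = 3 are all hypothetical and only stand in for the actual data.

```r
library(cluster)  # recommended package, normally installed alongside R

set.seed(123)
## small hypothetical stand-in: one categorical and two numeric columns
toy <- data.frame(
  grp = factor(sample(paste("tasks", 1:5), 200, replace = TRUE)),
  x   = rnorm(200),
  y   = runif(200)
)

## Gower dissimilarity accepts factor and numeric columns together
d <- daisy(toy, metric = "gower")

## partitioning around medoids on the dissimilarities; k = 3 is arbitrary
fit <- pam(d, k = 3, diss = TRUE)

## cluster sizes
table(fit$clustering)
```

The full dissimilarity matrix grows with the square of the number of rows, so for 10,000 rows it may be more practical to try this on a random subsample first.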
Thanks
