mmv.listservs
2009-Aug-11 19:56 UTC
[R] Data Mining Packages in R for categorical and numerical values
Hello,

I've looked around and can't seem to find a package for data mining in R on a mixture of categorical and numerical attributes. Suppose you have this data set:

## dummy data
set.seed(123)
## replace = TRUE is needed to draw 10,000 values from 100 labels;
## stringsAsFactors = TRUE makes A and C factors (the default changed in R 4.0.0)
dummy <- data.frame(A = sample(paste("tasks", 1:100), 10000, replace = TRUE),
                    B = rnorm(10000),
                    C = sample(paste("loads", 1:100), 10000, replace = TRUE),
                    stringsAsFactors = TRUE)

## We can then try this:
op <- par(mar = c(5, 6, 4, 2) + 0.1)   # widen the left margin for the labels
boxplot(B ~ A, data = dummy, horizontal = TRUE, axes = FALSE)
axis(side = 1)
axis(side = 2, at = seq_along(levels(dummy$A)), labels = levels(dummy$A),
     cex.axis = 0.5, las = 1)
box()
par(op)                                # restore the previous margins

This gives 10,000 rows x 3 columns, where two columns are categorical ("tasks" and "loads") and one is numeric.

Is it possible to do data mining, such as clustering, on a mixture of categorical and numeric variables? If so, which package should I be studying or using? Is random forest the only algorithm that can handle a mixture of attribute types?

Thanks
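One possible approach, offered as a sketch under stated assumptions rather than as the answer: compute Gower dissimilarities with `daisy()` from the recommended `cluster` package, which accepts factor and numeric columns together, and then cluster on those dissimilarities with `pam()` (partitioning around medoids). The column layout (two factors, one numeric), the 500-row subset and `k = 4` are illustrative assumptions; using all 10,000 rows would mean storing about 50 million pairwise dissimilarities.

```r
## Sketch: clustering mixed categorical/numeric data via Gower dissimilarity.
## Assumes the 'cluster' package (a recommended package shipped with R).
library(cluster)

set.seed(123)
dummy <- data.frame(A = sample(paste("tasks", 1:100), 10000, replace = TRUE),
                    B = rnorm(10000),
                    C = sample(paste("loads", 1:100), 10000, replace = TRUE),
                    stringsAsFactors = TRUE)

## daisy() keeps all n*(n-1)/2 pairwise values in memory, so use a random
## subset of rows; 500 is an arbitrary, illustrative size.
sub <- dummy[sample(nrow(dummy), 500), ]

## Gower dissimilarity: factor columns contribute a 0/1 mismatch, the numeric
## column contributes |x - y| scaled by its range; the result lies in [0, 1].
d <- daisy(sub, metric = "gower")

## PAM works directly on a dissimilarity object; k = 4 is an assumption,
## not a recommended number of clusters.
fit <- pam(d, k = 4, diss = TRUE)

## Cluster sizes, and the labels attached back to the subset
table(fit$clustering)
sub$cluster <- factor(fit$clustering)
```

The same `d` can also be passed to `hclust()` if a hierarchical clustering is preferred over a fixed number of medoids.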