Displaying 3 results from an estimated 3 matches for "andy1234".

2011 Sep 02
2
Classifying large text corpora using R
Dear everyone, I am new to R, and I am looking at doing text classification on a huge collection of documents (>500,000) which are distributed among 300 classes (so basically, this is my training data). Would someone please be kind enough to let me know about the R packages to use and their scalability (time and space)? I am very new to R and do not know of the right packages to use. I
2011 Jul 31
1
Entropy based feature selection in R
I need to use entropy-based feature selection to reduce the term space while doing text classification. Are there any R packages available that would help me do this? I can also make do with a chi-squared-based algorithm, if there are packages for that. Thanks in advance. Andy -- View this message in context: http://r.789695.n4.nabble.com/Entropy-based-feature-selection-in-R-tp3708056p3708056.html
2011 Dec 31
1
Reading large sparse arff files into R
Hi, I am trying to read a large, highly sparse ARFF file produced by WEKA into R. However, the package 'RWeka' just chokes on this file. The data set has about 40k observations and about 20k dimensions. Even after 1 hour, the read.arff method of RWeka is still trying to read in the file, whereas WEKA is able to read it in less than 20 seconds. What are my options at this