thr3ads.net - R help - svm implementation using RTextTools [Jul 2014]

Ayushi Pandey

2014-Jul-25 10:18 UTC

svm implementation using RTextTools

Hello.
This is the first time I am using RTextTools. I have to implement SVM
classification on a collection of text documents. I am following this
tutorial:

http://journal.r-project.org/archive/2013-1/collingwood-jurka-boydstun-etal.pdf

I am giving you my code, stepwise.

#First I read my data and supplied an index file. The index file lists all
the text documents to be classified, each with its individual tag. For
example, if there is a file abc.txt belonging to the genre X, the index
file stores it as abc.txt,X and so on.

Code :
data = read_data('C:/Users/dell/Dropbox/Bundeli/corpus/wob/sklearn/folder',
                 type=c('folder'),
                 index='C:/Users/dell/Dropbox/Bundeli/corpus/wob/sklearn/index.txt')

#####Second, I create a document-term matrix.

doc_matrix <- create_matrix(data, language="english",
removeNumbers=TRUE,
stemWords=TRUE, removeSparseTerms=.8)

#####Third, I create a container which houses the document-term matrix and
the labels, split into training and test sets.

container <- create_container(doc_matrix, data$genre, trainSize=1:93,
testSize=94:116, virgin=FALSE)

#######Here, data$genre is the label vector: each document's genre label,
given in the same order as the documents, aligned like an index.

######So far, there has been no error.

But now, when I try to train the SVM on the container using the following
code,

SVM <- train_model(container, "SVM")

 ##### It gives me this error.######

Error in svm.default(x = container@training_matrix, y = container@training_codes,  :
  x and y don't match.

######When I look at the structure of the "container", it shows the training
codes as empty, like this (full structure attached).######

Slot "training_codes":
factor(0)
Levels:

Slot "testing_codes":
factor(0)
Levels:

#####Can somebody please, please help? I have been desperately looking for
an answer. Could there be something wrong with the index file passed to
read_data, or is the problem with the data$genre variable? Those are the
parts that are new to me, so I may have gotten them wrong. I will be most
grateful.
#######
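
One way to narrow this down, sketched in base R only: check whether the
label column actually exists in the data frame before it is passed on. The
data frame below is a hypothetical stand-in for what read_data() returns, and
its column names (text, label) are assumptions made for illustration. The
sketch also assumes that create_container() builds its codes by subsetting
the labels with trainSize and converting the result to a factor; that is a
guess at the behavior, not something confirmed in this post.

```r
# Hypothetical stand-in for the data frame returned by read_data().
# The column names "text" and "label" are assumptions for illustration only.
data <- data.frame(
  text  = c("first document", "second document", "third document"),
  label = c("X", "Y", "X"),
  stringsAsFactors = FALSE
)

# List the columns that actually exist before referring to one by name.
print(names(data))

# `$` on a data frame returns NULL when no column matches the name.
labels <- data$genre
print(is.null(labels))

# Subsetting NULL stays NULL, and as.factor(NULL) is an empty factor,
# which prints as factor(0), the same shape as the empty slots above.
train_codes <- as.factor(labels[1:2])
print(train_codes)

# With a column that does exist, the labels survive the same steps.
good_codes <- as.factor(data$label[1:2])
print(length(good_codes))
```

If names(data) on the real data does not list genre, the labels would need
to be taken from whichever column does hold them.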

