Hello.
This is the first time I am using RTextTools. I have to implement SVM
classification on a collection of text documents. I am following this
tutorial:
http://journal.r-project.org/archive/2013-1/collingwood-jurka-boydstun-etal.pdf
Here is my code, step by step.
##### First, I read my data and supplied an index file. The index file lists
all the text documents to be classified, each with its individual tag.
For example, if there is a file abc.txt belonging to genre X, the index
file stores it as abc.txt,X and so on.
Code:
data <- read_data('C:/Users/dell/Dropbox/Bundeli/corpus/wob/sklearn/folder',
                  type='folder',
                  index='C:/Users/dell/Dropbox/Bundeli/corpus/wob/sklearn/index.txt')
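For illustration, an index file of the form described above can be built and inspected with base R alone. This is only a sketch: the file names and genres are made up, the column names "file" and "genre" are my own assumption, and it does not show how read_data itself parses the index.

```r
# Hypothetical file names and genres, for illustration only.
index_path <- tempfile(fileext = ".txt")
writeLines(c("abc.txt,X", "def.txt,Y", "ghi.txt,X"), index_path)

# Read the index back as two columns: file name and label.
# The column names are assumed here, not taken from RTextTools.
index <- read.csv(index_path, header = FALSE,
                  col.names = c("file", "genre"),
                  stringsAsFactors = FALSE)

print(index)

# Every row should carry both a file name and a non-empty label.
index_ok <- !anyNA(index$genre) && all(nzchar(index$genre))
```

If index_ok is FALSE for the real index file, some documents have no label, and the labels cannot line up one-to-one with the documents.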
##### Second, I create a document-term matrix.
doc_matrix <- create_matrix(data, language="english",
                            removeNumbers=TRUE,
                            stemWords=TRUE, removeSparseTerms=.8)
##### Third, I create a container which houses the matrix and the labels.
container <- create_container(doc_matrix, data$genre, trainSize=1:93,
                              testSize=94:116, virgin=FALSE)
##### Here, data$genre is the label vector: each document's genre label,
given in the same order as the documents, aligned like an index.
##### So far, there has been no error.
But now, when I try to train the SVM on the container using the following
code,
SVM <- train_model(container, "SVM")
##### it gives me this error:
Error in svm.default(x = container@training_matrix, y = container@training_codes,
: x and y don't match.
##### When I look at the structure of the container, it shows the training
codes as empty, like this (full structure attached):
Slot "training_codes":
factor(0)
Levels:
Slot "testing_codes":
factor(0)
Levels:
##### Can somebody please help? I have been desperately looking for an
answer. Could there be something wrong with the index file passed to
read_data, or is it a problem with the data$genre variable? Those are the
parts that are new to me, so I may have gotten them wrong. I will be most grateful.
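One possible sanity check, sketched in base R only: if the object returned by read_data has no column actually named genre, then data$genre is NULL, and an empty factor like the one in the container slots is what results. The data frame below is a hypothetical stand-in with assumed column names. It is not the real output of read_data, and this may not be the actual cause of the error.

```r
# Stand-in for the object returned by read_data(); the column names
# are assumptions, not what RTextTools necessarily produces.
data <- data.frame(text = c("doc one", "doc two", "doc three"),
                   label = c("X", "Y", "X"),
                   stringsAsFactors = FALSE)

print(names(data))      # which columns actually exist
labels <- data$genre    # no column called "genre" here, so this is NULL
print(is.null(labels))
print(length(labels))   # 0, although there are 3 documents

# Subsetting NULL and converting it to a factor gives an empty factor,
# which prints like the empty training_codes slot.
empty_codes <- as.factor(labels[1:2])
print(empty_codes)

# Check to run before create_container(): one label per document.
labels_ok <- !is.null(labels) && length(labels) == nrow(data)
print(labels_ok)
```

Running names(data) and length(data$genre) on the real object, and comparing the length with nrow(doc_matrix), would show whether the labels reach create_container at all.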