Hello. I am trying to work with the text mining package tm.
I have a directory called textsTweet1 which contains three files:
short.txt
myTextFile.txt
myTextFile.csv
short.txt contains one line: THE CAT IN THE HAT\n
myTextFile.txt contains some tweets from Twitter. Its first few lines are:
@oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I
NEED TO GET ON JAPAN TIME NOW. NO SLEEP!!!SAKURA at Niigata, Japan
http://ff.im/-29ufG19:30 [BS Japan] ????????? #50 ????????????????????RT@
kvsrinath Japan's New Flat Screens: The Eco-Friendly TV .
http://is.gd/sIS7 #greenMold99 says: Introduction to Chiropractic and manual
therapeutics when unfit.Choice of schools in Japan, and mo...
http://i.sitesays.com/lc7Japan Said to Sell 17 Trillion Yen of Extra Bonds -
Bloomberg
Actually, there were no line breaks in the original file; I inserted one
before every occurrence of http.
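For reference, a minimal sketch of how such a split could be done in base R.
The sample string and the names raw.text and tweet.lines are made up for
illustration, and this is not necessarily how the file above was edited.

```r
# Hypothetical sample text standing in for the raw, unbroken tweet file.
raw.text <- "first tweet http://is.gd/sIS7 second tweet http://ff.im/-29ufG third"

# Insert a newline before every literal occurrence of "http".
# fixed = TRUE treats the pattern as plain text rather than a regex.
split.text <- gsub("http", "\nhttp", raw.text, fixed = TRUE)

# Break the result into one element per line.
tweet.lines <- strsplit(split.text, "\n", fixed = TRUE)[[1]]
print(tweet.lines)
```

Note that this splits in front of the URL, so each URL ends up at the start
of the following line rather than at the end of the tweet it belongs to.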
I ran the following code:
library("tm")
my.path <- 'C:\\dataForR\\textsTweet1\\'
my.path.csv <- 'C:\\dataForR\\textsTweet1\\myTextFile.csv'
(ovid <- Corpus(DirSource(my.path),
                readerControl = list(reader = readPlain, language = "la")))
Response from R:
A text document collection with 3 text documents
Warning message:
In readLines(filename, encoding = encoding) :
incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt'
Then I ran the TermDocMatrix function. As I understand it, it takes a corpus
and counts how often each word occurs in each document. Or, as the
documentation says, it "Constructs a term-document matrix".
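To make the expected result concrete, here is a rough base R sketch of that
kind of per-document word count. It does not use tm at all; the two sample
documents and the names docs, tokens, terms and counts are assumptions for
illustration only, and tm's own tokenization will differ.

```r
# Two hypothetical documents; the first mirrors short.txt.
docs <- c(short = "THE CAT IN THE HAT", other = "the hat is red")

# Lower-case each document and split it on whitespace.
tokens <- lapply(docs, function(d) strsplit(tolower(d), "[[:space:]]+")[[1]])

# The vocabulary: every distinct word across all documents.
terms <- sort(unique(unlist(tokens)))

# Count each vocabulary word per document (one row per document).
counts <- t(vapply(tokens,
                   function(tk) as.integer(table(factor(tk, levels = terms))),
                   integer(length(terms))))
colnames(counts) <- terms
print(counts)
```

Each row corresponds to one document and each column to one term, so a file
read in as a single document contributes exactly one row.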
> tdm<-TermDocMatrix(ovid)
> Data(tdm)[1:2, 105:107]
2 x 3 sparse Matrix of class "dgCMatrix"
revealed said sakura
1 . . .
2 15 15 15
> Data(tdm)[1:21, 100:105]
Error in intI(i, n = di[1], dn = dn[[1]]) : index larger than maximal 3
I don't understand why I am getting only two rows. As far as I can tell, the
first row is for the short.txt file and the second row is for the whole of
myTextFile.txt.
How can I get TermDocMatrix to treat each line of myTextFile.txt as a
separate row?
Thanks very much.