Thanks all for your help. I fear text mining is an abstract little corner of
"R".
I have imported 3228 text (.txt) files, each a news story, into R using
[tm]:
textd <- Corpus(DirSource("other/docs"),
                readerControl = list(reader = readPlain))
I can pre-process each individual document using tolower(textd[[1]]).
However, when I try to run tmTolower() I get a "no such command" error, and
then the Term Document Matrix command gives me a peculiar error:
> other.TDM <- TermDocumentMatrix(textd, control = list(stopwords = TRUE))
Error in tolower(txt) :
invalid input 'Valentino bag, breakfasting at West Palm Beach caf? Testa .
. . VALENTINO, in' in 'utf8towcs'>
Is it something to do with the structure of the documents I've read in?
The "tm" documentation is *extremely* abstract, at my Neanderthal
level.
Thanks to anyone who can help
--
View this message in context:
http://r.789695.n4.nabble.com/Help-using-tm-text-mining-package-preprocessing-tp3299399p3299399.html
Sent from the R help mailing list archive at Nabble.com.
Hi, I have a similar problem to the one you had. Did you manage to find out what that error meant?
Thanks,
m
Matevž Pavlič
2011-May-21 12:59 UTC
[R] Help using "tm" text mining package - preprocessing
Got it... the problem was with Slovenian characters. Once I replaced them with normal characters, it worked fine.
Thanks anyway,
m
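For anyone hitting the same 'utf8towcs' error: it usually means a string contains bytes that are not valid in the encoding R assumes, which fits the "caf?" in the first message and the Slovenian characters above. Below is a minimal base-R sketch of two possible workarounds. The sample string and the "latin1" source encoding are assumptions made for illustration only; the real files may use a different encoding (Slovenian text is often in Latin-2 or Windows-1250), so adjust the `from` argument accordingly.

```r
# Hypothetical sample: "\xe9" is e-acute in Latin-1 but is not valid UTF-8
# on its own, much like the "caf?" shown in the error message.
raw_text <- "caf\xe9 Testa"

# Option 1: if the true source encoding is known (assumed "latin1" here),
# convert to UTF-8 so the accented character is kept.
fixed_utf8 <- iconv(raw_text, from = "latin1", to = "UTF-8")

# Option 2: if the encoding is unknown, drop any bytes that are not valid
# UTF-8. This loses the accented character but lets processing continue.
stripped <- iconv(raw_text, from = "UTF-8", to = "UTF-8", sub = "")

# Both cleaned strings can now be lower-cased.
lower_fixed <- tolower(fixed_utf8)
lower_stripped <- tolower(stripped)
```

With a recent tm release, the same cleaning could probably be applied to the whole corpus before building the matrix, along the lines of tm_map(textd, content_transformer(function(x) iconv(x, from = "latin1", to = "UTF-8"))). That assumes a tm version that provides content_transformer(); older versions used different function names, which may be why tmTolower() was not found.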