On Feb 29, 2012, at 6:00 PM, Mickael R problem wrote:
> Hello everybody,
> I am trying to work with tm, but I have a problem with some special
> French words. I think this is due to the way the PDF was converted to
> text, but I'm not entirely sure.
> Here is an example:
>
> findFreqTerms(tdm1,30)
> [33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement"
> "<U+FB01>nancier" "<U+FB01>nanci?re" "<U+FB01>nanci?res"
> "<U+FB01>nanciers" "<U+FB01>xe"
>
> Some French words are not read correctly by tm with the readPlain
> reader. I tried reader = readPDF, but it doesn't work, so I have to
> convert the PDF to text first. Some words are then not recognized, so
> when I use TermDocumentMatrix a word like "inflation" disappears. This
> is a big problem for me. I have spent a lot of time on it; any ideas?
> Thanks for your time.
You included no information about your platform, locale settings, or
encoding of the text.
?Encoding
?sessionInfo
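
As a rough sketch of what to check, using base R only: U+FB01 is the
Unicode "fi" ligature, so "<U+FB01>nancement" is "financement" with the
first two letters stored as one character. U+FB02 is the matching "fl"
ligature, which might be why "inflation" goes missing, though that is a
guess. The vector x below is a made-up stand-in for your extracted text,
not your data.

```r
# Hypothetical sample standing in for text extracted from the PDF.
x <- c("\ufb01nancement", "\ufb01xe", "in\ufb02ation")

# Report the platform, the locale, and the declared encoding of the strings.
print(sessionInfo())
print(Sys.getlocale())
print(Encoding(x))

# Replace the "fi" (U+FB01) and "fl" (U+FB02) ligatures with plain letters.
fixed <- gsub("\ufb01", "fi", x, fixed = TRUE)
fixed <- gsub("\ufb02", "fl", fixed, fixed = TRUE)
print(fixed)
```

If the real text holds other presentation forms, the same gsub() pattern
would need one call per character; this only covers the two ligatures
named above.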
--
David Winsemius, MD
West Hartford, CT