Hello,
I am working with the "tm" package.
I have two PDF files saved in the directory D:/Files.
I ran the following commands and got the warnings and the error shown
beneath them:
> surgj <- Corpus(DirSource("D:/Files"), readerControl =
+ list(language = "ansi"))
Warning messages:
1: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf'
2: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on 'D:/Files/provmedsurgj00978-0007.pdf'
> inspect(surgj)
A corpus with 2 text documents
The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator
Available variables in the data frame are:
  MetaID
[[1]]
%PDF-1.3
Error: invalid input '%Åþë×' in 'utf8towcs'
Could anybody help me identify where I went wrong and what I need to do
to proceed?
Thanks,
Shreyasee