This is neither the Xpdf support forum nor the Windows Setup Program Reinvention
support group... and you really need to read and follow the Posting Guide for
the R mailing lists.

FWIW I would guess that you need to learn about environment variables, and in
particular about the PATH variable. When and how they get defined is
OS-specific, full of subtleties that may trip you up along the way, and
certainly off topic here. Alternatively, you may read the Xpdf documentation or
a how-to blog about Xpdf that gives you a recipe, but again that is not about
R. Once you can start a CMD shell and run the command directly, you are most of
the way to getting R to invoke it.
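For what it is worth, once the command works from CMD, a quick check from the R
side might look like the sketch below. It assumes Xpdf lives in the
C:\Program Files\Xpdf\bin64 folder mentioned in the original post (adjust
xpdf_dir to your install; the variable names are mine), and it relies on the
fact that PATH holds directories, not paths to individual .exe files.

```r
# Sketch only: this location is taken from the original post and is an
# assumption; point it at wherever pdftotext.exe actually lives.
xpdf_dir <- "C:\\Program Files\\Xpdf\\bin64"

# PATH is a list of directories separated by .Platform$path.sep
# (";" on Windows, ":" on Unix-alikes).
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep,
                         fixed = TRUE)[[1]]

# Show entries that are not existing directories (for example an entry
# ending in pdftotext.exe); such entries do not help the lookup.
print(path_entries[!dir.exists(path_entries)])

# Prepend the directory for the current R session only.
if (dir.exists(xpdf_dir) && !(xpdf_dir %in% path_entries)) {
  Sys.setenv(PATH = paste(xpdf_dir, Sys.getenv("PATH"),
                          sep = .Platform$path.sep))
}

# Sys.which() returns "" for any program it cannot find.
found <- Sys.which(c("pdfinfo", "pdftotext"))
print(found)

# Only try to run the tool if it was located; -v asks Xpdf's pdftotext to
# print its version information.
if (nzchar(found[["pdftotext"]])) {
  system2(found[["pdftotext"]], "-v")
}
```

If Sys.which() still returns empty strings after this, the problem is in the
Xpdf install or the directory name, not in R or tm.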
--
Sent from my phone. Please excuse my brevity.
On July 21, 2016 5:26:26 PM PDT, Steven Kang <stochastickang at gmail.com> wrote:
>Hi R users,
>
>I'm having some issues trying to extract text from a PDF file using the tm
>package.
>
>Here are the steps that were carried out:
>
>1. Downloaded and installed the following programs:
>
>- Xpdf (copied the "bin32", "bin64", "doc" folders into the
>"C:\Program Files\Xpdf" directory; also added
>C:\Program Files\Xpdf\bin64\pdfinfo.exe &
>C:\Program Files\Xpdf\bin64\pdftotext.exe to the existing PATH)
>
>- Tesseract
>
>- Imagemagick
>
>2. Used the following scripts and the corresponding error messages:
>
># Directory where PDF files are stored
>
>>cname <- getwd()
>
>>Corpus(DirSource(cname), readerControl=list(reader = readPDF))
>
>Error in system2("pdftotext", c(control$text, shQuote(x), "-"), stdout =
>TRUE) :
>'"pdftotext"' not found
>
> In addition: Warning message:
>
>running command '"pdfinfo" "C:\Users\R_Files\XXX.pdf"' had status 127
>
>>file.exists(Sys.which(c("pdfinfo","pdftotext")))
>[1] FALSE FALSE
>
>It seems that R can't find the pdfinfo and pdftotext executables, but I'm not
>sure why, given that the Xpdf files were copied into "C:\Program Files"
>(I'm using 64-bit Windows 7).
>
>I'm aware that the pdf_text function from the pdftools package can extract
>text from a PDF file and output it as a string. But I was after something
>that can convert a PDF (i.e. transaction data) into a data frame without
>regular expressions. Is the tm package capable of doing this conversion? Are
>there any other alternatives to these methods?
>
>Your expertise in resolving this problem would be highly appreciated.
>
>
>Steve
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.