thr3ads.net - R help - [R] Reading PDF files [Dec 2012]


rahul143

2012-Dec-02 16:25 UTC

[R] Reading PDF files

I need to do text mining on PDF files. I understand that the tm package 
has a readPDF function that can be used for this. I have read the 2008 
posts by Tony Breyal and others on converting PDF files to text. 

  

I am wondering whether the procedure has since been standardized in a 
tutorial or elsewhere. Being new to R, I was able to follow only part 
of that discussion. 

  

Is there a set of step-by-step instructions appropriate for my 
level? I am an ageing academic who has worked mostly with SAS and 
MATLAB. 

  



View this message in context:
http://r.789695.n4.nabble.com/Reading-PDF-files-tp4651657.html
Sent from the R help mailing list archive at Nabble.com.

jose romero

2012-Dec-03 16:25 UTC


Hello:

Apart from readPDF in the tm package, you can use the PDF-to-text converter
command on Linux, "pdftotext".  Say "file.pdf" is your file; from R you'd
use:

system("pdftotext file.pdf -layout")

This invokes the pdftotext command from within R and creates a file called
"file.txt" with the converted text, which you then have to read into R
(for example with readLines).  The -layout option keeps the text as close as
possible to the original layout of the PDF file.
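
To put the two steps together (convert, then read the result into R), here is
a minimal sketch. It assumes pdftotext is installed and on the PATH; the helper
names txt_path_for and read_pdf_text are made up for this illustration and are
not part of R or tm.

```r
# Path of the text file to write: same name as the PDF, with a .txt extension.
txt_path_for <- function(pdf) {
  paste0(tools::file_path_sans_ext(pdf), ".txt")
}

# Convert a PDF with pdftotext and return the text as a character vector,
# one element per line of the converted file.
# Assumption: the pdftotext command-line tool is installed and on the PATH.
read_pdf_text <- function(pdf, layout = TRUE) {
  if (!file.exists(pdf)) {
    stop("no such file: ", pdf)
  }
  if (!nzchar(Sys.which("pdftotext"))) {
    stop("pdftotext was not found on the PATH")
  }
  txt <- txt_path_for(pdf)
  # -layout is optional; the input and output paths are quoted for the shell.
  args <- c(if (layout) "-layout", shQuote(pdf), shQuote(txt))
  status <- system2("pdftotext", args)
  if (status != 0) {
    stop("pdftotext failed with exit status ", status)
  }
  readLines(txt, warn = FALSE)
}
```

With "file.pdf" in the working directory, lines <- read_pdf_text("file.pdf")
followed by head(lines) should show the first few lines of the converted text.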

Regards,

jose loreto romero palma
