thr3ads.net - R help - [R] reading data from a pdf [Oct 2005]

rambam@bigpond.net.au

2005-Oct-22 10:31 UTC

[R] reading data from a pdf

> Hi, I'm trying to read data from a PDF file. Is it possible to do it
> with R? Thanks, Marco
If cut and paste to a text file fails, try this:

pdftotext (from the xpdf project)

or

http://pdftohtml.sourceforge.net
pdftohtml is a utility which converts PDF files into HTML and
XML formats

In addition, pdftk, the command-line PDF toolkit, may be useful:
http://www.accesspdf.com/pdftk/

-- 

Seek simplicity and mistrust it.
Alfred Whitehead

A witty saying proves nothing. 
Voltaire

Jean Eid

2005-Oct-24 15:04 UTC

[R] reading data from a pdf

Hi,

In my experience pdftotext did not do a very good job at this because it 
garbles the formatting of tables. How badly depends, of course, on which 
program the PDF document was originally created with. What I found 
works best is to cut and paste into XEmacs or Emacs and then run 
M-x canonically-space-region, which removes the extra spaces between 
words. However, if the PDF document was produced by scanning and has to 
go through a character recognition program, then all bets are off and 
the tables have to be reformatted by hand.
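
The same spacing cleanup can be roughly imitated in R itself. This is a sketch only: the data below are made up, and it simply collapses every run of spaces or tabs to one space (it does not follow the sentence-spacing rules of the Emacs command). It will only work when no cell value contains a space.

```r
# Lines as they might look after cut and paste from a PDF (made-up data)
raw <- c("  site     year   yield",
         "  A        2003    4.1",
         "  B        2004    3.7")

# Trim both ends, then collapse each run of spaces or tabs to a single space
clean <- gsub("[ \t]+", " ", trimws(raw))

# Parse the cleaned lines as a single-space-separated table with a header row
tab <- read.table(text = clean, sep = " ", header = TRUE,
                  stringsAsFactors = FALSE)
str(tab)
```

Note that read.table() with its default sep = "" already treats any run of whitespace as one separator, so the explicit collapse matters mostly when the cleaned text is going to be saved or passed to another tool.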

Jean
