Hi,
In my experience pdftotext did not do a very good job at this because it
screws up the formatting of tables. This, of course, depends on which
program the PDF document was originally created with. What I found most
appealing is to cut and paste into XEmacs or Emacs and run the
M-x canonically-space-region command, which eliminates the extra spaces.
However, if the PDF document was prepared by scanning and a character
recognition program was used, then all is up in the air and the
formatting of tables has to be done by hand.
Jean
rambam at bigpond.net.au wrote:
>>Hi, I'm trying to read data from a PDF file. Is it possible to do it
>>with R? Thanks, Marco
>>
>>
>
>If cut and paste to a text file fails, try this:
>
>pdftotext (from the xpdf project)
>
>or
>
>http://pdftohtml.sourceforge.net
>pdftohtml is a utility which converts PDF files into HTML and
>XML formats
>
>In addition, pdftk, the command-line PDF toolkit, may be useful:
>http://www.accesspdf.com/pdftk/
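If pdftotext is tried, its `-layout` option asks it to preserve the physical layout of the page, which may help with the table problems mentioned above (it is not guaranteed to). The sketch below is a minimal Python wrapper, assuming `pdftotext` (from xpdf or Poppler) is on the PATH. It uses the `-layout`, `-f` (first page) and `-l` (last page) options, and `-` as the output name so the text goes to stdout. `pdftotext_command` and `extract_text` are hypothetical helper names.

```python
import subprocess
from typing import List, Optional


def pdftotext_command(pdf_path: str,
                      first_page: Optional[int] = None,
                      last_page: Optional[int] = None,
                      layout: bool = True) -> List[str]:
    """Build the argument list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if layout:
        # -layout: try to keep the original physical layout (column alignment)
        cmd.append("-layout")
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    # An output file name of "-" sends the extracted text to stdout
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path: str, **options) -> str:
    """Run pdftotext (assumed to be on PATH) and return the extracted text."""
    result = subprocess.run(pdftotext_command(pdf_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `extract_text("report.pdf", first_page=3, last_page=3)` would return the text of page 3 of a hypothetical `report.pdf`. Keeping the command construction in its own function makes it easy to inspect or tweak the options without running the tool.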