thr3ads.net - R help - [R] scanning a pdf scan [Oct 2006]

roger koenker

2006-Oct-27 16:34 UTC

[R] scanning a pdf scan

I have a pdf scan of several pages of data from a quite famous old
paper by C.S. Peirce (1873).  I would like (what else?) to convert it
into an R dataframe.  Somewhat to my surprise the pdf seems to already
be in a character-recognized form, since I can search for numerical
strings and they are nicely found.  Of course, as is usual with such
tables, there are also headings, column lines, etc. that are less
interesting than the numbers themselves.  I've tried saving the pdf in
various formats, some of which look vaguely tractable, but I'm hoping
that there is something that is more automatic.

Does anyone have experience that they could share toward this objective?


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    rkoenker at uiuc.edu            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Champaign, IL 61820

Gabor Grothendieck

2006-Oct-27 16:52 UTC


[R] scanning a pdf scan

I don't have specific experience with this, but strapply in the gsubfn
package can extract information from a string by content as opposed
to by delimiters, e.g.

> library(gsubfn)
> strapply("abc34def56xyz", "[0-9]+", c)[[1]]
[1] "34" "56"
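The same content-based idea can be sketched in base R, without gsubfn, for text saved from the pdf. This is only a rough illustration: the sample lines, the assumption of three fields per data row, and the column names are all made up, and a real table would need its own pattern and row filter.

```r
# Hypothetical sample: lines as they might look after saving the pdf as text.
lines <- c(
  "Table I.  Day 1",          # heading containing a stray number
  "| 12 | 0.153 | 27 |",      # data row with column rules
  "-------------------",      # horizontal rule, no numbers
  "| 13 | 0.161 | 31 |"
)

# Pull every integer or decimal out of each line, by content rather than
# by delimiter (base R's regmatches/gregexpr instead of strapply).
tokens <- regmatches(lines, gregexpr("[0-9]+(\\.[0-9]+)?", lines))

# Keep only lines that yield the expected number of fields
# (3 here, which is an assumption about the table layout).
rows <- Filter(function(x) length(x) == 3, tokens)

# Stack the rows into a matrix of numbers and convert to a data frame.
dat <- as.data.frame(do.call(rbind, lapply(rows, as.numeric)))
names(dat) <- c("id", "value", "count")   # hypothetical column names
```

Filtering on the field count is what drops headings and rules here; if headings happen to contain the same number of numeric strings as a data row, a stricter pattern would be needed.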

