Cool, thanks Jim!!
I would love to be able to write my own script for this, as I have many
images/PDFs in a folder and would like to batch process them using an R
script!!
Thanks

On Tuesday, July 26, 2016, Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Shane,
> FreeOCR is a really good place to start.
>
> http://www.paperfile.net/
>
> Jim
>
> On Wed, Jul 27, 2016 at 6:11 AM, Shane Carey <careyshan at gmail.com> wrote:
> > Hi,
> >
> > Has anyone ever done any ocr in R?? I have some scanned images that I
> > would like to convert to text!!
> > Thanks
> >
> > --
> > Le gach dea ghui,
> > Shane
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

--
Le gach dea ghui,
Shane
Hi Shane,
If you want to run OCR on the command line, the Tesseract engine is
probably the way to go. It is harder to build and install, but you can call
it from an R session.

Jim

On Wed, Jul 27, 2016 at 8:24 AM, Shane Carey <careyshan at gmail.com> wrote:
> Cool, thanks Jim!!
> I would love to be able to write my own script for this as I have many
> images/ pdf's in a folder and would like to batch process them using an R
> script!!
> Thanks
On Wed, 27 Jul 2016, Shane Carey wrote:

> Cool, thanks Jim!!
> I would love to be able to write my own script for this as I have many
> images/ pdf's in a folder and would like to batch process them using an R
> script!!

The underlying engine is "tesseract", which is also available as a
command-line tool and on other OSs. In principle, it is not hard to call it
with a system() command and then readLines() the resulting text. However, it
might be useful to play with the available options in the GUI first to see
what works best for your images.
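For reference, the system() plus readLines() approach described in this
thread could be sketched roughly as follows. This is only a minimal sketch
under some assumptions: the tesseract command-line tool is installed and on
the PATH, the inputs are image files (scanned PDFs would probably need
converting to images first), and the function names ocr_output_base(),
ocr_image() and ocr_folder() are made up for illustration. It uses
system2(), a close relative of system(), and relies on tesseract writing
its result to "<output base>.txt".

```r
# Build the output base name for one image; tesseract appends ".txt" to it.
# (Hypothetical helper, not part of any package.)
ocr_output_base <- function(image_path, out_dir) {
  file.path(out_dir, tools::file_path_sans_ext(basename(image_path)))
}

# Run the tesseract command-line tool on one image and return the
# recognised text as a character vector, one element per line.
ocr_image <- function(image_path, out_dir = tempdir(), lang = "eng") {
  out_base <- ocr_output_base(image_path, out_dir)
  # Usage: tesseract <image> <output base> -l <language>
  status <- system2("tesseract",
                    args = c(shQuote(image_path), shQuote(out_base), "-l", lang))
  if (status != 0) stop("tesseract failed for ", image_path)
  readLines(paste0(out_base, ".txt"), warn = FALSE)
}

# Batch-process every image in a folder; returns a list of character
# vectors named by file. Files sharing a base name (e.g. a.png and a.jpg)
# would overwrite each other's text file in out_dir.
ocr_folder <- function(folder, pattern = "\\.(png|jpe?g|tiff?)$") {
  images <- list.files(folder, pattern = pattern,
                       full.names = TRUE, ignore.case = TRUE)
  texts <- lapply(images, ocr_image)
  names(texts) <- basename(images)
  texts
}
```

A call such as texts <- ocr_folder("scans"), where "scans" is a placeholder
folder name, would then give one character vector per image. As suggested
above, it is probably worth trying the options on a few images first before
running a whole folder.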
https://cran.rstudio.com/web/packages/abbyyR/index.html
https://github.com/greenore/ocR
https://electricarchaeology.ca/2014/07/15/doing-ocr-within-r/

That was from a Google "r ocr" search. So, yes, there are options.

On Tue, Jul 26, 2016 at 6:43 PM, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:
> On Wed, 27 Jul 2016, Shane Carey wrote:
>
>> Cool, thanks Jim!!
>> I would love to be able to write my own script for this as I have many
>> images/ pdf's in a folder and would like to batch process them using an R
>> script!!
>
> The underlying engine is "tesseract" which is also available as a
> command-line tool and on other OSs. In principle, it is not hard to call it
> with a system() command and then readLines() the resulting text. However, it
> might be useful to play with the available options in the GUI first to see
> what works best for your images.