Hello

I have around 200 PDF documents containing data that I want organized in R as a data frame. The PDF documents look like this:

http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver.jpeg

or like this:

http://www.nabble.com/file/p21667074/PRRS-billede%2Bmed%2Bfarver%2B2.jpeg

I want to pull out the data in the coloured boxes so that it becomes organized like this (just in R instead of Excel):

http://www.nabble.com/file/p21667074/PRRS-billede%2Bexcel.jpeg

The 0s and 1s encode the status on a particular date. "PRRS-neg" is represented by a 0 in the columns PRRS-VAC and PRRS-DK. Likewise, "PRRS-pos VAC" or "Vac" is represented by a 1 in the column PRRS-VAC, and "PRRS-pos DK" or "DK" by a 1 in the column PRRS-DK. With "sanVAC" there should be a 1 in the column VACsan, and with "sanDK" a 1 in the column DKsan.

The first date for each "CHR-nr" should be either the earliest date in the red box (as in the first picture) or the date preceded by the word "før" (Danish for "before"; as in the second picture). All 200 PDF documents look like the ones in the pictures, each representing a different "CHR-nr".

Hope you can help me.

--
View this message in context: http://www.nabble.com/Getting-data-from-a-PDF-file-into-R-tp21667074p21667074.html
Sent from the R help mailing list archive at Nabble.com.
joe1985 wrote:
> Hope you can help me

Not on the basis of .jpeg files, I think. We'd need some indication of what the PDF looks like inside. There's a tool called pdftotext, which might do something for you, IF you can figure out reliably where your data begin and end.

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
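To make the pdftotext suggestion concrete, here is a minimal sketch of driving it from R for a whole directory of PDFs. It assumes the pdftotext executable (from xpdf or poppler) is on the PATH; the helper names txt_path_for and convert_pdfs are made up for illustration, and whether the -layout option gives usable output depends on how the PDFs were produced.

```r
# Hypothetical helper: derive the .txt file name that pairs with a PDF.
txt_path_for <- function(pdf) sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE)

# Hypothetical helper: convert every PDF in a directory to plain text and
# return the paths of the text files.
# Assumption: the pdftotext executable is installed and on the PATH.
convert_pdfs <- function(dir = ".") {
  if (!nzchar(Sys.which("pdftotext"))) stop("pdftotext not found on PATH")
  pdfs <- list.files(dir, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  txts <- txt_path_for(pdfs)
  for (i in seq_along(pdfs)) {
    # -layout asks pdftotext to keep the physical layout of the page,
    # which may help keep a date and its status on the same line
    system2("pdftotext", args = c("-layout", shQuote(pdfs[i]), shQuote(txts[i])))
  }
  txts
}
```

The resulting text files could then be read with readLines() and inspected to see where the data begin and end.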
You can convert the PDF to text, then manipulate the output to read only the data. On Linux there is the pdftotext utility; otherwise you can download the xpdf zip archive, which contains it.

Best

On 1/26/09, joe1985 <johannes@dsr.life.ku.dk> wrote:
> I have around 200 PDF-documents, containing data i want organized in R as a
> dataframe.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
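Once dates and status labels have been pulled out of the converted text, the 0/1 coding described in the original question could be sketched as below. This is only an illustration: the function name code_status and the sample records are made up, the labels are matched exactly as spelled in the question, and any column the question does not mention for a given label is assumed to be 0.

```r
# Hypothetical helper: turn status labels into the four indicator columns
# named in the question. Assumption: labels match exactly after trimming,
# and columns not mentioned for a label default to 0.
code_status <- function(status) {
  status <- trimws(status)
  data.frame(
    "PRRS-VAC" = as.integer(status %in% c("PRRS-pos VAC", "Vac")),
    "PRRS-DK"  = as.integer(status %in% c("PRRS-pos DK", "DK")),
    "VACsan"   = as.integer(status == "sanVAC"),
    "DKsan"    = as.integer(status == "sanDK"),
    check.names = FALSE  # keep the hyphenated column names as given
  )
}

# Made-up records standing in for what might be parsed from one text file
records <- data.frame(
  CHR.nr = 12345,
  date   = as.Date(c("2005-03-01", "2006-07-15", "2007-01-10")),
  status = c("PRRS-neg", "PRRS-pos VAC", "sanDK"),
  stringsAsFactors = FALSE
)

# One row per date, with the indicator columns appended
result <- cbind(records[c("CHR.nr", "date")], code_status(records$status))
print(result)
```

Applying this to each of the 200 files and combining the pieces with rbind() would give a single data frame, but the parsing step that produces the records has to be written against the actual text output.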