Colleagues,

I'm trying to convert a PDF to a text file with the following code.

    # pdf to excel
    library(pdftools) # pdf to excel library
    # set working directory
    setwd("C:/Users")
    # input pdf
    txt <- pdf_text("C:/Users/10619.pdf")
    cat(txt[1])
    write.table(cat(txt[1]),file="10619.txt",sep= "\t",row.names =TRUE,col.names =FALSE)

When I examine the contents of cat(txt[1]) on the console, everything I need is displayed in the format I need.

However, when I execute write.table(cat(txt[1]),file="10619.txt",sep= "\t",row.names =TRUE,col.names =FALSE) and examine the output, it does not match cat(txt[1]).
I suspect that the arguments sep= "\t",row.names =TRUE,col.names =FALSE might be the error.

How can one output the contents of cat(txt[1]) and retain its format?

Thomas Subia
Hi Thomas,

Perhaps you should be doing something like writeLines(txt[1],...) or just:

    sink("10619.txt")
    cat(txt[1])
    sink()

Jim

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
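For readers wondering why the original write.table() call did not reproduce the console output: cat() prints its arguments and returns NULL invisibly, so write.table(cat(txt[1]), ...) is handed NULL rather than the page text. The sketch below illustrates this and the approaches suggested above. It is a minimal illustration, not the poster's actual data: the `page` string is a made-up stand-in for txt[1], and the output path under tempdir() is an assumption chosen so the example runs without the PDF or the pdftools package.

```r
# Hypothetical stand-in for txt[1]; pdf_text() returns a character vector
# with one string per page, each containing embedded newlines.
page <- "Part No.   Qty   Status\n10619        4   OK\n"

# cat() prints as a side effect and returns NULL invisibly, so wrapping it
# in write.table() passes NULL, not the text.
returned <- cat(page)
is.null(returned)

# Assumed output location for this sketch (the thread used "10619.txt").
out_file <- file.path(tempdir(), "10619.txt")

# Option 1: writeLines() writes the string as-is, embedded newlines included,
# and appends one newline after each element.
writeLines(page, out_file)

# Option 2: cat() with its file argument writes directly, no sink() needed.
cat(page, file = out_file)

# Option 3: sink() diverts console output to the file, as suggested above.
sink(out_file)
cat(page)
sink()

# Read the file back to check that the layout survived.
readLines(out_file)
```

With the real data, replacing `page` with txt[1] should behave the same way; passing the whole vector, as in writeLines(txt, out_file), would be one way to write every page rather than only the first.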
Jim,

That works well! Thanks again for your help!

Thomas Subia

-----Original Message-----
From: Jim Lemon <drjimlemon at gmail.com>
Sent: Wednesday, October 30, 2019 11:14 PM
To: Thomas Subia <tsubia at imgprecision.com>
Cc: r-help at r-project.org
Subject: Re: [R] Help for pdf conversion