Colleagues,
I'm trying to convert a pdf to a text file with the following code.
# pdf to excel
library(pdftools) # pdf to excel library
# set working directory
setwd("C:/Users")
# input pdf
txt <- pdf_text("C:/Users/10619.pdf")
cat(txt[1])
write.table(cat(txt[1]),file="10619.txt",sep="\t",row.names=TRUE,col.names=FALSE)
When I examine the output of cat(txt[1]) on the console, everything I need is
displayed in the format I need.
However, when I execute
write.table(cat(txt[1]),file="10619.txt",sep="\t",row.names=TRUE,col.names=FALSE)
and examine the output file, it does not match cat(txt[1]).
I suspect that the arguments sep="\t",row.names=TRUE,col.names=FALSE might be
the error.
How can one output the contents of cat(txt[1]) and retain its format?
Thomas Subia
Hi Thomas,
Perhaps you should be doing something like writeLines(txt[1],...) or just:
sink("10619.txt")
cat(txt[1])
sink()
Jim
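A minimal sketch of why the original call misbehaves and how writeLines() avoids the problem. It assumes, as pdftools documents, that pdf_text() returns one character string per page with embedded newlines. The sample string and the temporary output path are made up for illustration, so the example runs without the PDF.

```r
# Stand-in for txt[1]: one page of text as a single string with embedded
# newlines (hypothetical content, not taken from the real 10619.pdf).
page <- "Part No.   10619\nLength     12.50 mm\nWidth       3.25 mm"

# cat() prints to the console and returns NULL invisibly, so
# write.table(cat(page), ...) is handed NULL rather than the page text.
res <- cat(page, "\n")
is.null(res)

# writeLines() writes the string unchanged; each embedded "\n" becomes a
# line break in the file, so the spacing seen on the console is kept.
out <- file.path(tempdir(), "10619.txt")  # hypothetical output location
writeLines(page, out)

# Read the file back to check that the layout survived.
readLines(out)
```

With the real data, writeLines(txt[1], "10619.txt") would write the first page, and writeLines(txt, "10619.txt") would write every page in order.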
On Thu, Oct 31, 2019 at 4:48 PM Thomas Subia <tsubia at imgprecision.com>
wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Jim,
That works well!
Thanks again for your help!
Thomas Subia
-----Original Message-----
From: Jim Lemon <drjimlemon at gmail.com>
Sent: Wednesday, October 30, 2019 11:14 PM
To: Thomas Subia <tsubia at imgprecision.com>
Cc: r-help at r-project.org
Subject: Re: [R] Help for pdf conversion
Hi Thomas,
Perhaps you should be doing something like writeLines(txt[1],...) or just:
sink("10619.txt")
cat(txt[1])
sink()
Jim