Displaying 11 results from an estimated 11 matches for "textreadr".
2023 Dec 29
2
Help request: Parsing docx files for key words and appending to a spreadsheet
...However, I want to ensure that the keyword coverage meets the threshold of >= 50%; if not, then pass onto the next article in the directory. Rinse and repeat for the entire directory.
>
> So far, I've tried working through some Stack Overflow-based solutions, but most seem to use the textreadr package, which is now deprecated; others use either the officer or the officedown packages. However, these packages don't appear to do what I want the program to do, at least not in any of the examples I have found, nor in the vignettes and relevant package manuals I've looked at.
>
>...
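A minimal sketch of the threshold check described above, in base R only. It assumes "keyword coverage" means the share of keywords that occur at least once in the article text; the post does not define it, so treat that as a guess. The helper name `keyword_coverage` and the sample text and keywords are made up for illustration.

```r
# Hypothetical helper: fraction of keywords found at least once in a text.
# Assumption: "coverage" = share of keywords present (not defined in the post).
keyword_coverage <- function(text, keywords) {
  # Collapse all paragraphs into one lower-case string
  text <- tolower(paste(text, collapse = " "))
  # TRUE/FALSE per keyword, using literal (non-regex) matching
  found <- vapply(
    tolower(keywords),
    function(k) grepl(k, text, fixed = TRUE),
    logical(1)
  )
  mean(found)
}

# Made-up inputs for illustration only
article  <- c("Councils plan to let drivers charge electric cars",
              "from street furniture such as litter bins.")
keywords <- c("electric", "charge", "litter", "hydrogen")

coverage <- keyword_coverage(article, keywords)
keep <- coverage >= 0.5   # when FALSE, skip this article and move to the next
```

In a directory loop, `article` would be replaced by the text extracted from each .docx file, and only articles with `keep == TRUE` would be appended to the spreadsheet.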
2023 Dec 29
1
Help request: Parsing docx files for key words and appending to a spreadsheet
Hi Roy (& others)
Many thanks for the advice - well taken. Thanks also to the others who
have responded so quickly - I thought I might have to wait days!! :-)
I'm on a Linux (Mint) machine. Below, I document three attempts, two
using officer and the last now using textreadr.
My attempts so far using 'officer':
##################
(1) First Attempt:
# Load libraries
library(tcltk)
library(tidyverse)
library(officer)
setwd(tk_choose.dir())
doc_path <- list.files(getwd(), pattern = ".docx", full.names = TRUE)
files <- list.files(getwd(), "...
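A side note on the `pattern` argument used above: `list.files()` treats it as a regular expression, so ".docx" matches any character followed by "docx" anywhere in the name. The sketch below compares that with an anchored pattern; the directory and file names are invented for the demonstration.

```r
# Build a throwaway directory with made-up file names
demo_dir <- file.path(tempdir(), "docx_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(
  demo_dir,
  c("a.docx", "b.DOCX", "notes_docx.txt", "c.docx.bak")
))

# Unanchored pattern: "." is a regex wildcard, so this also picks up
# notes_docx.txt and c.docx.bak, and misses the upper-case b.DOCX
loose <- list.files(demo_dir, pattern = ".docx")

# Escaped dot, anchored at the end of the name, case-insensitive
strict <- list.files(demo_dir, pattern = "\\.docx$", ignore.case = TRUE)
```

Passing `full.names = TRUE` as in the original call works the same way with the anchored pattern.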
2020 Oct 07
1
Adding text to existing PDF's created with R
...this text, the PDF should be unchanged (except for a new filename).
The intent is as follows:
I have multiple PDFs that I eventually merge into a single PDF, separating each one with a separator page.
The content of the separator pages comes from a Word document.
The task is performed with textreadr, officer, and pdftools.
I can insert page numbers into the separator pages (created as PDF documents).
I join the separator pages and the original PDFs using Python's join command.
But I have not been able to figure out how to add page numbers to the existing PDFs.
Any help would be appre...
2023 Dec 29
2
Help request: Parsing docx files for key words and appending to a spreadsheet
...dsheet. However, I want to ensure that the keyword
coverage meets the threshold of >= 50%; if not, then pass onto the next
article in the directory. Rinse and repeat for the entire directory.
So far, I've tried working through some Stack Overflow-based solutions,
but most seem to use the textreadr package, which is now deprecated;
others use either the officer or the officedown packages. However, these
packages don't appear to do what I want the program to do, at least not
in any of the examples I have found, nor in the vignettes and relevant
package manuals I've looked at.
The...
2018 Jan 24
0
Newbie - Scrape Data From PDFs?
...hat was related.
I just did a quick search and found a few hits that might work for you.
1. https://medium.com/@CharlesBordet/how-to-extract-and-clean-data-from-pdf-files-in-r-da11964e252e
2. http://bxhorn.com/2016/extract-data-tables-from-pdf-files-in-r/
3. https://www.rdocumentation.org/packages/textreadr/versions/0.7.0/topics/read_pdf
HTH,
Eric
On Wed, Jan 24, 2018 at 3:58 AM, Scott Clausen <scottclausen at mac.com> wrote:
> Hello,
>
> I'm new to R and am using it with RStudio to learn the language. I'm doing so as I have quite a lot of traffic data I would like to explore. My prob...
2018 Jan 24
2
Newbie - Scrape Data From PDFs?
Hello,
I'm new to R and am using it with RStudio to learn the language. I'm doing so as I have quite a lot of traffic data I would like to explore. My problem is that all the data is located on a number of PDFs. Can someone point me to info on gathering data from other sources? I've been to the R FAQ and didn't see anything and would appreciate your thoughts.
2024 Jan 06
0
Help request: Parsing docx files for key words and appending to a spreadsheet
...ves me ideas about how to work it through for the
missing fields, which is one of the major sticking points I kept bumping
up against.
Thank you so much for this.
All the best
Andy
On 05/01/2024 13:59, Howard, Tim G (DEC) wrote:
> Here's a simplified version of how I would do it, using `textreadr` but otherwise base functions. I haven't done it
> all, but have a few examples of finding the correct row then extracting the right data.
> I made a duplicate of the file you provided, so this loops through the two identical files, extracts a few parts,
> then sticks those parts in a...
2018 Jan 24
1
Newbie - Scrape Data From PDFs?
...quick search and found a few hits that might work for you.
>
> 1.
> https://medium.com/@CharlesBordet/how-to-extract-and-clean-data-from-pdf-files-in-r-da11964e252e
> 2. http://bxhorn.com/2016/extract-data-tables-from-pdf-files-in-r/
> 3.
> https://www.rdocumentation.org/packages/textreadr/versions/0.7.0/topics/read_pdf
>
> HTH,
> Eric
>
> On Wed, Jan 24, 2018 at 3:58 AM, Scott Clausen <scottclausen at mac.com>
> wrote:
> > Hello,
> >
> > I'm new to R and am using it with RStudio to learn the language. I'm
> doing so as I have quite a lot...
2023 Dec 29
1
Help request: Parsing docx files for key words and appending to a spreadsheet
...nt to ensure that the keyword
> coverage meets the threshold of >= 50%; if not, then pass onto the next
> article in the directory. Rinse and repeat for the entire directory.
>
> So far, I've tried working through some Stack Overflow-based solutions,
> but most seem to use the textreadr package, which is now deprecated;
> others use either the officer or the officedown packages. However, these
> packages don't appear to do what I want the program to do, at least not
> in any of the examples I have found, nor in the vignettes and relevant
> package manuals I've...
2023 Dec 29
1
Help request: Parsing docx files for key words and appending to a spreadsheet
...word
> > coverage meets the threshold of >= 50%; if not, then pass onto the next
> > article in the directory. Rinse and repeat for the entire directory.
> >
> > So far, I've tried working through some Stack Overflow-based solutions,
> > but most seem to use the textreadr package, which is now deprecated;
> > others use either the officer or the officedown packages. However, these
> > packages don't appear to do what I want the program to do, at least not
> > in any of the examples I have found, nor in the vignettes and relevant
> > packa...
2023 Dec 30
3
Help request: Parsing docx files for key words and appending to a spreadsheet
An update: Running this block of code:
# Load libraries
library(tcltk)
library(tidyverse)
library(officer)
filepath <- setwd(tk_choose.dir())
filename <- "Now they want us to charge our electric cars from litter bins.docx"
#full_filename <- paste0(filepath, filename)
full_filename <- paste(filepath, filename, sep="/")
if (!file.exists(full_filename)) {
...
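One thing that may be worth checking in the block above: `setwd()` invisibly returns the directory that was current before the change, not the new one, so `filepath` may end up pointing at the old folder. A small sketch of the difference follows, with `tempdir()` standing in for the folder picked via `tk_choose.dir()` and a hypothetical file name.

```r
old <- getwd()
target <- tempdir()        # stand-in for the folder chosen with tk_choose.dir()

returned <- setwd(target)  # value is the directory we just left
chosen <- getwd()          # the directory we are in now

# file.path() joins with "/" and avoids hand-built separators
full_filename <- file.path(chosen, "example.docx")  # hypothetical file name

setwd(old)                 # restore the original working directory
```

Storing the result of `tk_choose.dir()` in its own variable first, and then calling `setwd()` on it, would sidestep the problem entirely.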