search for: textreadr

Displaying 11 results from an estimated 11 matches for "textreadr".

2023 Dec 29
2
Help request: Parsing docx files for key words and appending to a spreadsheet
...However, I want to ensure that the keyword coverage meets the threshold of >= 50%; if not, then pass onto the next article in the directory. Rinse and repeat for the entire directory. > > So far, I've tried working through some Stack Overflow-based solutions, but most seem to use the textreadr package, which is now deprecated; others use either the officer or the officedown packages. However, these packages don't appear to do what I want the program to do, at least not in any of the examples I have found, nor in the vignettes and relevant package manuals I've looked at. > >...
2023 Dec 29
1
Help request: Parsing docx files for key words and appending to a spreadsheet
Hi Roy (& others) Many thanks for the advice - well taken. Thanks also to the others who have responded so quickly - I thought I might have to wait days!! :-) I'm on a Linux (Mint) machine. Below, I document three attempts, two using officer and the last now using textreadr. My attempts so far using 'officer': ################## (1) First Attempt: # Load libraries library(tcltk) library(tidyverse) library(officer) setwd(tk_choose.dir()) doc_path <- list.files(getwd(), pattern = ".docx", full.names = TRUE) files <- list.files(getwd(), "...
2020 Oct 07
1
Adding text to existing PDF's created with R
...this text, the PDF should be unchanged (except for a new filename). The intent is as follows: I have multiple PDFs that I eventually merge into a single PDF, separating each one with a separator page. The content of the separator pages comes from a Word document. The task is performed with textreadr, officer, and pdftools. I can insert page numbers into the separator pages (created as PDF documents). I join the separator pages and the original PDFs using python's join command. But I have not been able to figure out how to add page numbers to the existing PDF's. Any help would be appre...
2023 Dec 29
2
Help request: Parsing docx files for key words and appending to a spreadsheet
...dsheet. However, I want to ensure that the keyword coverage meets the threshold of >= 50%; if not, then pass onto the next article in the directory. Rinse and repeat for the entire directory. So far, I've tried working through some Stack Overflow-based solutions, but most seem to use the textreadr package, which is now deprecated; others use either the officer or the officedown packages. However, these packages don't appear to do what I want the program to do, at least not in any of the examples I have found, nor in the vignettes and relevant package manuals I've looked at. The...
2018 Jan 24
0
Newbie - Scrape Data From PDFs?
...hat was related. I just did a quick search and found a few hits that might work for you. 1. https://medium.com/@CharlesBordet/how-to-extract-and-clean-data-from-pdf-files-in-r-da11964e252e 2. http://bxhorn.com/2016/extract-data-tables-from-pdf-files-in-r/ 3. https://www.rdocumentation.org/packages/textreadr/versions/0.7.0/topics/read_pdf HTH, Eric On Wed, Jan 24, 2018 at 3:58 AM, Scott Clausen <scottclausen at mac.com> wrote: > Hello, > > I'm new to R and am using it with RStudio to learn the language. I'm doing so as I have quite a lot of traffic data I would like to explore. My prob...
2018 Jan 24
2
Newbie - Scrape Data From PDFs?
Hello, I'm new to R and am using it with RStudio to learn the language. I'm doing so as I have quite a lot of traffic data I would like to explore. My problem is that all the data is located on a number of PDFs. Can someone point me to info on gathering data from other sources? I've been to the R FAQ and didn't see anything and would appreciate your thoughts. I am quite sure now that often,
2024 Jan 06
0
Help request: Parsing docx files for key words and appending to a spreadsheet
...ves me ideas about how to work it through for the missing fields, which is one of the major sticking points I kept bumping up against. Thank you so much for this. All the best Andy On 05/01/2024 13:59, Howard, Tim G (DEC) wrote: > Here's a simplified version of how I would do it, using `textreadr` but otherwise base functions. I haven't done it > all, but have a few examples of finding the correct row then extracting the right data. > I made a duplicate of the file you provided, so this loops through the two identical files, extracts a few parts, > then sticks those parts in a...
2018 Jan 24
1
Newbie - Scrape Data From PDFs?
...quick search and found a few hits that might work for you. > > 1. > https://medium.com/@CharlesBordet/how-to-extract-and-clean-data-from-pdf-files-in-r-da11964e252e > 2. http://bxhorn.com/2016/extract-data-tables-from-pdf-files-in-r/ > 3. > https://www.rdocumentation.org/packages/textreadr/versions/0.7.0/topics/read_pdf > > HTH, > Eric > > On Wed, Jan 24, 2018 at 3:58 AM, Scott Clausen <scottclausen at mac.com> > wrote: > > Hello, > > > > I'm new to R and am using it with RStudio to learn the language. I'm > doing so as I have quite a lot...
2023 Dec 29
1
Help request: Parsing docx files for key words and appending to a spreadsheet
...nt to ensure that the keyword > coverage meets the threshold of >= 50%; if not, then pass onto the next > article in the directory. Rinse and repeat for the entire directory. > > So far, I've tried working through some Stack Overflow-based solutions, > but most seem to use the textreadr package, which is now deprecated; > others use either the officer or the officedown packages. However, these > packages don't appear to do what I want the program to do, at least not > in any of the examples I have found, nor in the vignettes and relevant > package manuals I've...
2023 Dec 29
1
Help request: Parsing docx files for key words and appending to a spreadsheet
...word > > coverage meets the threshold of >= 50%; if not, then pass onto the next > > article in the directory. Rinse and repeat for the entire directory. > > > > So far, I've tried working through some Stack Overflow-based solutions, > > but most seem to use the textreadr package, which is now deprecated; > > others use either the officer or the officedown packages. However, these > > packages don't appear to do what I want the program to do, at least not > > in any of the examples I have found, nor in the vignettes and relevant > > packa...
2023 Dec 30
3
Help request: Parsing docx files for key words and appending to a spreadsheet
An update: Running this block of code: # Load libraries library(tcltk) library(tidyverse) library(officer) filepath <- setwd(tk_choose.dir()) filename <- "Now they want us to charge our electric cars from litter bins.docx" #full_filename <- paste0(filepath, filename) full_filename <- paste(filepath, filename, sep="/") if (!file.exists(full_filename)) { ?