Ivan Krylov
2023-Dec-29 20:59 UTC
[R] Help request: Parsing docx files for key words and appending to a spreadsheet
On Fri, 29 Dec 2023 20:17:41 +0000, Andy <phaedrusv at gmail.com> wrote:

> doc_in <- read_docx(files)
>
> Results in this error:
> Error in filetype %in% c("docx") && grepl("^([fh]ttp)", file) :
>   'length = 9' in coercion to 'logical(1)'

help(read_docx) says that the function only imports one docx file. In order to read multiple files, use a for loop or the lapply function.

> content <- officer::docx_summary("Now they want us to charge our
> electric cars from litter bins.docx") # A title of one of the articles
>
> The error returned is:
> Error in x$doc_obj : $ operator is invalid for atomic vectors

A similar problem here. help(docx_summary) says that the function accepts "rdocx" objects returned by read_docx, not file paths. A string in R is indeed an atomic vector of type character, length 1.

docx_summary(read_docx("Now they want us to charge our electric cars from litter bins.docx")) should work.

-- 
Best regards,
Ivan
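The lapply route mentioned above could look roughly like the following. This is only a sketch: it assumes all the articles sit in one directory, and the helper name summarise_docx_dir and the example path are made up for illustration (they are not part of officer).

```r
library(officer)

# Hypothetical helper (not part of officer): read every .docx file in `dir`
# and return a list with one docx_summary() data frame per file.
summarise_docx_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.docx$", full.names = TRUE)
  # read_docx() accepts a single path, so it is called once per file
  summaries <- lapply(files, function(f) docx_summary(read_docx(f)))
  # name each list element after its file so results are easy to look up
  names(summaries) <- basename(files)
  summaries
}

# Usage (the path is an assumption, replace it with your own directory):
# summaries <- summarise_docx_dir("path/to/articles")
# head(summaries[[1]])
```

Each list element is the data frame for one article, so the files stay separate until you decide how to combine them.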
CALUM POLWART
2023-Dec-29 22:25 UTC
[R] Help request: Parsing docx files for key words and appending to a spreadsheet
> help(read_docx) says that the function only imports one docx file. In
> order to read multiple files, use a for loop or the lapply function.

I told you people will suggest better ways to loop!!

> docx_summary(read_docx("Now they want us to charge our electric cars
> from litter bins.docx")) should work.

Ivan, thanks for spotting my fail!

Since the OP is new to all this, I'm going to suggest a little tweak to this code, which we can then build into a for loop:

library(officer)

# You will want to change this later. You are doing something with tcl to
# pick a directory, which seems rather fancy! Keep doing that for now, or
# set the directory here.
filepath <- getwd()
filename <- "Now they want us to charge our electric cars from litter bins.docx"
# file.path() inserts the "/" between directory and file name, so the
# directory does not need to end in one
full_filename <- file.path(filepath, filename)

# let's double check the file does exist!
if (!file.exists(full_filename)) {
  message("File missing")
} else {
  # This reads the docx at the full file name and passes it (the |>
  # operator) to docx_summary(), which summarises it. The result is saved
  # in a data frame called content.
  content <- read_docx(full_filename) |>
    docx_summary()
  # show the first few rows of content
  head(content)
}

Let's get this bit working before we try and loop.