thr3ads.net - R help - [R] Help request: Parsing docx files for key words and appending to a spreadsheet [Dec 2023]


Ivan Krylov

2023-Dec-29 20:59 UTC


On Fri, 29 Dec 2023 20:17:41 +0000, Andy <phaedrusv at gmail.com> wrote:
> doc_in <- read_docx(files)
>
> Results in this error: Error in filetype %in% c("docx") &&
> grepl("^([fh]ttp)", file) : 'length = 9' in coercion to 'logical(1)'
help(read_docx) says that the function only imports one docx file. In
order to read multiple files, use a for loop or the lapply function.
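The error itself comes from `&&`: since R 4.3.0 it refuses operands longer than one, and here `files` held nine paths. A minimal sketch of the lapply route, assuming the officer package is installed; `read_all_docx` is a hypothetical helper name and the folder it scans is whatever you pass in:

```r
library(officer)

# Hypothetical helper: read every .docx file in a folder.
# read_docx() accepts a single path, so lapply() calls it once per file
# and returns a list of "rdocx" objects, named by file name.
read_all_docx <- function(docx_dir) {
  files <- list.files(docx_dir, pattern = "\\.docx$", full.names = TRUE)
  docs <- lapply(files, read_docx)
  names(docs) <- basename(files)
  docs
}

# Example call with a hypothetical folder name:
# docs <- read_all_docx("articles")
```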
> content <- officer::docx_summary("Now they want us to charge our electric cars from litter bins.docx") # A title of one of the articles
>
> The error returned is: Error in x$doc_obj : $ operator is invalid for
> atomic vectors
A similar problem here. help(docx_summary) says that the function
accepts "rdocx" objects returned by read_docx, not file paths. A string
in R is indeed an atomic vector of type character, length 1, and the $
operator cannot be used on atomic vectors, hence the error.
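A quick base-R check (no officer needed) that reproduces the message: a file name is just a length-1 character vector, so asking it for a `doc_obj` element with `$` fails exactly as in the thread.

```r
# A file name is a character vector of length 1, i.e. an atomic vector
x <- "Now they want us to charge our electric cars from litter bins.docx"
is.atomic(x)    # TRUE
is.character(x) # TRUE
length(x)       # 1

# Using $ on an atomic vector raises the same error docx_summary() hit
res <- try(x$doc_obj, silent = TRUE)
cat(conditionMessage(attr(res, "condition")), "\n")
```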

docx_summary(read_docx("Now they want us to charge our electric cars from litter bins.docx")) should work.
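Building on that two-step call, a sketch of a hypothetical helper (`docx_text` is not part of officer) that keeps only the body-paragraph text of one file. It assumes the `content_type` and `text` columns that docx_summary() returns, where table contents show up as "table cell" rows:

```r
library(officer)

# Hypothetical helper: return the non-empty paragraph text of one .docx file.
# docx_summary() needs the rdocx object from read_docx(), not a file path.
docx_text <- function(path) {
  smry <- docx_summary(read_docx(path))
  # Keep body paragraphs only; table contents are "table cell" rows
  paras <- smry$text[smry$content_type == "paragraph"]
  paras[!is.na(paras) & nzchar(paras)]
}
```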

-- 
Best regards,
Ivan

CALUM POLWART

2023-Dec-29 22:25 UTC



> help(read_docx) says that the function only imports one docx file.
> In order to read multiple files, use a for loop or the lapply function.
>
I told you people would suggest better ways to loop!!

>
> docx_summary(read_docx("Now they want us to charge our electric cars from litter bins.docx")) should work.
>
Ivan, thanks for spotting my fail! Since the OP is new to all this, I'm going
to suggest a little tweak to this code, which we can then build into a for
loop:

library(officer)

filepath <- getwd() # you will want to change this later. You are doing
# something with tcl to pick a directory which seems rather fancy! But keep
# doing it for now or set the directory here

filename <- "Now they want us to charge our electric cars from litter bins.docx"

# file.path() adds the "/" between the directory and the file name
# (getwd() returns the directory without a trailing "/")
full_filename <- file.path(filepath, filename)

# let's double check the file does exist!
if (!file.exists(full_filename)) {
  message("File missing: ", full_filename)
} else {
  content <- read_docx(full_filename) |>
    docx_summary()
  # this reads the docx at full_filename and
  # passes it (the |> pipe) to the next line,
  # which summarises it.
  # The result is saved in a data frame object
  # called content, of which we show the
  # first few rows
  print(head(content))
}
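With `content` in hand, the key-word part of the original question can work on its `text` column. A minimal sketch: `find_keyword_paragraphs` is a hypothetical name, the `doc_index` and `text` columns are assumed from docx_summary() output, and keywords are assumed to be plain words without regex special characters. It returns the paragraphs that mention any keyword as a whole word, ignoring case:

```r
# Hypothetical helper: rows of a docx_summary() data frame whose text
# contains any keyword as a whole word (case-insensitive).
find_keyword_paragraphs <- function(content, keywords) {
  pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")
  hit <- !is.na(content$text) &
    grepl(pattern, content$text, ignore.case = TRUE)
  content[hit, c("doc_index", "text")]
}

# Example call with hypothetical keywords:
# find_keyword_paragraphs(content, c("electric", "litter"))
```

Whole-word matching means "bin" will not match "bins"; list each form you care about.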

Let's get this bit working before we try and loop
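Once the single file works, the loop could look something like the sketch below. It is not code from the thread: `tally_docx_keywords`, the folder, keywords and output file are hypothetical, and it needs R 4.1 or later for `|>`. Each file gets one row of keyword counts (paragraphs mentioning each word, case-insensitive plain-text match), appended to a CSV that any spreadsheet program can open:

```r
library(officer)

# Hypothetical helper: for every .docx in docx_dir, count the paragraphs that
# mention each keyword and append one row per file to the CSV at out_csv.
tally_docx_keywords <- function(docx_dir, keywords, out_csv) {
  files <- list.files(docx_dir, pattern = "\\.docx$", full.names = TRUE)
  for (f in files) {
    content <- read_docx(f) |>
      docx_summary()
    text <- tolower(content$text[!is.na(content$text)])
    hits <- vapply(tolower(keywords),
                   function(k) sum(grepl(k, text, fixed = TRUE)),
                   integer(1))
    row <- data.frame(file = basename(f), t(hits), check.names = FALSE)
    # Write the header row only when the CSV does not exist yet
    new_file <- !file.exists(out_csv)
    write.table(row, out_csv, sep = ",", row.names = FALSE,
                col.names = new_file, append = !new_file)
  }
  invisible(out_csv)
}

# Example call with hypothetical names:
# tally_docx_keywords("articles", c("electric", "litter"), "keyword_counts.csv")
```

Appending row by row means a crash halfway still leaves the earlier results on disk; collecting rows in a list and writing once at the end is the tidier alternative.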
