Cecília Carmo
2023-Jul-05  10:12 UTC
[R] textual analysis - transforming several pdf to txt - naming the files
convertpdf2txt <- function(dirpath){
   files <- list.files(dirpath, pattern = "Consoli.*\\.pdf$",
                       full.names = TRUE)
   files <- chartr("\\", "/", files)
   x <- lapply(files, function(x){
     pdftools::pdf_text(x) %>%
       paste0(collapse = " ") %>%
       stringr::str_squish()
   })
   new_names <- tools::file_path_sans_ext(files)
   new_names <- paste(new_names, "txt", sep = ".")
   setNames(x, new_names)
}
# apply function
# note that my test files are in "~/Temp"
txts <- convertpdf2txt(here::here("~", "Temp"))
names(txts)
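As an aside on the naming step: the function above returns a named list whose names are the original paths with `.pdf` replaced by `.txt`. The sketch below shows one way those names could then be used to write the files. It uses made-up text in place of real PDF content; the file names and the use of a temporary directory are assumptions for illustration, not part of the original post.

```r
# Made-up stand-ins for the PDF paths that list.files() would return.
pdf_paths <- file.path(tempdir(),
                       c("Consolidated_2020.pdf", "Consolidated_2021.pdf"))

# Same naming step as in the function: drop the extension, then append "txt".
txt_paths <- paste(tools::file_path_sans_ext(pdf_paths), "txt", sep = ".")

# Made-up text standing in for the output of pdftools::pdf_text().
txts <- setNames(list("text of report 2020", "text of report 2021"), txt_paths)

# Write each list element to the file named by its list name.
for (path in names(txts)) {
  writeLines(txts[[path]], path)
}

basename(txt_paths)
```

Because the names keep the full path, each `.txt` file lands next to the PDF it came from.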
Thank you very much, but the following error appeared:
Error: unexpected '}' in "}"
Cecília Carmo
Universidade de Aveiro
Rui Barradas
2023-Jul-05  15:43 UTC
[R] textual analysis - transforming several pdf to txt - naming the files
At 11:12 on 05/07/2023, Cecília Carmo wrote:
> Thank you very much, but the following error appeared:
>
> Error: unexpected '}' in "}"

Hello,

I tested the code with a couple of PDFs and it ran with no errors or warnings. That error says that a "}" is unbalanced, but in my code they are all balanced; RStudio checks this automatically.

Can you check the code in an editor with syntax highlighting?

Hope this helps,

Rui Barradas
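Besides an editor, R's own parser can report whether a piece of code is syntactically complete. The sketch below is an illustration rather than part of the original reply; the helper name `check_syntax` and the sample strings are hypothetical. It relies only on base R's `parse()` and `tryCatch()`.

```r
# Hypothetical helper: returns TRUE if `code` parses,
# otherwise the parser's error message as a string.
check_syntax <- function(code) {
  tryCatch({
    parse(text = code, keep.source = FALSE)
    TRUE
  }, error = function(e) conditionMessage(e))
}

balanced    <- "f <- function(x) {\n  x + 1\n}"
extra_brace <- "f <- function(x) {\n  x + 1\n}\n}"

check_syntax(balanced)     # parses cleanly
check_syntax(extra_brace)  # message points at the unexpected '}'
```

A saved script can be checked the same way with `parse(file = "script.R")`, where the file name is a placeholder. Note also that when code is pasted into the console line by line, an earlier syntax error makes R discard the unfinished expression, so a later closing brace can be reported as unexpected even though the braces balance in the source. Whether that is what happened here cannot be told from the thread.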