search for: pdf_text

Displaying 6 results from an estimated 6 matches for "pdf_text".

2024 Nov 25
1
Problems using the textreuse package
...rpus() or TextReuseTextDocument(). In the package documentation, the files are loaded from ... Does anyone know how this is done? I managed to compute the Jaccard similarity using this package, but to do so I used the following code.
library(pdftools)
library(textreuse)
text1 <- pdf_text("uno.pdf")
text2 <- pdf_text("dos.pdf")
full_text1 <- paste(text1, collapse = " ")
full_text2 <- paste(text2, collapse = " ")
a <- tokenize_words(full_text1)
b <- tokenize_words(full_text2)
jaccard_similarity(a, b)
Thanks [[alternati...
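The snippet above joins the pages returned by pdf_text() (one string per page) into a single string, tokenizes it, and passes the two token vectors to textreuse's jaccard_similarity(). As a rough illustration of the quantity being computed, assuming the usual set-based definition (number of distinct shared tokens divided by number of distinct tokens overall), here is a base-R sketch. jaccard_sets() is a hypothetical helper and the token vectors are made-up data, not part of textreuse or the original post.

```r
# Hypothetical helper: Jaccard similarity of two token vectors treated as sets,
# i.e. size of the intersection divided by size of the union.
# Duplicate tokens are ignored because sets only count distinct elements.
jaccard_sets <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

# Made-up token vectors standing in for tokenize_words() output
a <- c("the", "cat", "sat", "on", "the", "mat")
b <- c("the", "dog", "sat", "on", "a", "mat")

# 4 shared distinct tokens (the, sat, on, mat) out of 7 distinct tokens overall
jaccard_sets(a, b)  # 4/7, about 0.571
```

If the results differ from jaccard_similarity() on real data, the first thing to check is how each side tokenizes (for example case folding and punctuation handling), since the set comparison itself is simple.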
2023 Jul 05
1
textual analysis - transforming several pdf to txt - naming the files
...ade de Aveiro - Portugal
dirpath <- ("/Users/ceciliacarmo/documents/RTextualAnalysis/data/pdfs")
library(pdftools)
library(dplyr)
convertpdf2txt <- function(dirpath){
  files <- list.files(dirpath, full.names = T)
  x <- sapply(files, function(x){
    x <- pdftools::pdf_text(x) %>%
      paste0(collapse = " ") %>%
      stringr::str_squish()
    return(x)
  })
}
# apply function
txts <- convertpdf2txt(here::here("data", "pdf/"))
# add names to txt files
names(txts) <- paste0(here::here("data","pdftext"), 1:...
2023 Jul 05
1
textual analysis - transforming several pdf to txt - naming the files
convertpdf2txt <- function(dirpath){
  files <- list.files(dirpath, pattern = "Consoli.*\\.pdf$", full.names = TRUE)
  files <- chartr("\\", "/", files)
  x <- lapply(files, function(x){
    pdftools::pdf_text(x) %>%
      paste0(collapse = " ") %>%
      stringr::str_squish()
  })
  new_names <- tools::file_path_sans_ext(files)
  new_names <- paste(new_names, "txt", sep = ".")
  setNames(x, new_names)
}
# apply function
# note that my test files are in "...
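The reply names each extracted text after its source PDF, with the .pdf extension swapped for .txt. A possible next step is writing each text to that file. The sketch below assumes the texts are already in memory (one string per PDF, in the same order as the paths); write_txts() and the Consolidated_*.pdf file names are hypothetical, and plain strings stand in for real pdf_text() output so the example runs without any PDFs.

```r
# Hypothetical helper: write each extracted text to a .txt file whose path is
# the PDF path with its extension replaced, and return the written paths.
write_txts <- function(pdf_files, texts) {
  stopifnot(length(pdf_files) == length(texts))
  # "report.pdf" -> "report.txt", same idea as file_path_sans_ext() + paste()
  txt_files <- paste0(tools::file_path_sans_ext(pdf_files), ".txt")
  for (i in seq_along(txt_files)) {
    writeLines(texts[[i]], con = txt_files[i])
  }
  invisible(txt_files)
}

# Stand-in data: hypothetical PDF paths in a temporary directory
# (the PDFs themselves need not exist, only their names are used)
dir <- tempdir()
pdfs <- file.path(dir, c("Consolidated_2021.pdf", "Consolidated_2022.pdf"))
out <- write_txts(pdfs, list("text of the first report",
                             "text of the second report"))
readLines(out[1])
```

With real files, texts would be the list produced by the lapply() over pdftools::pdf_text() in the reply above; keeping paths and texts in the same order is what ties each .txt file to the right PDF.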
2024 Nov 26
0
R-help-es Digest, Vol 187, Issue 10
...the package documentation, the files are loaded from
>
> Does anyone know how this is done?
>
> I managed to compute the Jaccard similarity using this package,
> but to do so I used the following code.
>
> library(pdftools)
> library(textreuse)
> text1 <- pdf_text("uno.pdf")
> text2 <- pdf_text("dos.pdf")
> full_text1 <- paste(text1, collapse = " ")
> full_text2 <- paste(text2, collapse = " ")
> a <- tokenize_words(full_text1)
> b <- tokenize_words(full_text2)
>...
2004 Jul 01
1
PDF text strangeness (PR#7043)
Hi R developers, I have noticed a strange little bug/feature: I often create PDFs of plots, then edit them in Adobe Illustrator. Generally this works well, but whenever I have text that is aligned vertically (usually along the y-axis), the text is written out as many individual objects. When the text is horizontal (x-axis labels and so on), it is all one object. I would prefer one object
2008 Mar 29
1
A patch for extending pdf device to embed popup text and web links
...ons */ + /* * Fonts and encodings used on the device */ *************** *** 5149,5154 **** --- 5154,5166 ---- cidfontfamily defaultCIDFont; /* Record if fonts are used */ Rboolean fontUsed[100]; + + /* + * Current text geometry information (stored in PDF_Text) + */ + int text_size; + double text_a, text_b, text_x, text_y; + double text_ascent, text_descent, text_width; } PDFDesc; *************** *** 5188,5197 **** --- 5200,5217 ---- static double PDF_StrWidth(const char *str, const pGEcontext gc, pDevDesc dd); + s...