Displaying 4 results from an estimated 4 matches for "pdf_text".
2023 Jul 05
1
textual analysis - transforming several pdf to txt - naming the files
...ade de Aveiro - Portugal
dirpath <- ("/Users/ceciliacarmo/documents/RTextualAnalysis/data/pdfs")
library(pdftools)
library(dplyr)
convertpdf2txt <- function(dirpath){
files <- list.files(dirpath, full.names = TRUE)
x <- sapply(files, function(x){
x <- pdftools::pdf_text(x) %>%
paste0(collapse = " ") %>%
stringr::str_squish()
return(x)
})
}
# apply function
txts <- convertpdf2txt(here::here("data", "pdf/"))
# add names to txt files
names(txts) <- paste0(here::here("data","pdftext"), 1:...
2023 Jul 05
1
textual analysis - transforming several pdf to txt - naming the files
convertpdf2txt <- function(dirpath){
files <- list.files(dirpath, pattern = "Consoli.*\\.pdf$", full.names = TRUE)
files <- chartr("\\", "/", files)
x <- lapply(files, function(x){
pdftools::pdf_text(x) %>%
paste0(collapse = " ") %>%
stringr::str_squish()
})
new_names <- tools::file_path_sans_ext(files)
new_names <- paste(new_names, "txt", sep = ".")
setNames(x, new_names)
}
# apply function
# note that my test files are in "...
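The naming step in the function above can be tried without any PDFs. The following is a minimal sketch, not the poster's code: the file paths, the placeholder texts, and the temporary output directory are all hypothetical stand-ins, and only base R plus the `tools` package is used.

```r
# Hypothetical paths, standing in for the list.files() result
files <- c("data/pdfs/report_a.pdf", "data/pdfs/report_b.pdf")

# Placeholder strings, standing in for the squished pdf_text() output
texts <- list("first document text", "second document text")

# Swap the .pdf extension for .txt, as the function above does
new_names <- paste(tools::file_path_sans_ext(files), "txt", sep = ".")
txts <- setNames(texts, new_names)

# Write each element to disk under its new name.
# The temporary output directory is an assumption made for this sketch.
out_dir <- tempfile("txt_out")
dir.create(out_dir)
for (nm in names(txts)) {
  writeLines(txts[[nm]], file.path(out_dir, basename(nm)))
}
```

Using `basename()` keeps only the file name, so the output lands in `out_dir` regardless of where the source PDFs live.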
2004 Jul 01
1
PDF text strangeness (PR#7043)
Hi R-developers
I have noticed a strange little bug/feature: I often create PDFs of plots, then edit them in Adobe Illustrator. Generally this works great, but whenever I have text that is aligned vertically (usually along the y-axis), the text is written out as lots of individual objects. When the text is horizontal (x-axis, other stuff), it is all one object. I would prefer one object
2008 Mar 29
1
A patch for extending pdf device to embed popup text and web links
...ons */
+
/*
* Fonts and encodings used on the device
*/
***************
*** 5149,5154 ****
--- 5154,5166 ----
cidfontfamily defaultCIDFont;
/* Record if fonts are used */
Rboolean fontUsed[100];
+
+ /*
+ * Current text geometry information (stored in PDF_Text)
+ */
+ int text_size;
+ double text_a, text_b, text_x, text_y;
+ double text_ascent, text_descent, text_width;
}
PDFDesc;
***************
*** 5188,5197 ****
--- 5200,5217 ----
static double PDF_StrWidth(const char *str,
const pGEcontext gc,
pDevDesc dd);
+ s...