Andreas Blätte
2023-Jun-16 11:00 UTC
[Rd] download.file() issue with pdf docs on Windows: Set mode="wb" automatically?
Dear colleagues,

Windows users in an R course I teach ran into problems when they downloaded a PDF document with `download.file()` and then tried to open it with `pdftools::pdf_info()`. Indeed, on Windows, PDF files downloaded with `download.file()` are corrupted unless you set `mode = "wb"`.

This scenario is actually to be anticipated. The documentation of `download.file()` says clearly:

"""
The choice of binary transfer (mode = "wb" or "ab") is important on Windows, since unlike Unix-alikes it does distinguish between text and binary files and for text transfers changes \n line endings to \r\n (aka 'CRLF').

On Windows, if mode is not supplied (missing()) and url ends in one of .gz, .bz2, .xz, .tgz, .zip, .jar, .rda, .rds or .RData, mode = "wb" is set so that a binary transfer is done to help unwary users.

Code written to download binary files must use mode = "wb" (or "ab"), but the problems incurred by a text transfer will only be seen on Windows.
"""

However, many "unwary users" will not read the (very clear) documentation. So I suggest adding PDF documents to the list of file types for which mode = "wb" is set automatically. This would require changing this line of the R source code:

https://github.com/wch/r-source/blob/197d25ca9c5a5132dbc366667137ed11255c099b/src/library/utils/R/windows/download.file.R#L30

as follows:

    if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|jar|rd[as]|RData|pdf)$", URLdecode(url))))
        mode <- "wb"

I am not sure whether you would see this as arbitrarily violating some logic. Yet I am quite sure that many users who are not used to reading the documentation carefully have struggled with this issue.
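To illustrate what the extended check would and would not catch, here is a minimal sketch that applies the same pattern to a few URLs. The helper name `needs_binary_mode` and the example.org URLs are made up for illustration and are not part of R; only the regular expression is taken from the proposed line.

```r
# Sketch only: mirrors the extension test from the proposed line.
# `needs_binary_mode` is a hypothetical helper, not an R function.
needs_binary_mode <- function(url) {
  pattern <- "\\.(gz|bz2|xz|tgz|zip|jar|rd[as]|RData|pdf)$"
  # Same test as in the proposal: does the decoded URL end in a listed extension?
  length(grep(pattern, utils::URLdecode(url))) > 0L
}

# Illustrative URLs (assumed, not taken from the course material)
needs_binary_mode("https://example.org/report.pdf")       # matched by the added "pdf"
needs_binary_mode("https://example.org/archive.zip")      # already matched today
needs_binary_mode("https://example.org/data.csv")         # not matched
needs_binary_mode("https://example.org/report.pdf?dl=1")  # not matched: "$" anchors at the end
```

As the last call suggests, the check only looks at the end of the URL, so a PDF served with a trailing query string would presumably still need an explicit `mode = "wb"`.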
This is an issue I wrote for my course:

https://github.com/ablaette/learningR/issues/24

And this is the code that we used:

    gruene_btw2021 <- "https://cms.gruene.de/uploads/documents/2021_Wahlprogrammentwurf.pdf"
    gruene_btw2021_local <- tempfile()
    download.file(url = gruene_btw2021, destfile = gruene_btw2021_local)
    pdftools::pdf_info(gruene_btw2021_local)

Kind regards
Andreas

--
Prof. Dr. Andreas Blaette
Professor of Public Policy
University of Duisburg-Essen