Paul McQuesten
2025-Aug-10 18:37 UTC
[Rd] Including mode='wb' in download.file() for .xlsx files on Windows ?
Perhaps it would be simpler, and more future-proof, for R to always
download as binary.
Are there any modern consumers of text files that are bothered by '\r\n'?
Or even Macintosh '\r' line terminators?

On Sun, Aug 10, 2025 at 1:22 PM Hernando Cortina <hch at alum.mit.edu> wrote:

> Yes, .docx and .pptx are part of the same specification.
>
> Kind regards
> Hernando
>
> From: Paul McQuesten <mcquesten at gmail.com>
> Date: Sunday, August 10, 2025 at 1:34 PM
> To: Hernando Cortina <hch at alum.mit.edu>
> Subject: Re: [Rd] Including mode='wb' in download.file() for .xlsx files on Windows ?
>
> IIUC, '.docx' files are also binary?
>
> On Sun, Aug 10, 2025 at 11:29 AM Hernando Cortina <hcortina71 at gmail.com> wrote:
>
> Hello all, regarding download.file():
>
> On Windows, if mode is not supplied (missing()) and url ends in one of
> ‘.gz’, ‘.bz2’, ‘.xz’, ‘.tgz’, ‘.zip’, ‘.jar’, ‘.rda’,
> ‘.rds’, ‘.RData’ or ‘.pdf’, mode = "wb" is set so that a binary
> transfer is done to help unwary users.
>
> May I suggest possibly including .xlsx files in the list of extensions
> that get this treatment?
>
> Downloading such files may be a quite common activity in the R
> community, and having to manually add mode = "wb" may indeed catch
> Windows users unawares, particularly if they are coming from Linux or
> Mac, where this is not necessary.
>
> I understand that it's hard to know when to stop when adding
> additional extensions. That said, .xlsx is quite ubiquitous in the
> wild and standardized under ECMA-376.
>
> I hope this might be helpful to others, and thank you for your
> consideration.
> Hernando
> ---------------
>
> The change in src/library/utils/R/Windows/download.file.R would be:
>
> …
>
> if(missing(mode) &&
>    length(grep("\\.(gz|bz2|xz|tgz|zip|jar|rd[as]|RData|xlsx)$",
>                URLdecode(url))))
>     mode <- "wb"
>
> …
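The condition proposed in the quoted patch can be tried out on its own. The sketch below lifts the proposed regular expression into a standalone function so one can check which URLs would switch to binary mode. The function name `needs_binary_mode()` and the example URLs are hypothetical; this is an illustration of the proposed test, not the code R actually ships.

```r
# Hypothetical helper: mirrors the condition proposed in the thread for
# src/library/utils/R/Windows/download.file.R, with 'xlsx' added.
needs_binary_mode <- function(url) {
  pattern <- "\\.(gz|bz2|xz|tgz|zip|jar|rd[as]|RData|xlsx)$"
  # URLdecode() turns percent-escapes such as "%2E" back into "." first,
  # so an encoded extension is still recognised.
  length(grep(pattern, utils::URLdecode(url))) > 0L
}

urls <- c("https://example.com/data/sales.xlsx",
          "https://example.com/data/sales.csv",
          "https://example.com/pkg/model.rds")
vapply(urls, needs_binary_mode, logical(1), USE.NAMES = FALSE)
# [1]  TRUE FALSE  TRUE
```

Because the pattern is anchored with `$`, a URL that carries a query string after the file name (for example `https://example.com/sales.xlsx?raw=1`) would not match, so such a download would still fall back to the default `mode = "w"` on Windows.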
Avraham Adler
2025-Aug-10 18:52 UTC
[Rd] Including mode='wb' in download.file() for .xlsx files on Windows ?
If I recall correctly, xlsx files are XML. It is the xls/xlsb files which are binary.

https://learn.microsoft.com/en-us/openspecs/office_standards/ms-xlsx/2c5dee00-eff2-4b22-92b6-0738acd4475e
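Independently of which extensions end up in the default list, a caller can sidestep the question by passing `mode = "wb"` explicitly, which writes the downloaded bytes unchanged on every platform. A minimal sketch, assuming a hypothetical wrapper `download_binary()` (not part of utils) that approximates the "always download as binary" idea raised in the thread. The demonstration uses a local `file://` URL and made-up bytes (including LF and CR LF) so it runs without network access.

```r
# Hypothetical wrapper (not part of utils): always request a binary transfer.
download_binary <- function(url, destfile, ...) {
  utils::download.file(url, destfile, mode = "wb", ...)
}

# Made-up test bytes, including a bare LF and a CR LF pair.
src <- tempfile(fileext = ".dat")
writeBin(as.raw(c(0x61, 0x0a, 0x62, 0x0d, 0x0a, 0x00, 0xff)), src)

# Build a file:// URL that works for both "/tmp/..." and "C:/..." paths.
file_url <- paste0("file:///",
                   sub("^/", "", normalizePath(src, winslash = "/")))

dest <- tempfile(fileext = ".dat")
status <- download_binary(file_url, dest, quiet = TRUE)

# In binary mode the copy should match the source byte for byte.
identical(readBin(src, "raw", 100L), readBin(dest, "raw", 100L))
```

In everyday use this reduces to `download.file(url, destfile, mode = "wb")`; the wrapper only saves having to remember the argument.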