Jeroen Ooms
2018-May-03 14:21 UTC
[Rd] download.file does not process gz files correctly (truncates them?)
On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
> Use mode="wb" when you download the file. See
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
>
> R core, and others, is there a good argument for why we are not making this
> the default download mode? It seems like such a simple fix to such a
> common "mistake".

I'd like to second this feature request. The default behaviour is unexpected: R scripts written on Mac/Linux often produce corrupted files, checksum mismatches, etc. when run on Windows.

Even for text files, the default should be to download the file as-is. Trying to "fix" line endings should be opt-in, never the default. Downloading a file via a browser or FTP client on Windows doesn't change the file either, so why should R?

On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> Many downloads are text files (HTML, CSV, etc.), and if those are downloaded
> in binary, a Windows user might end up with a file that Notepad can't
> handle, because it would have Unix-style line endings.

True, but I don't think this is relevant. The same holds e.g. for the R files in source packages, which also have Unix line endings. Most Windows users will use an actual editor that understands both types of line endings, or can convert between the two.

Downloading-file should do just that.
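For readers who want the as-is behaviour today, here is a minimal sketch of the workaround suggested in the quoted advice: always pass mode = "wb". The wrapper name download_asis is hypothetical (it is not part of base R), and the example uses a local file:// URL purely so that it runs without network access.

```r
# Hypothetical helper (not part of base R): always download in binary mode,
# so the bytes written to disk are the bytes that were served.
download_asis <- function(url, destfile, ...) {
  utils::download.file(url, destfile, mode = "wb", ...)
}

# Self-contained check: "download" a local file through a file:// URL.
src <- tempfile(fileext = ".csv")
writeBin(charToRaw("a,b\n1,2\n"), src)  # Unix-style line endings
dest <- tempfile(fileext = ".csv")

src_path <- normalizePath(src, winslash = "/")
# Windows paths ("C:/...") need a third slash after "file:".
src_url <- if (.Platform$OS.type == "windows") {
  paste0("file:///", src_path)
} else {
  paste0("file://", src_path)
}

download_asis(src_url, dest, quiet = TRUE)

# The copy should be byte-for-byte identical to the source.
same_bytes <- identical(
  unname(tools::md5sum(src)),
  unname(tools::md5sum(dest))
)
```

Comparing checksums as in the last step is also a cheap way to detect a download that was silently altered by text-mode translation.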
Joris Meys
2018-May-03 14:27 UTC
[Rd] download.file does not process gz files correctly (truncates them?)
Thank you Henrik and Martin for explaining what was going on. Very insightful!

On Thu, May 3, 2018 at 4:21 PM, Jeroen Ooms <jeroenooms at gmail.com> wrote:
> On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson
> <henrik.bengtsson at gmail.com> wrote:
> > Use mode="wb" when you download the file. See
> > https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
> >
> > R core, and others, is there a good argument for why we are not making
> > this the default download mode? It seems like such a simple fix to
> > such a common "mistake".
>
> I'd like to second this feature request. This default behaviour is
> unexpected and often leads to r scripts that were written on
> mac/linux, to produce corrupted files on windows, checksum mismatches,
> etc.
>
> Even for text files, the default should be to download the file as-is.
> Trying to "fix" line-endings should be opt-in, never the default.
> Downloading a file via a browser or ftp client on windows also doesn't
> change the file, why should R?

I third the feature request.

> On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
> wrote:
> > Many downloads are text files (HTML, CSV, etc.), and if those are
> > downloaded in binary, a Windows user might end up with a file that
> > Notepad can't handle, because it would have Unix-style line endings.
>
> True but I don't think this is relevant. The same holds e.g. for the R
> files in source packages, which also have unix line endings. Most
> Windows users will use an actual editor that understands both types of
> line endings, or can convert between the two.
>
> Downloading-file should do just that.

Again, I agree. In my (limited) experience, the only program that fails to properly display \n as a line ending is Notepad, and even it can still open the file. When line-ending conflicts cause bugs, it is almost always a Unix-like OS struggling with Windows-style endings. I have yet to see a case the other way around.
Cheers
Joris

--
Joris Meys
Statistical consultant
Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
Henrik Bengtsson
2018-May-03 21:14 UTC
[Rd] download.file does not process gz files correctly (truncates them?)
Also, as mentioned in my post at https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when the mode argument is not specified, the default on Windows is mode = "w" *except* for certain case-sensitive filename extensions:

    if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
        mode <- "wb"

Just like the need for mode = "wb", this special-file-extension hack applies only on Windows and is documented in ?download.file only if you are on Windows, so someone on Linux/macOS trying to help someone on Windows may not be aware of it. This adds even more confusion, e.g. "works for me".

/Henrik
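To make the extension check quoted above concrete, the sketch below applies the same pattern to a few made-up URLs. The helper name default_mode_on_windows and the example.org URLs are hypothetical; the helper only mirrors the quoted snippet and is not the actual download.file source, whose behaviour may differ between R versions.

```r
# Pattern copied from the snippet quoted in this thread.
binary_ext <- "\\.(gz|bz2|xz|tgz|zip|rda|RData)$"

# Hypothetical helper mirroring the quoted check: the mode that would be
# chosen on Windows when 'mode' is not supplied by the caller.
default_mode_on_windows <- function(url) {
  if (length(grep(binary_ext, url))) "wb" else "w"
}

urls <- c(
  "https://example.org/data.gz",            # listed extension
  "https://example.org/data.GZ",            # same extension, upper case
  "https://example.org/data.rds",           # binary, but not in the list
  "https://example.org/data.gz?download=1"  # extension not at end of URL
)

modes <- vapply(urls, default_mode_on_windows, character(1), USE.NAMES = FALSE)
modes
# "wb" "w" "w" "w"
```

Only the first URL is matched: the pattern is case-sensitive, covers a fixed list of extensions, and is anchored to the end of the URL, so the other three would fall back to text mode under this check.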