Scott Sherrill-Mix
2018-Aug-20 19:42 UTC
[R] download.file() problems with binary files containing EOF byte in Windows
Hello,

I'm trying to get a package to pass win-builder and have been having a bit of trouble with Windows R and binary files (in my case a small .tar.gz used in testing). After a little debugging, I think I've narrowed it down to download.file() truncating files at the first 0x1a byte (often used as an EOF marker, but as far as I can tell a valid byte inside gzip files) when downloading from a local "file://xxx" URL. Is this a known "feature" of Windows that I should just avoid, or does it seem like a bug?

For example:

  #write a file starting with byte 1a (decimal 26)
  writeBin(26:100,'tmp.bin',size=1)
  download.file('file://tmp.bin','download.bin')
  file.size('tmp.bin')
  file.size('download.bin')

On Windows (session info below), I get file sizes of 75 and 0, and on Linux I get 75 and 75.

As a more real-world example, if I use download.file() on a .gz file, a remote download returns a different file size from a local download. For example, for a gz file from a Google hit about gzip (http://commandlinefanatic.com/cgi-bin/showarticle.cgi?article=art053):

  download.file('http://commandlinefanatic.com/gunzip.c.gz','gunzip.c.gz')
  download.file('file://gunzip.c.gz','dl.gz')
  file.size('gunzip.c.gz')
  file.size('dl.gz')

I get a 4704 byte file for the remote download and a 360 byte file for the local download on Windows (versus 4704 and 4704 on Linux). Note that the 361st byte is 1a:

  readBin('gunzip.c.gz','raw',361)

The various download.file() options don't seem to fix this; I get the same 360 bytes for:

  download.file('file://gunzip.c.gz','dl.gz',mode='wb')
  file.size('dl.gz')
  download.file('file://gunzip.c.gz','dl.gz',mode='wb',method='internal')
  file.size('dl.gz')

It looks like the 'auto' and 'internal' methods both resolve to the 'wininet' method on Windows, and mode is automatically set to 'wb' for gz files, so it is maybe not surprising that those don't change things.
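To make the truncation point easy to check, and to sidestep the issue in tests, something like the following might help. This is only a minimal sketch, not a fix for download.file() itself: first_eof_byte() and fetch_file() are hypothetical helper names, and the URL handling assumes the simple 'file://path' form used above (it does not parse 'file:///C:/...' style URLs).

```r
# Position (1-based) of the first 0x1a byte in a file, or NA if there is none.
first_eof_byte <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  hits <- which(bytes == as.raw(0x1a))
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# Hypothetical wrapper: copy local files byte-for-byte with file.copy()
# and only hand non-file URLs to download.file().
# Assumption: local URLs look like 'file://path', as in the examples above.
fetch_file <- function(url, destfile) {
  if (grepl("^file://", url)) {
    ok <- file.copy(sub("^file://", "", url), destfile, overwrite = TRUE)
    # Mimic download.file()'s convention: 0 for success, non-zero otherwise
    return(invisible(if (ok) 0L else 1L))
  }
  download.file(url, destfile, mode = "wb")
}
```

If the truncation really happens at the first 1a byte, first_eof_byte('gunzip.c.gz') should be one more than the size of the truncated local download (361 versus 360 in the example above).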
Thanks,
Scott

## Windows sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.1

## Linux sessionInfo():
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.4