Jon Clayden
2011-Sep-23 13:58 UTC
[Rd] Issue with seek() on gzipped connections in R-devel
Dear all,

In R-devel (2011-09-23 r57050), I'm running into a serious problem with seek()ing on connections opened with gzfile(). A warning is generated and the file position does not move to the requested location. It doesn't seem to occur all the time: I tried to create a small example file to illustrate it, but the problem didn't occur. However, it can be seen with a file I use for testing my packages, which is available through the URL <https://github.com/jonclayden/tractor/blob/master/tests/data/nifti/maskedb0_lia.nii.gz?raw=true>:

> con <- gzfile("~/Downloads/maskedb0_lia.nii.gz","rb")
> seek(con, 352)
[1] 0
Warning message:
In seek.connection(con, 352) :
  seek on a gzfile connection returned an internal error
> seek(con, NA)
[1] 190

The same commands with the same file work as expected in R 2.13.1, and have worked over many previous versions of R:

> con <- gzfile("~/Downloads/maskedb0_lia.nii.gz","rb")
> seek(con, 352)
[1] 0
> seek(con, NA)
[1] 352

My sessionInfo() output is:

R Under development (unstable) (2011-09-23 r57050)
Platform: x86_64-apple-darwin11.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_US.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] tractor.nt_2.0.1      tractor.session_2.0.3 tractor.utils_2.0.0
[4] tractor.base_2.0.3    reportr_0.2.0

This seems to occur whether or not R is compiled with "--with-system-zlib". I see some zlib-related changes mentioned in the NEWS, but nothing indicating that this is expected. Could anyone shed any light on it, please?

Thanks and all the best,
Jon
Prof Brian Ripley
2011-Sep-23 15:28 UTC
Basically seek with zlib is flaky: we've stumbled on several errors. If it worked for you in the past, count yourself lucky.

I'd suggest you avoid relying on it in your packages.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
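One way to follow the advice to avoid seek() on a gzfile() connection is to move forward by reading and discarding bytes. A gzip stream opened for reading is decompressed sequentially anyway, so this does the same work without the fragile repositioning. This is a minimal sketch that assumes the target offset lies ahead of the current position. `skip_bytes()` and its `chunk` argument are hypothetical names, not part of base R; the code uses only readBin() on the open connection.

```r
# Hypothetical helper (not part of base R): advance a read-only
# connection by `n` bytes by reading and discarding them, so that
# seek() is never called. It can only move forwards.
skip_bytes <- function(con, n, chunk = 65536L) {
  remaining <- n
  while (remaining > 0) {
    # Read at most `chunk` bytes per iteration to bound memory use
    got <- readBin(con, what = "raw", n = min(remaining, chunk))
    if (length(got) == 0L)
      stop("reached end of stream before skipping ", n, " bytes")
    remaining <- remaining - length(got)
  }
  invisible(n)
}

# Usage with the file from the original report:
# con <- gzfile("~/Downloads/maskedb0_lia.nii.gz", "rb")
# skip_bytes(con, 352)   # in place of seek(con, 352)
# ... subsequent readBin(con, ...) calls start at offset 352 ...
# close(con)
```

Because the connection is only ever read, the position after the call is simply the number of bytes consumed. The trade-off is that jumping backwards means closing and reopening the file.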
Jeffrey Ryan
2011-Sep-23 15:54 UTC
seek() in general is a bad idea IMO if you are writing cross-platform code.

?seek

Warning:

     Use of 'seek' on Windows is discouraged.  We have found so many
     errors in the Windows implementation of file positioning that
     users are advised to use it only at their own risk, and asked not
     to waste the R developers' time with bug reports on Windows'
     deficiencies.

Aside from making me laugh, the above highlights the core reason not to use it, IMO.

For uncompressed files, you can try the mmap package; ?mmap and ?types are good starting points. It allows for accessing binary data on disk with very simple R-like semantics, and is very fast. Not as fast as a sequential read... but fast.

At present it is 'little endian' only, though that describes most of the world today.

Best,
Jeff

-- 
Jeffrey Ryan
jeffrey.ryan at lemnica.com

www.lemnica.com
www.esotericR.com
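The mmap package mentioned above targets uncompressed files. For a gzipped file whose decompressed contents fit in memory, a different workaround (not one proposed in this thread) is to decompress the whole stream once into a raw vector and then index into it, so arbitrary offsets can be revisited without seek(). This is a sketch under that memory assumption; `read_gz_raw()` and its `chunk` argument are hypothetical names, and the offset 352 in the usage lines is simply the one from the original report.

```r
# Hypothetical helper (not part of base R): decompress an entire
# gzip file into a single raw vector. Only suitable when the
# decompressed data fits comfortably in memory.
read_gz_raw <- function(path, chunk = 1048576L) {
  con <- gzfile(path, "rb")
  on.exit(close(con))
  pieces <- list()
  repeat {
    buf <- readBin(con, what = "raw", n = chunk)
    if (length(buf) == 0L) break
    pieces[[length(pieces) + 1L]] <- buf
  }
  # Concatenate the chunks; an empty file gives raw(0)
  if (length(pieces) == 0L) raw(0) else do.call(c, pieces)
}

# Usage: "seek" by subsetting the vector, then parse with readBin(),
# which also accepts a raw vector in place of a connection.
# bytes <- read_gz_raw("~/Downloads/maskedb0_lia.nii.gz")
# rest  <- bytes[-seq_len(352)]   # drop the first 352 bytes
# readBin(rest, what = "integer", n = 1, size = 2, endian = "little")
```

Passing `endian` explicitly to readBin() keeps the parsing independent of the host's byte order.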