Mike Miller
2015-Jan-13 18:51 UTC
[R] seek(), Windows and Cygwin (was "a UNIX vs. Windows package question, please")
On Fri, 9 Jan 2015, Duncan Murdoch wrote:

> On 09/01/2015 5:32 PM, Erin Hodgess wrote:
>> Hello again.
>>
>> Here is another question that I am puzzled about: I had the
>> (incorrect) impression that if I had Rtools on a Windows machine that I
>> could use any tar.gz package. However, that is not true.
>>
>> In particular, I was looking at the rPython package. I do indeed have
>> Python on this machine. But when I did R CMD INSTALL rPython, I got an
>> error message that said, "this is a Unix package". Interesting.
>>
>> Should I just stay with my Ubuntu laptop and behave?
>
> No, but you should not use packages that misbehave. The ideal R package
> will run on all platforms where R runs. Some require effort from the
> user to provide prerequisites, but no good R package runs only on one
> platform.

That reminds me to ask if anyone here can provide more details about the
limitations of seek(). I'm working on some functions that use seek() and
I may have to tell Windows users not to use these functions.

From the manual page for seek():

http://stat.ethz.ch/R-manual/R-devel/library/base/html/seek.html

"Use of seek on Windows is discouraged. We have found so many errors in
the Windows implementation of file positioning that users are advised to
use it only at their own risk, and asked not to waste the R developers'
time with bug reports on Windows' deficiencies."

My question is about whether this limitation is caused by the Windows
filesystem, typically NTFS, or if the problem is in the Windows OS. If
the problem were in the filesystem, maybe the docs would have said so
because NTFS can be used on other platforms.

Secondly, can this problem be addressed at all by using Cygwin? I know
that Cygwin is running in Windows, so it's still Windows, but R might be
compiled differently, so I just thought I'd ask! ;-)

And it doesn't matter which Windows version is used?

Finally, if the problem is entirely in Windows, and R cannot possibly
overcome it, I suppose that means that it is impossible to write a program
to run under Windows that can seek (is it fseek in C?) reliably to a
position in a file. If that is the case, it's going to be hard to develop
good systems for managing bioinformatic data on Windows.

Thanks in advance.

Mike

--
Michael B. Miller, Ph.D.
University of Minnesota
http://scholar.google.com/citations?user=EV_phq4AAAAJ
Jeff Newmiller
2015-Jan-13 20:20 UTC
[R] seek(), Windows and Cygwin (was "a UNIX vs. Windows package question, please")
I don't know why the R developers made that comment, and R-devel is
probably a better place to follow up, but the usual problem is that
Windows treats text files differently than binary files, so seeking in
text files is a headache. Binary files ought to be okay, but that is a
theoretical opinion, not from experience.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>       Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
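The text-versus-binary distinction raised in this reply can be sketched in a few lines. This is only an illustration in Python, not R, and not a reproduction of any specific R bug: the function name `demo_offsets` is hypothetical, and `newline="\r\n"` is used to emulate Windows-style line-ending translation on any platform. It shows why a position counted in characters of the text no longer matches a byte position in the file once line endings are translated.

```python
import os
import tempfile


def demo_offsets():
    """Return (text length, bytes on disk, byte at offset 2, byte at offset 3)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        text = "a\nb\n"
        # newline="\r\n" forces each "\n" to be written as "\r\n",
        # emulating Windows text-mode translation on any platform.
        with open(path, "w", newline="\r\n") as f:
            f.write(text)
        disk_len = os.path.getsize(path)
        # Open in binary mode: offsets are now plain byte positions.
        with open(path, "rb") as f:
            f.seek(2)                    # where "b" sits in the text
            at_text_offset = f.read(1)
            f.seek(3)                    # where "b" actually sits on disk
            at_disk_offset = f.read(1)
        return len(text), disk_len, at_text_offset, at_disk_offset
    finally:
        os.remove(path)


print(demo_offsets())
```

The character "b" is at position 2 in the text but at byte 3 in the file, because the first line ending occupies two bytes on disk. Opening the file in binary mode and working purely in byte offsets avoids that ambiguity, which matches the expectation above that binary files ought to be okay.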
Henrik Bengtsson
2015-Jan-13 21:33 UTC
[R] seek(), Windows and Cygwin (was "a UNIX vs. Windows package question, please")
I/we've been using seek() for both reading and writing on *binary*
connections across platforms and file systems, including Windows (at
least NTFS, but probably also FAT/FAT32 back in the days) in the Aroma
Framework (e.g. affxparser, R.huge) for ~8 years and counting. There
should be thousands and thousands of Windows CPU hours for this by now
and I have yet to see a case/report where seek() was an issue.

Without further references and pointers, I consider that claim in
help("seek") mostly anecdotal (e.g. someone at some point in time had
issues on some version of Windows and gave up on narrowing it down). It
did, however, make me add lots of internal sanity checks to catch a
corner case where seek() on Windows is flaky - those assertions still
haven't failed.

I have little experience with seek() on *text* connections, so Jeff may
have a point there.

/Henrik
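The kind of internal sanity check mentioned in this reply might look roughly like the following. This is a minimal sketch in Python, not the Aroma Framework's actual code: the record layout `RECORD` and the helpers `checked_seek`, `write_records` and `read_record` are all hypothetical names introduced for illustration. The idea is to seek on a binary file, confirm the reported position, and cross-check the data read against something known in advance.

```python
import struct

# Hypothetical fixed-size record: little-endian int32 id + float64 value.
RECORD = struct.Struct("<id")


def checked_seek(f, offset):
    """Seek to an absolute byte offset and verify the reported position."""
    pos = f.seek(offset)
    if pos != offset or f.tell() != offset:
        raise IOError(f"seek to {offset} landed at {f.tell()}")
    return pos


def write_records(path, values):
    """Write one record per value; the record id is its index in the file."""
    with open(path, "wb") as f:
        for i, value in enumerate(values):
            f.write(RECORD.pack(i, value))


def read_record(path, index):
    """Read record `index`, failing loudly if positioning went wrong."""
    with open(path, "rb") as f:
        checked_seek(f, index * RECORD.size)
        data = f.read(RECORD.size)
        if len(data) != RECORD.size:
            raise EOFError(f"record {index} is truncated or missing")
        rec_id, value = RECORD.unpack(data)
        # Second check: the stored id must match the requested index.
        if rec_id != index:
            raise IOError(f"expected record {index}, found {rec_id}")
        return value
```

For example, after `write_records(path, [0.5, 1.5, 2.5])`, calling `read_record(path, 2)` should return the third value, and any mispositioned seek would raise an error rather than silently return wrong data. An R version would presumably follow the same pattern with `seek()` on a connection opened in `"rb"` mode, though that is an assumption rather than something shown in this thread.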