Henrik Bengtsson
2015-Jan-13 21:33 UTC
[R] seek(), Windows and Cygwin (was "a UNIX vs. Windows package question, please")
I/we've been using seek() for both reading and writing on *binary* connections across platforms and file systems, including Windows (at least NTFS, but probably also FAT/FAT32 back in the day), in the Aroma Framework (e.g. affxparser, R.huge) for ~8 years and counting. There should be thousands and thousands of Windows CPU hours behind this by now, and I have yet to see a case/report where seek() was an issue.

Without further references and pointers, I consider that claim in help("seek") mostly anecdotal (e.g. someone at some point in time had issues on some version of Windows and gave up on narrowing it down). It did, however, make me add lots of internal sanity checks to catch a corner case where seek() on Windows is flaky - those assertions still haven't failed.

I have little experience with seek() on *text* connections, so Jeff may have a point there.

/Henrik

On Tue, Jan 13, 2015 at 12:20 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> I don't know why the R developers made that comment, and R-devel is probably a better place to follow up, but the usual problem is that Windows treats text files differently than binary files, so seeking in text files is a headache. Binary files ought to be okay, but that is a theoretical opinion, not from experience.
>
> On January 13, 2015 10:51:18 AM PST, Mike Miller <mbmiller at umn.edu> wrote:
>> On Fri, 9 Jan 2015, Duncan Murdoch wrote:
>>
>>> On 09/01/2015 5:32 PM, Erin Hodgess wrote:
>>>> Hello again.
>>>>
>>>> Here is another question that I am puzzled about: I had the (incorrect) impression that if I had Rtools on a Windows machine that I could use any tar.gz package. However, that is not true.
>>>>
>>>> In particular, I was looking at the rPython package. I do indeed have Python on this machine. But when I did R CMD INSTALL rPython, I got an error message that said, "this is a Unix package". Interesting.
>>>>
>>>> Should I just stay with my Ubuntu laptop and behave?
>>>
>>> No, but you should not use packages that misbehave. The ideal R package will run on all platforms where R runs. Some require effort from the user to provide prerequisites, but no good R package runs only on one platform.
>>
>> That reminds me to ask if anyone here can provide more details about the limitations of seek(). I'm working on some functions that use seek() and I may have to tell Windows users not to use these functions.
>>
>> From the manual page for seek():
>>
>> http://stat.ethz.ch/R-manual/R-devel/library/base/html/seek.html
>>
>> "Use of seek on Windows is discouraged. We have found so many errors in the Windows implementation of file positioning that users are advised to use it only at their own risk, and asked not to waste the R developers' time with bug reports on Windows' deficiencies."
>>
>> My question is about whether this limitation is caused by the Windows filesystem, typically NTFS, or if the problem is in the Windows OS. If the problem were in the filesystem, maybe the docs would have said so because NTFS can be used on other platforms.
>>
>> Secondly, can this problem be addressed at all by using Cygwin? I know that Cygwin is running in Windows, so it's still Windows, but R might be compiled differently, so I just thought I'd ask! ;-)
>>
>> And it doesn't matter which Windows version is used?
>>
>> Finally, if the problem is entirely in Windows, and R cannot possibly overcome it, I suppose that means that it is impossible to write a program to run under Windows that can seek (is it fseek in C?) reliably to a position in a file. If that is the case, it's going to be hard to develop good systems for managing bioinformatic data on Windows.
>>
>> Thanks in advance.
>>
>> Mike
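For readers who want to see what an "internal sanity check" around seek() on a binary connection might look like, here is a minimal sketch. It is not the Aroma Framework code; the helper name seek_checked() is hypothetical, and the file layout (ten 4-byte integers) is just an assumption chosen for illustration.

```r
# Write ten 4-byte integers to a temporary binary file.
path <- tempfile(fileext = ".bin")
con <- file(path, open = "wb")
writeBin(1:10, con, size = 4L)
close(con)

# Hypothetical helper: seek to a byte offset, then assert that the
# connection reports the position we asked for.
seek_checked <- function(con, where, rw) {
  seek(con, where = where, origin = "start", rw = rw)
  pos <- seek(con, where = NA, rw = rw)  # where = NA only queries the position
  stopifnot(identical(as.numeric(pos), as.numeric(where)))
  invisible(pos)
}

con <- file(path, open = "rb")
seek_checked(con, where = 4 * 6, rw = "read")  # byte offset of the 7th integer
value <- readBin(con, what = integer(), n = 1L, size = 4L)
close(con)
unlink(path)
```

If the assertion inside seek_checked() ever fails, the stopifnot() call stops with an error at the exact seek that misbehaved, which is the kind of report that would help narrow down a platform-specific problem.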
Mike Miller
2015-Jan-13 22:05 UTC
[R] seek(), Windows and Cygwin (was "a UNIX vs. Windows package question, please")
Thanks, everyone. This is very good news from Henrik because I am interested only in binary connections. It sounds like a function that uses seek() is very likely to work well in Windows, so I won't bother to warn people. I should do a little testing just to see that it's working, though.

Henrik -- I think you are saying that your experience has shown that the code you wrote for catching a corner case was not needed. Is that right?

Mike
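The "little testing" mentioned above could take the form of a round-trip check like the following sketch. It is only one possible test, not something posted in the thread: it writes random bytes to a temporary file, seeks to random offsets on a binary read connection, and compares each byte read with the in-memory copy. The sizes (10000 bytes, 200 probes) are arbitrary assumptions.

```r
set.seed(1)
n <- 10000L
bytes <- as.raw(sample(0:255, n, replace = TRUE))

# Write the reference bytes to a temporary file.
path <- tempfile(fileext = ".bin")
writeBin(bytes, path)

con <- file(path, open = "rb")
offsets <- sample(0:(n - 1L), 200L)  # zero-based byte offsets to probe
ok <- vapply(offsets, function(off) {
  seek(con, where = off, origin = "start", rw = "read")
  # R vectors are one-based, so byte offset 'off' is element off + 1.
  identical(readBin(con, what = raw(), n = 1L), bytes[off + 1L])
}, logical(1))
close(con)
unlink(path)

all(ok)
```

Running this on each platform of interest (including Windows) and checking that all(ok) is TRUE gives a quick, if limited, signal that seek() on binary connections behaves as expected there.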
Henrik Bengtsson
2015-Jan-13 23:20 UTC
[R] seek(), Windows and Cygwin (was "a UNIX vs. Windows package question, please")
On Tue, Jan 13, 2015 at 2:05 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
> Thanks, everyone. This is very good news from Henrik because I am interested only in binary connections. It sounds like a function that uses seek() is very likely to work well in Windows, so I won't bother to warn people. I should do a little testing just to see that it's working, though.
>
> Henrik -- I think you are saying that your experience has shown that the code you wrote for catching a corner case was not needed. Is that right?

Yes. From my recollection (= without having time to dig into all the code), I'm pretty sure I/we haven't written any Windows-specific workarounds. I was looking for some of these internal sanity checks, but I can't find them in current versions, so they seem to have been removed at some point in history.

Also, for your own sanity, specify argument 'rw' explicitly in every call to seek(), cf. https://github.com/HenrikBengtsson/affxparser/blob/master/R/updateCel.R#L344-L352

# 2006-08-19
# o BUG FIX: Wow wow wow. This one was tricky to find. If not specifying
# the 'rw' argument in seek() it defaults to "", which is not "read" as
# I naively though (because I did not read the inner details of ?seek),
# but the latest call to seek. In other words, since I at the end of
# every "chunk" loop call seek(..., rw="write") the seek(..., [rw=""])
# was equal to a seek(..., rw="write"), but I wanted seek(..., rw="read")!
# That made updateCel() do funny things and write to the wrongs parts
# of the file etc.

/Henrik
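To illustrate the advice about always passing 'rw' explicitly, here is a small sketch of a read-then-write update on a single connection opened for both reading and writing. It is not taken from updateCel(); the 8-byte file and the offsets are assumptions made up for the example.

```r
# Create an 8-byte file containing the bytes 01 02 ... 08.
path <- tempfile(fileext = ".bin")
writeBin(as.raw(1:8), path)

con <- file(path, open = "r+b")  # binary, open for reading and writing

# Position the *read* pointer explicitly and read one byte (offset 2).
seek(con, where = 2, origin = "start", rw = "read")
b <- readBin(con, what = raw(), n = 1L)

# Position the *write* pointer explicitly and overwrite one byte (offset 6).
seek(con, where = 6, origin = "start", rw = "write")
writeBin(b, con)

close(con)
result <- readBin(path, what = raw(), n = 8L)
unlink(path)
```

Because every seek() call names the pointer it means to move, the code does not depend on which kind of seek happened last, which is the trap described in the changelog entry above. The intent here is that the byte at offset 6 ends up as a copy of the byte at offset 2 while the rest of the file is unchanged.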