ligges@statistik.uni-dortmund.de
2005-May-28 18:11 UTC
[Rd] (PR#7899) seek(con, 0, "end", rw="r") does not always work
Tony Plate wrote:
> ligges@statistik.uni-dortmund.de wrote:
>
>> tplate@blackmesacapital.com wrote:
>>
>>> I've noticed that seek(con, 0, "end", rw="r") on a file connection
>>> does not always work correctly after a write (R 2.1.0 on Windows).
>>>
>>> [Is a call to fflush() needed inside file_seek() in main/connections.c?]
>>
>> If you have an idea where to fflush() precisely and your patch works,
>> please tell us! I'll happily run some test cases where seeking matters.
>
> I couldn't see why the current code was returning a bad value under some
> conditions. (That's why I didn't offer anything more than a suggestion.)
> My suggestion to use an fflush() was a guess (hence the question mark),
> but evidence for the guess being correct was that doing a flush at the R
> command line made the whole thing work correctly. To be safe, I would
> try to put an fflush() right at the beginning of file_seek(), before the
> call to f_tell(). I tried this, and with the modification the test case
> I gave produced correct output. Here's how the beginning of my modified
> file_seek() function (in main/connections.c) looks:
>
> static double file_seek(Rconnection con, double where, int origin, int rw)
> {
>     Rfileconn this = con->private;
>     FILE *fp = this->fp;
> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>     off_t pos;
> #else
> #ifdef Win32
>     off64_t pos;
> #else
>     long pos;
> #endif
> #endif
>     int whence = SEEK_SET;
>     fflush(fp);
>     pos = f_tell(fp);
>
>     /* make sure both positions are set */

Works for your example, but I found another one where it introduces a
worse bug when using origin="current". Hence it's not that easy.

After reviewing this issue more closely, I think writeLines() into a
binary connection might be the real problem and a misuse in this case.
See the last paragraph in the Details section of ?writeLines. Hence,
this might also be an issue related to the text mode connection problem
on Windows.

Using simple writeChar and readChar statements works as expected for me
(at least, I was not able to produce anything unexpected). I'm no longer
convinced that this is a bug in R.

>> Note that ?seek currently tells us "The value returned by
>> seek(where=NA) appears to be unreliable on Windows systems, at least
>> for text files."
>> It would be nice if this comment could be removed, of course ....
>
> Maybe the explanation could be given that this happens with text files
> because Windows inserts extra characters at end-of-lines when reading
> "text" mode files (but with binary files, things should be fine). This
> particular issue is documented in Microsoft Windows documentation (e.g.,
> at http://msdn2.microsoft.com/library/75yw9bf3(en-us,vs.80).aspx, found
> by searching on Google using the terms "fseek windows documentation").
> Are there any known issues using seek with binary files under Windows?
> If there are not, then the caveat could be made specific to text files
> and all vagueness removed.

Hmm, all I find (including your link) is Windows CE related ...

Uwe Ligges

> -- Tony Plate
>
>> Uwe Ligges
>>
>>> Example (see the lines with the "***WRONG***" comment)
>>>
>>> > # seek(, rw="r") on a file does not always work correctly after a write
>>> > f <- file("tmp3.txt", "w+b")
>>> > # Write something earlier in the file
>>> > seek(f, 10, rw="w")
>>> [1] 0
>>> > writeLines(c("ghi", "jkl"), f)
>>> > seek(f, 20, rw="w")
>>> [1] 18
>>> > writeLines(c("abc"), f)
>>> > seek(f, 0, "end", rw="w")
>>> [1] 24
>>> > # Try to read at the end of the file
>>> > seek(f, 0, "end", rw="r")
>>> [1] 0
>>> > readLines(f, -1)
>>> character(0)
>>> > seek(f, 0, "end", rw="w")
>>> [1] 18
>>> > # write something at the end of the file
>>> > writeLines(c("def"), f)
>>> > # Try to read at the end of the file
>>> > # flush(f) # flushing here makes the seek work correctly
>>> > seek(f, 0, "end", rw="r")
>>> [1] 24
>>> > seek(f, NA, rw="r") # ***WRONG*** (should return 28)
>>> [1] 24
>>> > readLines(f, -1) # ***WRONG*** (should return character(0))
>>> [1] "def"
>>> > seek(f, 20, rw="r")
>>> [1] 28
>>> > readLines(f, -1)
>>> [1] "abc" "def"
>>> > seek(f, 0, "end", rw="r") # now it works correctly
>>> [1] 28
>>> > seek(f, NA, rw="r")
>>> [1] 28
>>> > readLines(f, -1)
>>> character(0)
>>> > close(f)
>>> >
>>> > version
>>>          _
>>> platform i386-pc-mingw32
>>> arch     i386
>>> os       mingw32
>>> system   i386, mingw32
>>> status
>>> major    2
>>> minor    1.0
>>> year     2005
>>> month    04
>>> day      18
>>> language R
>>> >
>>>
>>> -- Tony Plate
>>>
>>> ______________________________________________
>>> R-devel@stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel

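The text-file caveat discussed in this message comes down to newline translation. In binary mode the C library writes exactly the bytes it is given, so file offsets are predictable, whereas text mode on Windows expands "\n" to CRLF on output. The following is a minimal sketch in plain C stdio, not R's connection code. The function name write_lines_binary and the file name used with it are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Write each string followed by an explicit separator to a binary-mode
 * stream and return the resulting file size in bytes, or -1 on error.
 * Binary mode performs no newline translation, so the size is the sum of
 * the string and separator lengths regardless of platform. */
long write_lines_binary(const char *path, const char *const *lines,
                        size_t n, const char *sep)
{
    assert(path != NULL && lines != NULL && sep != NULL);

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (fputs(lines[i], fp) == EOF || fputs(sep, fp) == EOF) {
            fclose(fp);
            return -1;
        }
    }

    /* Push buffered output to the file before asking for the position. */
    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }
    long size = ftell(fp);

    if (fclose(fp) != 0)
        return -1;
    return size;
}
```

Called with the two strings "ghi" and "jkl" and the separator "\n", the function returns 8, which matches the step from offset 10 to offset 18 in the quoted transcript. With the separator "\r\n" it returns 10.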
tplate@blackmesacapital.com
2005-May-31 19:13 UTC
[Rd] (PR#7899) seek(con, 0, "end", rw="r") does not always work
Uwe Ligges wrote:
> Tony Plate wrote:
>
>> ligges@statistik.uni-dortmund.de wrote:
>>
>>> tplate@blackmesacapital.com wrote:
>>>
>>>> I've noticed that seek(con, 0, "end", rw="r") on a file connection
>>>> does not always work correctly after a write (R 2.1.0 on Windows).
>>>>
>>>> [Is a call to fflush() needed inside file_seek() in
>>>> main/connections.c?]
>>>
>>> If you have an idea where to fflush() precisely and your patch works,
>>> please tell us! I'll happily run some test cases where seeking matters.
>>
>> I couldn't see why the current code was returning a bad value under
>> some conditions. (That's why I didn't offer anything more than a
>> suggestion.) My suggestion to use an fflush() was a guess (hence the
>> question mark), but evidence for the guess being correct was that doing
>> a flush at the R command line made the whole thing work correctly.
>> To be safe, I would try to put an fflush() right at the beginning of
>> file_seek(), before the call to f_tell(). I tried this, and with the
>> modification the test case I gave produced correct output. Here's how
>> the beginning of my modified file_seek() function (in
>> main/connections.c) looks:
>>
>> static double file_seek(Rconnection con, double where, int origin, int rw)
>> {
>>     Rfileconn this = con->private;
>>     FILE *fp = this->fp;
>> #if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
>>     off_t pos;
>> #else
>> #ifdef Win32
>>     off64_t pos;
>> #else
>>     long pos;
>> #endif
>> #endif
>>     int whence = SEEK_SET;
>>     fflush(fp);
>>     pos = f_tell(fp);
>>
>>     /* make sure both positions are set */
>
> Works for your example, but I found another one where it introduces a
> worse bug when using origin="current". Hence it's not that easy.
>
> After reviewing this issue more closely, I think writeLines() into a
> binary connection might be the real problem and a misuse in this case.
> See the last paragraph in the Details section of ?writeLines. Hence,
> this might also be an issue related to the text mode connection problem
> on Windows.
>
> Using simple writeChar and readChar statements works as expected for me
> (at least, I was not able to produce anything unexpected). I'm no longer
> convinced that this is a bug in R.

I see the same (buggy) behavior when I replace the writeLines()
statements by writeChar() statements (but continue using readLines()).
I also see the same buggy behavior when I explicitly supply a 'sep'
argument to writeLines(). Transcripts of both of these are below.
[Also, in both cases, calling the R function flush() at the indicated
position in the transcript results in correct output.]

Regarding the documentation for writeLines, it states:

    Normally 'writeLines' is used with a text connection, and the
    default separator is converted to the normal separator for that
    platform (LF on Unix/Linux, CRLF on Windows, CR on Classic MacOS).
    For more control, open a binary connection and specify the precise
    value you want written to the file in 'sep'. For even more control,
    use 'writeChar' on a binary connection.

The sentence beginning "For more control" seems to permit the use of
writeLines() for binary connections. What suggested to you that it was
a misuse?

> # seek(, rw="r") on a file does not always work after a write
> # even when sep="\n" is supplied to writeLines
> f <- file("tmp3.txt", "w+b")
> # Write something earlier in the file
> seek(f, 10, rw="w")
[1] 0
> writeLines(c("ghi", "jkl"), f, sep="\n")
> seek(f, 20, rw="w")
[1] 18
> writeLines(c("abc"), f, sep="\n")
> seek(f, 0, "end", rw="w")
[1] 24
> # Try to read at the end of the file
> seek(f, 0, "end", rw="r")
[1] 0
> readLines(f, -1)
character(0)
> seek(f, 0, "end", rw="w")
[1] 18
> # write something at the end of the file
> writeLines(c("def"), f, sep="\n")
> # Try to read at the end of the file
> # flush(f) # flushing here makes the seek work correctly
> seek(f, 0, "end", rw="r")
[1] 24
> seek(f, NA, rw="r") # ***WRONG*** (should return 28)
[1] 24
> readLines(f, -1) # ***WRONG*** (should return character(0))
[1] "def"
> seek(f, 20, rw="r")
[1] 28
> readLines(f, -1)
[1] "abc" "def"
> seek(f, 0, "end", rw="r") # now it works correctly
[1] 28
> seek(f, NA, rw="r")
[1] 28
> readLines(f, -1)
character(0)
> close(f)
>
> # seek(, rw="r") on a file does not always work after a write
> # even when writeChar is used instead of writeLines
> f <- file("tmp3.txt", "w+b")
> # Write something earlier in the file
> seek(f, 10, rw="w")
[1] 0
> writeChar(c("ghi\n", "jkl\n"), f, eos=NULL)
> seek(f, 20, rw="w")
[1] 18
> writeChar(c("abc\n"), f, eos=NULL)
> seek(f, 0, "end", rw="w")
[1] 24
> # Try to read at the end of the file
> seek(f, 0, "end", rw="r")
[1] 0
> readLines(f, -1)
character(0)
> seek(f, 0, "end", rw="w")
[1] 18
> # write something at the end of the file
> writeChar(c("def\n"), f, eos=NULL)
> # Try to read at the end of the file
> # flush(f) # flushing here makes the seek work correctly
> seek(f, 0, "end", rw="r")
[1] 24
> seek(f, NA, rw="r") # ***WRONG*** (should return 28)
[1] 24
> readLines(f, -1) # ***WRONG*** (should return character(0))
[1] "def"
> seek(f, 20, rw="r")
[1] 28
> readLines(f, -1)
[1] "abc" "def"
> seek(f, 0, "end", rw="r") # now it works correctly
[1] 28
> seek(f, NA, rw="r")
[1] 28
> readLines(f, -1)
character(0)
> close(f)
>
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.0
year     2005
month    04
day      18
language R
>

-- Tony Plate
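For comparison, the write sequence in the transcripts can be replayed directly against C stdio. This is a minimal sketch under stated assumptions: it uses only standard C calls on a "w+b" stream, it does not reproduce R's separate read and write positions, and the names write_at and replay_transcript are hypothetical. It relies on the C rule that, on a stream opened for update, output must be followed by fflush() or a file-positioning call before input is attempted. Whether a missing flush is what actually causes the R behavior reported here is not established in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Seek to an absolute offset and write a string there; 0 on success. */
static int write_at(FILE *fp, long offset, const char *text)
{
    size_t len = strlen(text);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fwrite(text, 1, len, fp) == len ? 0 : -1;
}

/* Replay the transcript's writes, flush, and return the end-of-file
 * offset (or -1 on error). The two lines starting at offset 20 are
 * copied into first and second, each a buffer of cap bytes. */
long replay_transcript(const char *path, char *first, char *second, int cap)
{
    assert(path != NULL && first != NULL && second != NULL && cap > 0);

    long end = -1;
    FILE *fp = fopen(path, "w+b");
    if (fp == NULL)
        return -1;

    if (write_at(fp, 10, "ghi\njkl\n") != 0) goto done; /* bytes 10..17 */
    if (write_at(fp, 20, "abc\n") != 0) goto done;      /* bytes 20..23 */
    if (fseek(fp, 0, SEEK_END) != 0) goto done;
    if (fputs("def\n", fp) == EOF) goto done;           /* bytes 24..27 */

    /* Flush pending output before switching to positioning and reading. */
    if (fflush(fp) != 0) goto done;

    if (fseek(fp, 0, SEEK_END) != 0) goto done;
    end = ftell(fp);

    /* Read back the two lines that start at offset 20. */
    if (fseek(fp, 20, SEEK_SET) != 0 ||
        fgets(first, cap, fp) == NULL ||
        fgets(second, cap, fp) == NULL)
        end = -1;

done:
    fclose(fp);
    return end;
}
```

With 16-byte buffers, replay_transcript returns 28 and fills the buffers with "abc\n" and "def\n", which corresponds to the values the transcript marks as correct.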