alex at apeironsports.com
2008-Aug-29 22:40 UTC
[Rd] scan after seek in text files (PR#12640)
Full_Name: Dr. Alex Sheppard
Version: 2.7.1
OS: Linux Debian Lenny
Submission from: (NULL) (79.73.224.62)


After scanning from an open (text) connection, then seeking, then scanning
again, the second scan returns incorrect result. It looks like the first byte
scanned was from the pre-seek file position, then it continues to read from the
post-seek file position.


To reproduce:

#Put 3x3 matrix in a file
> write.matrix(t(matrix(1:9,nrow=3)),file="TEST.txt",sep="\t")

#Open file as text
> fd <- file("TEST.txt",open="rt")

#scan a couple of fields - this looks fine so far
> scan(file=fd,what=double(),n=2)
Read 2 items
[1] 1 2

#seek back to start of file
> seek(con=fd,where=0,origin="start")
[1] 5

#scan fields again - this doesn't work properly
> scan(file=fd,what=double(),n=2)
Read 2 items
[1] 31 2

This happens when either n or nmax arguments are used to control number of
fields read. Problem does not occur when using nlines argument instead. The seek
appears to work ok, as doing readChar(fd,n=1) after the seek operation correctly
returns "1".
Also, if the file is opened as binary, i.e. fd=file("TEST.txt",open="rb") , all
works fine.
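
The transcript above relies on write.matrix(), which lives in the MASS package. For anyone wanting to re-run the steps without that dependency, the following is a minimal sketch that substitutes base R's write.table(). The substitution, the temporary file path, and the variable names (path, first, pos, second) are assumptions made for illustration and are not part of the original report. No expected value is given for the second scan, since it depends on the R version in use.

```r
# Base-R variant of the reproduction steps.
# Assumption: write.table() stands in for MASS::write.matrix(), and a
# temporary file stands in for TEST.txt.
path <- tempfile(fileext = ".txt")
write.table(t(matrix(1:9, nrow = 3)), file = path, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# Open the file as a text-mode connection, as in the report.
fd <- file(path, open = "rt")

# First scan: read two fields from the start of the file.
first <- scan(file = fd, what = double(), n = 2, quiet = TRUE)

# Seek back to the start; seek() returns the position before the move.
pos <- seek(con = fd, where = 0, origin = "start")

# Second scan: on the affected version this is where the stray leading
# character showed up.
second <- scan(file = fd, what = double(), n = 2, quiet = TRUE)
close(fd)

print(first)
print(second)
```

Comparing first and second is enough to tell whether a given build shows the behaviour described in the report.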
The issue is the pushback on text files (which readChar does not use).
I've altered the logic in 2.7.2 patched so that seek() clears the
pushback.

NB:

1) Your code is incomplete (you need library(MASS)),

2) You are asked not to report on obsolete versions of R.

On Sat, 30 Aug 2008, alex at apeironsports.com wrote:

> Full_Name: Dr. Alex Sheppard
> Version: 2.7.1
> OS: Linux Debian Lenny
> Submission from: (NULL) (79.73.224.62)
>
>
> After scanning from an open (text) connection, then seeking, then scanning
> again, the second scan returns incorrect result. It looks like the first byte
> scanned was from the pre-seek file position, then it continues to read from the
> post-seek file position.
>
>
> To reproduce:
>
> #Put 3x3 matrix in a file
>> write.matrix(t(matrix(1:9,nrow=3)),file="TEST.txt",sep="\t")
>
> #Open file as text
>> fd <- file("TEST.txt",open="rt")
>
> #scan a couple of fields - this looks fine so far
>> scan(file=fd,what=double(),n=2)
> Read 2 items
> [1] 1 2
>
> #seek back to start of file
>> seek(con=fd,where=0,origin="start")
> [1] 5
>
> #scan fields again - this doesn't work properly
>> scan(file=fd,what=double(),n=2)
> Read 2 items
> [1] 31 2
>
> This happens when either n or nmax arguments are used to control number of
> fields read. Problem does not occur when using nlines argument instead. The seek
> appears to work ok, as doing readChar(fd,n=1) after the seek operation correctly
> returns "1".
> Also, if the file is opened as binary, i.e. fd=file("TEST.txt",open="rb") , all
> works fine.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
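
To make the pushback explanation concrete, here is a small sketch using the user-level functions pushBack(), pushBackLength() and clearPushBack() from base R. It assumes, without confirmation from this thread, that the pushback scan() leaves behind is the same one these functions expose. The text connection and the variable names (tc, stale, pending_before, pending_after, clean) are hypothetical and chosen only for illustration.

```r
# Illustration only: place a stale character in front of fresh input.
# Assumption: this mimics what an earlier scan() leaves pending.
tc <- textConnection("1\t2\t3")
pushBack("3", tc, newLine = FALSE)   # pretend "3" was left over
stale <- scan(tc, what = double(), n = 1, quiet = TRUE)
close(tc)
print(stale)                         # compare with the first field in the report

# Same setup, but discard the pending pushback before reading.
tc <- textConnection("1\t2\t3")
pushBack("3", tc, newLine = FALSE)
pending_before <- pushBackLength(tc) # pushed-back lines waiting to be read
clearPushBack(tc)                    # drop them
pending_after <- pushBackLength(tc)
clean <- scan(tc, what = double(), n = 2, quiet = TRUE)
close(tc)
print(clean)
```

On a version that predates the change described above, calling clearPushBack(fd) straight after seek() may be a workable stopgap, though that has not been verified here.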