Hi,

I am looking for an efficient way of skipping big chunks of lines on a connection (not necessarily at the beginning of the file). One way is to use readLines(), e.g. readLines(con, 1e6), but a) this incurs the overhead of constructing the returned character vector and b) it has a (fairly remote) potential to blow up the memory.

Another way would be to use scan(), e.g.

    scan(con, skip=1e6, nmax=0)

but somehow this doesn't work:

    > scan(con, skip=10, nmax=0)
    Error in scan(con, skip = 10, nmax = 0) :
            "scan" expected a real, got "A;12;0;"

I can stick to readLines, but am curious if there is a better way.

I use R-1.8.1 on RH-7.3.

Thanks,
Vadim
?seek

Vadim Ogranovich <vograno <at> evafunds.com> writes:
: I am looking for an efficient way of skipping big chunks of lines on a
: connection (not necessarily at the beginning of the file).
: [...]
Unfortunately, seek() only works in terms of bytes, not lines, and I only know how many lines I need to skip, not how many bytes.

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at myway.com]
Sent: Saturday, May 01, 2004 3:44 PM
To: r-help at stat.math.ethz.ch
Subject: Re: [R] skip lines on a connection

?seek

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Your scan() call doesn't work because the default argument is what=0; i.e., it expects numeric data. You can probably just use what="".

The other alternative is to loop over readLines() n times, reading one line at a time. It probably won't be too bad in terms of time, and will surely save on memory usage. (Try using replicate().)

HTH,
Andy
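For readers who want to try the one-line-at-a-time idea, here is a minimal sketch. The helper name skip_lines_one_by_one and the sample data are made up for illustration and do not come from the thread. A plain for loop is used instead of replicate() so that the skipped lines are not collected into a result.

```r
# Hypothetical helper (not from the thread): skip n lines on an open
# connection by reading them one at a time and discarding each one.
skip_lines_one_by_one <- function(con, n) {
  for (i in seq_len(n)) {
    # readLines() returns a zero-length vector once the input is exhausted
    if (length(readLines(con, n = 1L)) == 0L) break
  }
  invisible(NULL)
}

# Illustrative data only; a text connection stands in for a real file.
tc <- textConnection(c("A;12;0;", "B;13;1;", "C;14;2;", "D;15;3;"))
skip_lines_one_by_one(tc, 2)
rest <- readLines(tc)  # whatever follows the two skipped lines
close(tc)
print(rest)
```

Memory use stays constant because only one line is held at any time, at the cost of one readLines() call per skipped line.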
Andy,

It is surprising that scan() attempts to read anything at all: note that I set nmax=0, which AFAIK means read no lines.

Thank you for the reference to replicate(). I didn't know about it.

Thanks,
Vadim

-----Original Message-----
From: Liaw, Andy [mailto:andy_liaw at merck.com]
Sent: Saturday, May 01, 2004 5:28 PM
To: Vadim Ogranovich; r-help at stat.math.ethz.ch
Subject: RE: [R] skip lines on a connection

Your scan() call doesn't work because default argument what=0; i.e., it expects numeric data. [...]
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> Sent: Saturday, May 01, 2004 11:44 PM

> You will be telling us next you think the default nmax=-1
> means to read a negative number of lines!

No, I won't. Your extrapolation is inaccurate.

> ... So reading no
> lines would mean not calling scan at all, and what would be
> the point of that?

It would mean skipping the number of lines specified in the skip argument, thus advancing the read point on the connection to where I want it to be. I guess you wouldn't argue that seek(con, where) has no meaning.

> nmax <= 0 and nlines <= 0 are ignored.
>
> Note carefully what nmax actually means, and it is not what `nlines'
> means!

I had noted that. If one reads no "data value" one reads no line, so the two should have the same effect in the case at hand.

> Do read the documentation for scan, too, please.

I had. For your convenience, this is what it says about nmax:

    nmax: the maximum number of data values to be read, or if 'what'
          is a list, the maximum number of records to be read. If
          omitted (and 'nlines' is not set to a positive value),
          'scan' will read to the end of 'file'.

It is hard to see from the text that nmax=0 is ignored, since "omitted" means leaving it set to -1. BTW, the paragraph regarding 'nlines' doesn't mention that nlines=0 is a special case either:

    nlines: the maximum number of lines of data to be read.

> Note that to read *lines* you do need to read every byte on
> the file to find the EOL marker(s) so readLines() or scan() with
> NULL in "what" are as good as anything. You can use them in
> blocks of lines, in a loop.

This is a very nice trick indeed! Just what I've been looking for.

Thank you very much,
Vadim
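The "blocks of lines, in a loop" suggestion could be sketched roughly as follows. This is only an illustration under assumptions: the function name skip_lines, its block argument, and the temporary test file are hypothetical and not part of the thread. Only readLines() is used; the scan() variant mentioned above is not shown.

```r
# Hypothetical helper (not from the thread): skip n lines on an open
# connection by reading and discarding them in blocks, so that at most
# `block` lines are held in memory at any time.
skip_lines <- function(con, n, block = 10000L) {
  skipped <- 0
  while (skipped < n) {
    want <- min(block, n - skipped)          # size of the next block
    got <- length(readLines(con, n = want))  # lines actually read
    skipped <- skipped + got
    if (got < want) break                    # end of input reached early
  }
  invisible(skipped)                         # number of lines skipped
}

# Illustrative usage on a temporary 25-line file.
tmp <- tempfile()
writeLines(sprintf("line %d", 1:25), tmp)
con <- file(tmp, open = "r")
n_skipped <- skip_lines(con, 20, block = 7L)
rest <- readLines(con)                       # the lines after the skipped ones
close(con)
unlink(tmp)
print(n_skipped)
print(rest)
```

The block size trades memory for the number of readLines() calls: a larger block means fewer calls but a larger temporary character vector.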