Juan Pablo Lewinger
2007-May-15 23:02 UTC
[R] Efficiently reading random lines from a large file
I need to read two different random lines at a time from a large
ASCII file (120 x 296976) containing space delimited 0-1 entries.
The following code does the job and it's reasonably fast for my needs:
lineNumber = sample(120, 2)
line1 = scan(filename, what = "integer", skip=lineNumber[1]-1, nlines=1)
line2 = scan(filename, what = "integer", skip=lineNumber[2]-1, nlines=1)
> system.time(for (i in 50){
+ lineNumber = sample(120, 2)
+ line1 = scan(filename, what = "integer", skip=lineNumber[1]-1, nlines=1)
+ line2 = scan(filename, what = "integer", skip=lineNumber[2]-1, nlines=1)
+ })
Read 296976 items
Read 296976 items
[1] 14.24 0.12 14.51 NA NA
However, I'm wondering if there's an even faster way to do this. Is
there?
> sessionInfo()
R version 2.4.1 (2006-12-18)
i386-pc-mingw32
Juan Pablo Lewinger
Department of Preventive Medicine
Keck School of Medicine
University of Southern California
1540 Alcazar Street, CHP-220
Los Angeles, CA 90089-9011, USA
Marc Schwartz
2007-May-15 23:19 UTC
[R] Efficiently reading random lines from a large file
On Tue, 2007-05-15 at 16:02 -0700, Juan Pablo Lewinger wrote:
> I need to read two different random lines at a time from a large
> ASCII file (120 x 296976) containing space delimited 0-1 entries.
>
> The following code does the job and it's reasonably fast for my needs:
>
> lineNumber = sample(120, 2)
> line1 = scan(filename, what = "integer", skip=lineNumber[1]-1, nlines=1)
> line2 = scan(filename, what = "integer", skip=lineNumber[2]-1, nlines=1)
>
> > system.time(for (i in 50){
> + lineNumber = sample(120, 2)
> + line1 = scan(filename, what = "integer", skip=lineNumber[1]-1, nlines=1)
> + line2 = scan(filename, what = "integer", skip=lineNumber[2]-1, nlines=1)
> + })
>
> Read 296976 items
> Read 296976 items
> [1] 14.24 0.12 14.51 NA NA
>
> However, I'm wondering if there's an even faster way to do this. Is there?

You might want to take a look at this post by Jim Holtman from earlier in the year for some ideas:

http://tolstoy.newcastle.edu.au/R/e2/help/07/02/9709.html

HTH,

Marc Schwartz
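
For illustration only, and independent of whatever the linked post suggests, here is a minimal sketch of one possible way to avoid rescanning the file for every draw. Each scan() call with skip has to read through the file from the top, so the idea is to read the text once with readLines() and parse only the sampled lines afterwards. The helper name make_line_sampler and the small temporary test file are hypothetical, and the sketch assumes that the whole file fits in memory as text (roughly 70 MB for 120 lines of 296976 space-delimited entries) and that scan() supports the text argument, which the R 2.4.1 shown in the question does not. Two details of the posted code are worth keeping in mind when comparing timings: for (i in 50) runs the loop body once rather than 50 times, and what = "integer" is a character string, so scan() returns character data; the sketch passes integer() instead.

```r
# Hypothetical helper: read the file once, then hand back a function
# that draws k distinct random lines and parses only those lines.
make_line_sampler <- function(filename) {
  lines <- readLines(filename)            # single pass over the file
  function(k = 2) {
    idx <- sample(length(lines), k)       # k distinct line numbers
    rows <- lapply(lines[idx], function(s)
      scan(text = s, what = integer(), quiet = TRUE))
    names(rows) <- idx                    # remember which lines were drawn
    rows
  }
}

# Small made-up file standing in for the real 120 x 296976 one.
tmp <- tempfile()
m <- matrix(rbinom(6 * 10, 1, 0.5), nrow = 6)
write.table(m, tmp, row.names = FALSE, col.names = FALSE)

draw <- make_line_sampler(tmp)
pair <- draw(2)                           # list of two integer vectors
```

The up-front readLines() call costs one full read of the file, so this only pays off when many pairs are drawn in the same session; parsing a single long line on each draw is still not free.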