thr3ads.net - R help - [Rd] readBin is much slower for raw input than for a file [Jan 2007]


Jon Clayden

2007-Jan-26 11:25 UTC

[Rd] readBin is much slower for raw input than for a file

Dear all,

I'm trying to write an efficient binary file reader for a file type
that is made up of several fields of variable length, and so requires
many small reads. Doing this on the file directly using a sequence of
readBin() calls is a bit too slow for my needs, so I tried buffering
the file into a raw vector and reading from that ("loc" is the
equivalent of the file pointer):

fileSize <- file.info(fileName)$size
connection <- file(fileName, "rb")
bytes <- readBin(connection, "raw", n=fileSize)
loc <- 0
close(connection)

--

# within a custom read function:
if (loc == 0)
    data <- readBin(bytes, what, n, size, ...)
else if (loc > 0)
    data <- readBin(bytes[-(1:loc)], what, n, size, ...)

However, this method runs almost 10 times slower for me than the
sequence of file reads did. The initial call to readBin(), which
reads in the file, is very quick, but Rprof shows that the vast
majority of the run time for the full parse is spent in readBin(), so
that does seem to be what is slowing things down. Can anyone shed any
light on why this is?
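
For anyone wanting to reproduce the pattern, here is a minimal, self-contained sketch of the buffered read described above. The test data (1000 four-byte integers) and the helper name readAtDrop are made up for illustration and are not from the original reader. One thing the sketch makes visible is that bytes[-(1:loc)] builds a new raw vector containing every remaining byte on each call, so each small read also pays for a copy of the rest of the buffer. That may account for some of the time, though the thread does not confirm it.

```r
# Stand-in buffer (hypothetical data): 1000 integers, 4 bytes each.
bytes <- writeBin(1:1000, raw(), size = 4)

# Read one integer at byte offset `loc`, using the pattern from the post.
# readAtDrop is an illustrative name, not part of the original code.
readAtDrop <- function(bytes, loc) {
    if (loc == 0)
        readBin(bytes, "integer", n = 1, size = 4)
    else
        # bytes[-(1:loc)] allocates a fresh vector of all remaining bytes
        readBin(bytes[-(1:loc)], "integer", n = 1, size = 4)
}

# Walk the buffer field by field, as a parser making many small reads would.
offsets <- seq(0, length(bytes) - 4, by = 4)
values <- vapply(offsets, function(loc) readAtDrop(bytes, loc), integer(1))
```

Wrapping the last line in system.time() and comparing it against the same loop run on a file connection should show whether the gap reported above appears on a given machine.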

I'm not expecting miracles here - and I realise that writing the whole
read routine in C would be much quicker - but surely reading from a
raw vector should work out faster than reading from a file? The system
is R-2.4.1/Linux, Xeon 3.2 GHz, 2 GiB RAM; typical file size is 44
KiB.
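
One variation that could be tried, sketched here under assumptions and not benchmarked in this thread, is to hand readBin() only the bytes needed for the current field instead of everything after loc. The helper name readAtSlice is hypothetical, and the sketch assumes that size is known and fixed for the field, so it would not cover variable-length strings as written.

```r
# Read `n` items of `size` bytes each, starting at byte offset `loc`.
# Illustrative helper; assumes `size` is an explicit, known byte count.
readAtSlice <- function(bytes, loc, what, n, size) {
    # Copy only the n * size bytes belonging to this field
    window <- bytes[loc + seq_len(n * size)]
    readBin(window, what, n = n, size = size)
}

# Small stand-in buffer (hypothetical data): three 4-byte integers.
bytes <- writeBin(c(10L, 20L, 30L), raw(), size = 4)

# Offset 4 is the start of the second integer.
second <- readAtSlice(bytes, loc = 4, what = "integer", n = 1, size = 4)
```

After each call the caller would advance loc by n * size, mirroring the file pointer role that loc plays above.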

Thanks in advance,
Jon

Jon Clayden

2007-Jan-31 11:03 UTC


[R] readBin is much slower for raw input than for a file

This hasn't generated any feedback after a few days on R-devel, so I'm
forwarding it to R-help in case anyone here has any ideas...

Thanks,
Jon

---------- Forwarded message ----------
From: Jon Clayden <jon.clayden at gmail.com>
Date: 26-Jan-2007 11:25
Subject: readBin is much slower for raw input than for a file
To: r-devel at r-project.org

