thr3ads.net - R help - [R] seek(), skip by bits (not by bytes) in binary file [Jun 2012]

Ben quant

2012-Jun-19 15:54 UTC

[R] seek(), skip by bits (not by bytes) in binary file

Hello,

Has a function been built that will skip to a certain bit in a binary file?

As of 2009 the answer was 'no':
http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html

If you feel I don't need to (like in the links above), please provide some
help. (Note this is my first time working with binary files.)

I'm still working on the script, but here is where I am right now. The for
loop is being used because:

1) I have to get down to the correct position and then get the info I
want/need. The stuff I am reading through (x) is not fully understood and
it is a mix of various chars, floats, integers, etc. of various sizes, so I
don't know how many bytes to read in unless I read them bit by bit. (The
information and structure of the information changes daily so I'm skipping
over it.)
2) If I skip all in one readBin() my 'n' value is often up to 20 times too
big (I get an error) and/or R won't let me "allocate a vector of size...."
etc. So I split it up into chunks (divide by 20 etc.) and read each chunk,
then trash each part that is readBin()'d. Then in the last line I get the
data that I want (data1).

Here is my working code:

# I have to read 'junk' bits from the to.read file, which is a huge integer,
# so I divide it up and loop through to.read in parts (jb_part).
  divr = 20
  mod = junk %% divr

  jb_part = as.integer(junk/divr)
  jb_part_mod = jb_part + mod # catch the remainder/modulus

# connect to the binary file
  to.read = file(paste(dbs_path,"/",dbs_file,sep=""),"rb")
# loop in chunks to where I want to be
  for(i in 1:(divr-1)){
    x = readBin(to.read,"raw",n=jb_part,size=1)
    x = NULL # trash the result b/c I don't want it
  }
# read a little more to include the remainder/modulus bits left over by
# dividing by 20 above
  x = readBin(to.read,'raw',n=jb_part_mod,size=1)
  x = NULL # trash it

# finally get the data that I want
data1 = readBin(to.read,double(),n=some_number,size=size_to_use)

This works, but it is SLOW!  Any ideas on how to get down to the correct
bit a bit quicker (pun intended). :)

Thanks for any help!

Ben

jim holtman

2012-Jun-19 16:10 UTC

[R] seek(), skip by bits (not by bytes) in binary file

I am not sure why reading through 'bit-by-bit' gets you to where you
want to be.  I assume that the file has some structure, even though it
may be changing daily.  You mentioned the various types of data that
it might contain; are they all in 'byte' sized chunks?  If you really
have data that begins in the middle of a byte and then extends over
several bytes, you will have to write some functions that will pull
out this data and then reconstruct it into an object (e.g., integer,
numeric, ...) that R understands.  Can you provide some more
definition of what the data actually looks like and how you would find
the "pattern" of the data?  Almost all systems read, at the lowest
level, byte sized chunks, and if you really have to get down to the bit
level to reconstruct the data, then you have to write the unpack/pack
functions.  This can all be done once you understand the structure of
the data.  So some examples would be useful if you want someone to
propose a solution.
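
As a rough illustration of the unpack idea, here is a minimal sketch, not
a tested solution for the actual file. It assumes the target starts at a
known absolute bit offset and that bits within each byte are numbered
most-significant first; both are assumptions, since the real layout is
unknown. The helper names read_bits() and bits_to_int() are made up for
this example. seek() jumps to the byte containing the first wanted bit
without reading anything before it, and rawToBits() unpacks the bytes
that are read.

```r
# Hypothetical helper: return `nbits` bits (as 0/1 integers) starting at the
# 0-based absolute bit offset `bit_offset` in the binary file `path`.
read_bits <- function(path, bit_offset, nbits) {
  byte_offset <- bit_offset %/% 8  # whole bytes that can be skipped outright
  bit_in_byte <- bit_offset %% 8   # bits to drop inside the first byte read
  nbytes <- ceiling((bit_in_byte + nbits) / 8)

  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = byte_offset, origin = "start")  # jump, nothing is read
  bytes <- readBin(con, what = "raw", n = nbytes)

  # rawToBits() gives each byte least-significant bit first; reverse every
  # group of 8 so the bits run most-significant first (assumed bit order).
  bits <- as.integer(rawToBits(bytes))
  bits <- as.vector(matrix(bits, nrow = 8)[8:1, , drop = FALSE])
  bits[(bit_in_byte + 1):(bit_in_byte + nbits)]
}

# Hypothetical helper: interpret a most-significant-first bit vector as an
# unsigned integer.
bits_to_int <- function(bits) sum(bits * 2^((length(bits) - 1):0))

# Small demonstration on a throwaway file of four bytes.
tmp <- tempfile()
writeBin(as.raw(c(0xFF, 0x00, 0xA5, 0x3C)), tmp)
b <- read_bits(tmp, bit_offset = 20, nbits = 8)  # starts mid-way through 0xA5
value <- bits_to_int(b)                          # 83, i.e. binary 01010011
unlink(tmp)
```

If the data turn out to be byte aligned after all, the seek() call followed
by an ordinary readBin() is all that is needed, and the bit handling can be
dropped.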


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
