thr3ads.net - R help - [R] dealing with NA in readBin() and writeBin() [Jan 2015]

Mike Miller

2015-Jan-04 22:13 UTC

[R] dealing with NA in readBin() and writeBin()

The help doc for readBin writeBin tells me this:

Handling R's missing and special (Inf, -Inf and NaN) values is discussed 
in the 'R Data Import/Export' manual.

So I go here:

http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values

Unfortunately, I don't really understand that.  Suppose I am using 
single-byte integers and I want 255 (binary 11111111) to be translated to 
NA.  Is it possible to do that?  Of course I could always do something 
like this:

X[ X==255 ] <- NA

The problem with that is that I want to process the data on the fly, 
dividing the integer to produce a double in the range from 0 to 2:

X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127

It looks like this still works:

X[ X==255/127 ] <- NA

It would be neater if there were some kind of translation option for the 
input stream, like the way GNU tr (Linux/UNIX) works.  I'm looking around 
and not finding such a thing.  I can use gsub() to translate on the fly 
and then coerce back to integer format:

X <- as.integer(gsub("255", NA, readBin( file,
what="integer", n=N, size=1, signed=FALSE)))/127

What is your opinion of that tactic?  Is there a better way?  I don't know 
if that has any advantage on the postprocessing tactic above.  Maybe what 
I need is something like gsub() that can operate on numeric values...

X <- numsub(255, NA, readBin( file, what="integer", n=N, size=1,
signed=FALSE))/127

...but if that isn't better in terms of speed or memory usage than 
postprocessing like this...

X[ X==255/127 ] <- NA

...then I really don't need it (for this, but it would be good to know 
about).
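
For what it is worth, a numeric counterpart to gsub() is easy to sketch with base R's replace(). The name numsub below is hypothetical (it is not an existing R function), and the short vector stands in for the result of readBin():

```r
# Hypothetical helper: 'numsub' is not part of base R; this is one possible definition.
numsub <- function(from, to, x) {
  # replace() returns a copy of x with the selected elements set to 'to'
  replace(x, x == from, to)
}

bytes <- c(0L, 127L, 254L, 255L)    # stand-in for the readBin() result
X <- numsub(255L, NA, bytes) / 127  # the NA survives the division
```

Because the comparison happens while the values are still integers, the test against 255 is exact.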


The na.strings = "NA" functionality of scan() is neat, but I guess that 
doesn't work with the binary read system.  I don't think I can scan the 
readBin input because it isn't a file or stdin.

Mike

Duncan Murdoch

2015-Jan-04 22:27 UTC

[R] dealing with NA in readBin() and writeBin()

On 04/01/2015 5:13 PM, Mike Miller wrote:
> The help doc for readBin writeBin tells me this:
> 
> Handling R's missing and special (Inf, -Inf and NaN) values is discussed
> in the 'R Data Import/Export' manual.
> 
> So I go here:
> 
> http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values
> 
> Unfortunately, I don't really understand that.  Suppose I am using 
> single-byte integers and I want 255 (binary 11111111) to be translated to 
> NA.  Is it possible to do that?  Of course I could always do something 
> like this:
> 
> X[ X==255 ] <- NA
> 
> The problem with that is that I want to process the data on the fly, 
> dividing the integer to produce a double in the range from 0 to 2:
> 
> X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127

Why?  Why not do it in three steps, i.e.

X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
X[ X==255 ] <- NA
X <- X/127

If you are worried about the extra typing, then write a function to
handle all three steps.
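
A minimal sketch of such a function follows. The name read_scaled_bytes and its na_code and scale arguments are illustrative assumptions, not anything provided by R itself; the temporary file is only there to make the example self-contained:

```r
# Hypothetical wrapper; the name and the 'na_code'/'scale' arguments are illustrative.
read_scaled_bytes <- function(con, n, na_code = 255L, scale = 127) {
  X <- readBin(con, what = "integer", n = n, size = 1, signed = FALSE)
  X[X == na_code] <- NA   # compare while the values are still integers
  X / scale               # NA propagates through the division
}

# Round trip through a temporary file to try it out
tmp <- tempfile()
writeBin(as.raw(c(0, 127, 254, 255)), tmp)
X <- read_scaled_bytes(tmp, n = 4)
unlink(tmp)
```

The byte 255 comes back as NA and the other values are scaled into the range 0 to 2.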
> 
> It looks like this still works:
> 
> X[ X==255/127 ] <- NA
I suspect that would work on all current platforms, but I wouldn't trust
it.  Don't use == on floating point values unless you know they are
fractions with 2^n in the denominator.
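
A small illustration of that point, using generic values rather than the 255/127 case (which may well compare equal in practice):

```r
# Decimal fractions are generally not exactly representable in binary
a <- 0.1 + 0.2
a_exact <- (a == 0.3)                  # FALSE with IEEE 754 doubles
a_close <- isTRUE(all.equal(a, 0.3))   # TRUE: comparison within a tolerance

# Fractions with a power of two in the denominator are represented exactly
b <- 0.5 + 0.25
b_exact <- (b == 0.75)                 # TRUE
```

Testing for the sentinel before the division, while the values are still integers, avoids the question altogether.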
> It would be neater if there were some kind of translation option for the 
> input stream, like the way GNU tr (Linux/UNIX) works.  I'm looking around
> and not finding such a thing.  I can use gsub() to translate on the fly 
> and then coerce back to integer format:
It's really trivial to write a wrapper for readBin to do what you want:

myReadBin <- function(...) {
  X <- readBin(...)
  X[ X==255 ] <- NA
  X
}

Duncan Murdoch
> 
> X <- as.integer(gsub("255", NA, readBin( file,
> what="integer", n=N, size=1, signed=FALSE)))/127
> 
> What is your opinion of that tactic?  Is there a better way?  I don't know
> if that has any advantage on the postprocessing tactic above.  Maybe what 
> I need is something like gsub() that can operate on numeric values...
> 
> X <- numsub(255, NA, readBin( file, what="integer", n=N, size=1,
> signed=FALSE))/127
> 
> ...but if that isn't better in terms of speed or memory usage than 
> postprocessing like this...
> 
> X[ X==255/127 ] <- NA
> 
> ...then I really don't need it (for this, but it would be good to know 
> about).
> 
> 
> The na.strings = "NA" functionality of scan() is neat, but I guess that
> doesn't work with the binary read system.  I don't think I can scan the
> readBin input because it isn't a file or stdin.
> 
> Mike

Mike Miller

2015-Jan-04 22:40 UTC

[R] dealing with NA in readBin() and writeBin()

On Sun, 4 Jan 2015, Duncan Murdoch wrote:
> On 04/01/2015 5:13 PM, Mike Miller wrote:
>> The help doc for readBin writeBin tells me this:
>>
>> Handling R's missing and special (Inf, -Inf and NaN) values is discussed
>> in the 'R Data Import/Export' manual.
>>
>> So I go here:
>>
>> http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values
>>
>> Unfortunately, I don't really understand that.  Suppose I am using
>> single-byte integers and I want 255 (binary 11111111) to be translated to
>> NA.  Is it possible to do that?  Of course I could always do something
>> like this:
>>
>> X[ X==255 ] <- NA
>>
>> The problem with that is that I want to process the data on the fly,
>> dividing the integer to produce a double in the range from 0 to 2:
>>
>> X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127
>
> Why?  Why not do it in three steps, i.e.
>
> X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
> X[ X==255 ] <- NA
> X <- X/127
>
> If you are worried about the extra typing, then write a function to 
> handle all three steps.
The thing I was concerned about is the memory usage, not the typing, 
because everything will be scripted.  But maybe memory isn't an issue and 
I never have to hold two copies in memory simultaneously.  There will be 
about 50 million elements, typically.

I think in terms of processing numbers that are streaming into memory, but 
that might not be what R is doing.  For example, with scan() and 
na.strings="NA", I picture it changing strings to NA as they are read, but 
it might load the whole file as character, then do all the work with things 
like what=numeric() and na.strings="NA" after the fact.  Maybe that 
doesn't impose an extra memory burden.
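
If peak memory did turn out to matter, one option would be to read from an open connection in chunks, so that only one chunk of integers exists alongside the final vector of doubles. This is only a sketch under that assumption; the function name, the chunk argument and the other parameter names are made up for illustration:

```r
# Hypothetical helper; 'chunk' and the other argument names are illustrative.
read_scaled_chunked <- function(path, n, chunk = 1e6, na_code = 255L, scale = 127) {
  con <- file(path, "rb")
  on.exit(close(con))
  out <- numeric(n)                      # the one full-length vector of doubles
  pos <- 0
  while (pos < n) {
    X <- readBin(con, what = "integer", n = min(chunk, n - pos),
                 size = 1, signed = FALSE)
    if (length(X) == 0) break            # file shorter than expected
    X[X == na_code] <- NA                # exact test on integers
    out[pos + seq_along(X)] <- X / scale
    pos <- pos + length(X)
  }
  if (pos < n) out[seq_len(pos)] else out
}

# Small self-contained trial with a deliberately tiny chunk size
tmp <- tempfile()
writeBin(as.raw(c(0, 127, 254, 255, 1)), tmp)
X <- read_scaled_chunked(tmp, n = 5, chunk = 2)
unlink(tmp)
```

Whether this actually saves anything over the plain three-step version would need to be measured on the real data.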

>> It looks like this still works:
>>
>> X[ X==255/127 ] <- NA
>
> I suspect that would work on all current platforms, but I wouldn't trust
> it.  Don't use == on floating point values unless you know they are 
> fractions with 2^n in the denominator.
Good point about platforms.  I was concerned about the use of ==, and 
you've convinced me it is not trustworthy.

Thanks very much.

Mike
