The help doc for readBin writeBin tells me this:

Handling R's missing and special (Inf, -Inf and NaN) values is discussed
in the 'R Data Import/Export' manual.

So I go here:

http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values

Unfortunately, I don't really understand that. Suppose I am using single-byte integers and I want 255 (binary 11111111) to be translated to NA. Is it possible to do that? Of course I could always do something like this:

X[ X==255 ] <- NA

The problem with that is that I want to process the data on the fly, dividing the integer to produce a double in the range from 0 to 2:

X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127

It looks like this still works:

X[ X==255/127 ] <- NA

It would be neater if there were some kind of translation option for the input stream, like the way GNU tr (Linux/UNIX) works. I'm looking around and not finding such a thing. I can use gsub() to translate on the fly and then coerce back to integer format:

X <- as.integer(gsub("255", NA, readBin( file, what="integer", n=N, size=1, signed=FALSE)))/127

What is your opinion of that tactic? Is there a better way? I don't know if that has any advantage on the postprocessing tactic above. Maybe what I need is something like gsub() that can operate on numeric values...

X <- numsub(255, NA, readBin( file, what="integer", n=N, size=1, signed=FALSE))/127

...but if that isn't better in terms of speed or memory usage than postprocessing like this...

X[ X==255/127 ] <- NA

...then I really don't need it (for this, but it would be good to know about).

The na.strings = "NA" functionality of scan() is neat, but I guess that doesn't work with the binary read system. I don't think I can scan the readBin input because it isn't a file or stdin.

Mike
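For reference, the numsub() idea in the post can be sketched in a few lines of base R. The name numsub and its argument order are hypothetical (taken from the wording of the question, not from any package), and this is only one way to write it. Unlike the gsub() route, it never converts the data to character, so the comparison stays an exact integer comparison.

```r
# Hypothetical helper: replace every element equal to `from` with `to`.
# Not part of base R; the name and argument order follow the question above.
numsub <- function(from, to, x) {
  hit <- !is.na(x) & x == from   # ignore elements that are already NA
  x[hit] <- to
  x
}

# Stand-in for the integer vector that readBin() would return
bytes <- c(0L, 255L, 127L)
X <- numsub(255L, NA, bytes) / 127   # 0 NA 1
```

Base R's replace(x, x == 255L, NA) does much the same job, at the cost of naming x twice.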
On 04/01/2015 5:13 PM, Mike Miller wrote:
> The help doc for readBin writeBin tells me this:
>
> Handling R's missing and special (Inf, -Inf and NaN) values is discussed
> in the 'R Data Import/Export' manual.
>
> So I go here:
>
> http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values
>
> Unfortunately, I don't really understand that. Suppose I am using
> single-byte integers and I want 255 (binary 11111111) to be translated to
> NA. Is it possible to do that? Of course I could always do something
> like this:
>
> X[ X==255 ] <- NA
>
> The problem with that is that I want to process the data on the fly,
> dividing the integer to produce a double in the range from 0 to 2:
>
> X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127

Why? Why not do it in three steps, i.e.

X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
X[ X==255 ] <- NA
X <- X/127

If you are worried about the extra typing, then write a function to
handle all three steps.

>
> It looks like this still works:
>
> X[ X==255/127 ] <- NA

I suspect that would work on all current platforms, but I wouldn't trust
it. Don't use == on floating point values unless you know they are
fractions with 2^n in the denominator.

> It would be neater if there were some kind of translation option for the
> input stream, like the way GNU tr (Linux/UNIX) works. I'm looking around
> and not finding such a thing. I can use gsub() to translate on the fly
> and then coerce back to integer format:

It's really trivial to write a wrapper for readBin to do what you want:

myReadBin <- function(...) {
  X <- readBin(...)
  X[ X==255 ] <- NA
  X
}

Duncan Murdoch

>
> X <- as.integer(gsub("255", NA, readBin( file, what="integer", n=N, size=1, signed=FALSE)))/127
>
> What is your opinion of that tactic? Is there a better way? I don't know
> if that has any advantage on the postprocessing tactic above. Maybe what
> I need is something like gsub() that can operate on numeric values...
>
> X <- numsub(255, NA, readBin( file, what="integer", n=N, size=1, signed=FALSE))/127
>
> ...but if that isn't better in terms of speed or memory usage than
> postprocessing like this...
>
> X[ X==255/127 ] <- NA
>
> ...then I really don't need it (for this, but it would be good to know
> about).
>
>
> The na.strings = "NA" functionality of scan() is neat, but I guess that
> doesn't work with the binary read system. I don't think I can scan the
> readBin input because it isn't a file or stdin.
>
> Mike
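The "write a function to handle all three steps" suggestion could look roughly like the sketch below. The function name readScaled, the na_code and divisor arguments, and the four-byte temporary file are illustrative assumptions rather than anything from the thread; the point is only that the sentinel is compared while the data are still integers, so no floating point == is involved.

```r
# Sketch: read unsigned bytes, recode a sentinel to NA, then scale.
# readScaled, na_code and divisor are hypothetical names for illustration.
readScaled <- function(con, n, na_code = 255L, divisor = 127) {
  X <- readBin(con, what = "integer", n = n, size = 1L, signed = FALSE)
  X[X == na_code] <- NA_integer_   # exact comparison on integers
  X / divisor                      # NA stays NA after the division
}

# Small self-contained check using a temporary file of four bytes
tmp <- tempfile()
writeBin(as.raw(c(0, 127, 254, 255)), tmp)
X <- readScaled(tmp, n = 4)        # 0 1 2 NA
unlink(tmp)
```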
On Sun, 4 Jan 2015, Duncan Murdoch wrote:

> On 04/01/2015 5:13 PM, Mike Miller wrote:
>> The help doc for readBin writeBin tells me this:
>>
>> Handling R's missing and special (Inf, -Inf and NaN) values is discussed
>> in the 'R Data Import/Export' manual.
>>
>> So I go here:
>>
>> http://cran.r-project.org/doc/manuals/r-release/R-data.html#Special-values
>>
>> Unfortunately, I don't really understand that. Suppose I am using
>> single-byte integers and I want 255 (binary 11111111) to be translated to
>> NA. Is it possible to do that? Of course I could always do something
>> like this:
>>
>> X[ X==255 ] <- NA
>>
>> The problem with that is that I want to process the data on the fly,
>> dividing the integer to produce a double in the range from 0 to 2:
>>
>> X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)/127
>
> Why? Why not do it in three steps, i.e.
>
> X <- readBin( file, what="integer", n=N, size=1, signed=FALSE)
> X[ X==255 ] <- NA
> X <- X/127
>
> If you are worried about the extra typing, then write a function to
> handle all three steps.

The thing I was concerned about is the memory usage, not the typing, because everything will be scripted. But maybe memory isn't an issue and I never have to hold two copies in memory simultaneously. There will be about 50 million elements, typically.

I think in terms of processing numbers that are streaming into memory, but that might not be what R is doing. For example, with scan() and na.strings="NA", I picture it changing strings to NA as they are read, but it might instead load the whole file as character and then do all the work with things like what=numeric() and na.strings="NA" after the fact. Maybe that doesn't impose an extra memory burden.

>> It looks like this still works:
>>
>> X[ X==255/127 ] <- NA
>
> I suspect that would work on all current platforms, but I wouldn't trust
> it. Don't use == on floating point values unless you know they are
> fractions with 2^n in the denominator.

Good point about platforms. I was concerned about the use of ==, and you've convinced me it is not trustworthy. Thanks very much.

Mike
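On the memory question: with about 50 million elements, the integer vector from readBin() takes roughly 200 MB (4 bytes per element) and the double result roughly 400 MB (8 bytes per element). If holding both at once is a concern, one possible approach is to read from an open connection in chunks, so that only one chunk of integers exists alongside the preallocated double result. The sketch below assumes this approach; readScaledChunked, chunk, na_code and divisor are hypothetical names, and the chunk size is arbitrary.

```r
# Sketch: chunked read from a binary connection, recoding a sentinel to NA
# on the integer chunk before scaling. All names here are hypothetical.
readScaledChunked <- function(path, n, chunk = 1e6, na_code = 255L,
                              divisor = 127) {
  con <- file(path, "rb")
  on.exit(close(con))
  out <- numeric(n)                 # preallocate the double result
  pos <- 0
  while (pos < n) {
    x <- readBin(con, what = "integer", n = min(chunk, n - pos),
                 size = 1L, signed = FALSE)
    if (length(x) == 0L) break      # file ended before n elements
    x[x == na_code] <- NA_integer_  # exact integer comparison
    out[pos + seq_along(x)] <- x / divisor
    pos <- pos + length(x)
  }
  out[seq_len(pos)]                 # drop unused tail if the file was short
}

# Small self-contained check with a deliberately tiny chunk size
tmp <- tempfile()
writeBin(as.raw(c(0, 127, 254, 255, 127)), tmp)
X <- readScaledChunked(tmp, n = 5, chunk = 2)   # 0 1 2 NA 1
unlink(tmp)
```

Because successive readBin() calls on the same open connection continue where the previous call stopped, the loop behaves like the streaming picture described above while keeping the == test on integers.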