Hi,

A piece of my code that uses readBin() to read a certain file type is behaving strangely with R 2.7.0. This seems to be because of a failure to match() strings after using rawToChar() when the original was terminated with a "\0" character. Direct equality testing with == still works as expected. I can reproduce this as follows:

> x <- "foo"
> y <- c(charToRaw("foo"),as.raw(0))
> z <- rawToChar(y)
> z==x
[1] TRUE
> z=="foo"
[1] TRUE
> z %in% c("foo","bar")
[1] FALSE
> z %in% c("foo","bar","foo\0")
[1] FALSE

But without the nul character it works fine:

> zz <- rawToChar(charToRaw("foo"))
> zz %in% c("foo","bar")
[1] TRUE

I don't see anything about this in the latest NEWS, but is this expected behaviour? Or is it, as I suspect, a bug? This seems to be new to R 2.7.0, as I said.

Regards,
Jon
Apologies for missing out the sessionInfo():

R version 2.7.0 (2008-04-22)
i386-apple-darwin8.10.1

locale:
en_GB.UTF-8/en_US.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

2008/4/28 Jon Clayden <jon.clayden at gmail.com>:
> [...]
Prof Brian Ripley
2008-Apr-28 10:54 UTC
[Rd] R 2.7.0, match() and strings containing \0 - bug?
On Mon, 28 Apr 2008, Jon Clayden wrote:

> [...]
> I don't see anything about this in the latest NEWS, but is this
> expected behaviour? Or is it, as I suspect, a bug? This seems to be
> new to R 2.7.0, as I said.

And so is the comment in ?match:

    Character inputs with embedded nul bytes will be truncated at the
    first nul.

The bug is in the documentation here -- this was intentional. As support for embedded nuls in character strings is being removed in R 2.8.0, you should not rely on this.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Hi Jon,

* On 2008-04-28 at 11:00 +0100 Jon Clayden wrote:
> [...]
> I don't see anything about this in the latest NEWS, but is this
> expected behaviour? Or is it, as I suspect, a bug? This seems to be
> new to R 2.7.0, as I said.

The short answer is that your example works in R-2.6 and in the current R-devel. Whether the behavior in R-2.7 is a bug is perhaps in the eye of the beholder.

Historically, R's internal string representation allowed for embedded nul characters. This was particularly useful before the raw vector type, RAWSXP, was introduced. Since the vast majority of R's internal string processing functions use standard C semantics and truncate at the first nul, there has always been some room for "interesting" behavior. The change in R-2.7 was an attempt to start resolving these inconsistencies.

Since then the core team has agreed to remove the partial support for embedded nuls in character strings -- raw can be used when this is desired, and having nul-terminated strings will make the code more consistent and easier to maintain going forward.

Best Wishes,

+ seth

-- 
Seth Falcon | http://userprimary.net/user/
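One way to follow the "raw can be used" suggestion is to drop the nul terminator while the data are still a raw vector, and only then convert to a character string. The following is a minimal sketch under that assumption, not a fix taken from this thread; the helper name raw_to_string is hypothetical and is not part of R. It keeps only the bytes before the first nul, which mirrors the C string semantics described above.

```r
# Hypothetical helper (not part of R): convert a raw vector to a string,
# keeping only the bytes that come before the first nul byte, if any.
raw_to_string <- function(bytes) {
  nul_pos <- which(bytes == as.raw(0))      # positions of all nul bytes
  if (length(nul_pos) > 0) {
    # Truncate at the first nul, as a C string function would.
    bytes <- bytes[seq_len(nul_pos[1] - 1)]
  }
  rawToChar(bytes)
}

# The example from the original report: "foo" followed by a nul terminator.
y <- c(charToRaw("foo"), as.raw(0))
z <- raw_to_string(y)

z == "foo"                # TRUE
z %in% c("foo", "bar")    # TRUE, since z no longer carries a nul byte
```

Because the string never contains a nul byte, the result of match() and %in% should not depend on how a particular R version treats embedded nuls.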
Hi Jon,

Jon Clayden wrote:
> [...]
> > x <- "foo"
> > y <- c(charToRaw("foo"),as.raw(0))
> > z <- rawToChar(y)
> > z==x
> [1] TRUE
> > z=="foo"
> [1] TRUE
> > z %in% c("foo","bar")
> [1] FALSE
> > z %in% c("foo","bar","foo\0")
> [1] FALSE

But this gives TRUE:

  > z %in% c("foo","bar", z)
  [1] TRUE

An additional problem you have here is that when the "foo\0" string literal is converted into a character string, then the string data that are after the first embedded nul are dropped:

  > identical("foo\0a\0b", "foo")
  [1] TRUE

And to add to the endless source of surprises that come with embedded nuls:

  > dump("z", file="")
  z <-
  "foo\0"

but of course sourcing the above dump into an R session will not restore 'z'.

Dropping support for embedded nuls in R 2.8.0 sounds like good news to me.

Cheers,
H.