I noticed the following peculiarity with `serialize()' when `ascii = TRUE' is
used. In today's (svn r37299) R-devel, I get

 > set.seed(10)
 > x <- rnorm(10)
 >
 > a <- serialize(x, con = NULL, ascii = TRUE)
 > b <- unserialize(a)
 >
 > identical(x, b)  ## FALSE
[1] FALSE
 > x - b
 [1] -3.469447e-18  2.775558e-17 -4.440892e-16  0.000000e+00  5.551115e-17
 [6] -5.551115e-17 -4.440892e-16  0.000000e+00  2.220446e-16 -5.551115e-17

I expected `x' and `b' to be identical, which is what I get when `ascii = FALSE':

 > a <- serialize(x, con = NULL, ascii = FALSE)
 > b <- unserialize(a)
 >
 > identical(x, b)  ## TRUE
[1] TRUE

The same phenomenon occurs with `.saveRDS(ascii = TRUE)',

 > .saveRDS(x, file = "asdf", ascii = TRUE)
 > d <- .readRDS("asdf")
 >
 > identical(x, d)  ## FALSE
[1] FALSE

Has anyone noticed this before? I didn't see anything in the docs for
`serialize()' that would indicate this behavior should be expected.

I'm on Linux Fedora Core 4.

-roger
--
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
Prof Brian Ripley
2006-Feb-09 07:18 UTC
[Rd] corruption of data with serialize(ascii=TRUE)
It is known (happens with save() too and did in earlier save formats).
Nothing particularly clever is done (the format is "%.16g\n") and similarly
as.character/parse are not inverses.

Perhaps more relevant is

> b/x - 1
 [1]  0.000000e+00 -1.110223e-16  2.220446e-16  0.000000e+00  0.000000e+00
 [6]  2.220446e-16  4.440892e-16  0.000000e+00  2.220446e-16  0.000000e+00

so the error (on my system) is about what you would expect from
floating-point computations.

There is a comment in serialize.c

    /* 16: full precision; 17 gives 999, 000 &c */

which suggests that the format is optimized for size not maximal possible
accuracy.

Really all you have said is `floating point operations are subject to
rounding error'.

On Wed, 8 Feb 2006, Roger D. Peng wrote:

> I noticed the following peculiarity with `serialize()' when `ascii = TRUE' is
> used. In today's (svn r37299) R-devel, I get
>
> > set.seed(10)
> > x <- rnorm(10)
> >
> > a <- serialize(x, con = NULL, ascii = TRUE)
> > b <- unserialize(a)
> >
> > identical(x, b)  ## FALSE
> [1] FALSE
> > x - b
>  [1] -3.469447e-18  2.775558e-17 -4.440892e-16  0.000000e+00  5.551115e-17
>  [6] -5.551115e-17 -4.440892e-16  0.000000e+00  2.220446e-16 -5.551115e-17
>
> I expected `x' and `b' to be identical, which is what I get when `ascii = FALSE':
>
> > a <- serialize(x, con = NULL, ascii = FALSE)
> > b <- unserialize(a)
> >
> > identical(x, b)  ## TRUE
> [1] TRUE
>
> The same phenomenon occurs with `.saveRDS(ascii = TRUE)',
>
> > .saveRDS(x, file = "asdf", ascii = TRUE)
> > d <- .readRDS("asdf")
> >
> > identical(x, d)  ## FALSE
> [1] FALSE
>
> Has anyone noticed this before? I didn't see anything in the docs for
> `serialize()' that would indicate this behavior should be expected.
>
> I'm on Linux Fedora Core 4.
>
> -roger

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
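
The rounding discussed in this thread can be reproduced without serialize()
at all. What follows is a minimal sketch, assuming that
`sprintf("%.16g", x)' followed by `as.numeric()' is a fair stand-in for what
the ascii format does on write and read. The names `rt16' and `rt17' are made
up for illustration and do not come from the R sources. It compares a
16-digit and a 17-digit text round trip and shows `all.equal()' as the
tolerance-based check that is appropriate for values that have passed through
text.

```r
set.seed(10)
x <- rnorm(10)

## Text round trip with 16 significant digits (the "%.16g" format
## quoted from serialize.c) and, for comparison, with 17.
## ASSUMPTION: sprintf() + as.numeric() stands in for the ascii writer/reader.
rt16 <- as.numeric(sprintf("%.16g", x))
rt17 <- as.numeric(sprintf("%.17g", x))

## Relative error of each round trip; anything nonzero is on the
## order of machine epsilon, as in the b/x - 1 output above.
rel16 <- max(abs(rt16 / x - 1))
rel17 <- max(abs(rt17 / x - 1))
print(c(rel16 = rel16, rel17 = rel17, eps = .Machine$double.eps))

## Compare with a tolerance rather than bit-for-bit after a text round trip.
print(isTRUE(all.equal(x, rt16)))

## The binary format copies the bits, so identical() holds there.
bin <- unserialize(serialize(x, NULL, ascii = FALSE))
print(identical(x, bin))
```

Seventeen significant digits are in principle enough to identify an IEEE
double uniquely, but whether the value read back is bit-identical also
depends on how exactly the text is parsed, so the sketch prints the error
rather than asserting `identical(x, rt17)'.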