Henrik Bengtsson
2007-Aug-31 19:45 UTC
[Rd] Consistency of serialize(): please enlighten me
Hi,
I am puzzled by serialize(). It comes down to generating identical
hash codes for (apparently) identical objects using digest::digest(),
which in turn relies on serialize(). Here is an example illustrating
the issue:
ser <- function(object, ...) {
  list(
    names = names(object),
    namesRaw = charToRaw(names(object)),
    ser = serialize(names(object), connection=NULL, ascii=FALSE)
  )
} # ser()
# Object to be serialized
key <- key0 <- list(abc="Hello");
# Store results
d <- list();
# 1. As is
d[[1]] <- ser(key);
# 2. Set names and redo (hardwired: identical to what's already there)
names(key) <- "abc";
d[[2]] <- ser(key);
# 3. Set names and redo (generic: char->raw->char)
key <- key0;
names(key) <- sapply(names(key), FUN=function(name)
  rawToChar(charToRaw(name)));
d[[3]] <- ser(key);
# All names are identical
for (kk in 2:length(d))
  stopifnot(identical(d[[1]]$names, d[[kk]]$names));
# All raw names are identical
for (kk in 2:length(d))
  stopifnot(identical(d[[1]]$namesRaw, d[[kk]]$namesRaw));
# But, the serialized names differ.
print(identical(d[[1]]$ser, d[[2]]$ser));
print(identical(d[[1]]$ser, d[[3]]$ser));
print(identical(d[[2]]$ser, d[[3]]$ser));
So, it seems like there is some extra information in the names
attribute that is part of the serialization. Is it possible to show
they differ at the R level? What is that extra information?
Promises...?
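As a rough way to narrow this down, one could compare two serializations byte by byte. The sketch below assumes a hypothetical helper serDiff() (not part of base R or digest); it only reports where the raw vectors differ, not why, and the result may well depend on the R version:

```r
# Hypothetical helper (an assumption for illustration, not an existing API):
# serialize two objects and report the positions where the bytes differ.
serDiff <- function(x, y) {
  a <- serialize(x, connection = NULL, ascii = FALSE)
  b <- serialize(y, connection = NULL, ascii = FALSE)
  # Compare only the common prefix in case the lengths differ
  n <- min(length(a), length(b))
  pos <- which(a[seq_len(n)] != b[seq_len(n)])
  list(
    lengths = c(length(a), length(b)),  # total bytes of each serialization
    pos = pos,                          # byte positions that differ
    a = a[pos],                         # bytes of the first serialization there
    b = b[pos]                          # bytes of the second serialization there
  )
}

# Same setup as case 3 above: names rebuilt via char->raw->char
key0 <- list(abc = "Hello")
key <- key0
names(key) <- sapply(names(key), FUN = function(name)
  rawToChar(charToRaw(name)))
str(serDiff(names(key0), names(key)))
```

If pos comes back empty, the two byte streams agree over their common length; otherwise the reported positions show which part of the serialized stream carries the extra information.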
Please enlighten me.
Henrik
Henrik Bengtsson
2007-Aug-31 19:49 UTC
[Rd] Consistency of serialize(): please enlighten me
Forgot...

On 8/31/07, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> Hi,
>
> I am puzzled with serialize(). It comes down generating identical
> hash codes for (apparently) identical objects using digest::digest(),
> which in turn relies on serialize(). Here is an example illustration
> the issue:
>
> ser <- function(object, ...) {
>   list(
>     names = names(object),
>     namesRaw = charToRaw(names(object)),
>     ser = serialize(names(object), connection=NULL, ascii=FALSE)
>   )
> } # ser()
>
> # Object to be serialized
> key <- key0 <- list(abc="Hello");
>
> # Store results
> d <- list();
>
> # 1. As is
> d[[1]] <- ser(key);
>
> # 2. Set names and redo (hardwired: identical to what's already there)
> names(key) <- "abc";
> d[[2]] <- ser(key);
>
> # 3. Set names and redo (generic: char->raw->char)
> key <- key0;
> names(key) <- sapply(names(key), FUN=function(name)
>   rawToChar(charToRaw(name)));
> d[[3]] <- ser(key);
>
> # All names are identical
> for (kk in 2:length(d))
>   stopifnot(identical(d[[1]]$names, d[[kk]]$names));
>
> # All raw names are identical
> for (kk in 2:length(d))
>   stopifnot(identical(d[[1]]$namesRaw, d[[kk]]$namesRaw));
>
> # But, the serialized names differ.
> print(identical(d[[1]]$ser, d[[2]]$ser));
> print(identical(d[[1]]$ser, d[[3]]$ser));
> print(identical(d[[2]]$ser, d[[3]]$ser));

With R version 2.6.0 Under development (unstable) (2007-08-23 r42614) I get:

[1] TRUE
[1] FALSE
[1] FALSE

and with R version 2.5.1 Patched (2007-07-19 r42284):

[1] FALSE
[1] FALSE
[1] TRUE

>
> So, it seems like there is some extra information in the names
> attribute that is part of the serialization. Is it possible to show
> they differ at the R level? What is that extra information?
> Promises...?
>
> Please enlighten me.
>
> Henrik
>