Henrik Bengtsson
2007-Aug-31 19:45 UTC
[Rd] Consistency of serialize(): please enlighten me
Hi,

I am puzzled with serialize().  It comes down to generating identical
hash codes for (apparently) identical objects using digest::digest(),
which in turn relies on serialize().  Here is an example illustrating
the issue:

ser <- function(object, ...) {
  list(
    names = names(object),
    namesRaw = charToRaw(names(object)),
    ser = serialize(names(object), connection=NULL, ascii=FALSE)
  )
} # ser()

# Object to be serialized
key <- key0 <- list(abc="Hello");

# Store results
d <- list();

# 1. As is
d[[1]] <- ser(key);

# 2. Set names and redo (hardwired: identical to what's already there)
names(key) <- "abc";
d[[2]] <- ser(key);

# 3. Set names and redo (generic: char->raw->char)
key <- key0;
names(key) <- sapply(names(key), FUN=function(name) rawToChar(charToRaw(name)));
d[[3]] <- ser(key);

# All names are identical
for (kk in 2:length(d))
  stopifnot(identical(d[[1]]$names, d[[kk]]$names));

# All raw names are identical
for (kk in 2:length(d))
  stopifnot(identical(d[[1]]$namesRaw, d[[kk]]$namesRaw));

# But, the serialized names differ.
print(identical(d[[1]]$ser, d[[2]]$ser));
print(identical(d[[1]]$ser, d[[3]]$ser));
print(identical(d[[2]]$ser, d[[3]]$ser));

So, it seems like there is some extra information in the names
attribute that becomes part of the serialization.  Is it possible to
show at the R level that they differ?  What is that extra information?
Promises...?

Please enlighten me.

Henrik
Henrik Bengtsson
2007-Aug-31 19:49 UTC
[Rd] Consistency of serialize(): please enlighten me
Forgot...

On 8/31/07, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> [...]
> # But, the serialized names differ.
> print(identical(d[[1]]$ser, d[[2]]$ser));
> print(identical(d[[1]]$ser, d[[3]]$ser));
> print(identical(d[[2]]$ser, d[[3]]$ser));

With R version 2.6.0 Under development (unstable) (2007-08-23 r42614) I get:

[1] TRUE
[1] FALSE
[1] FALSE

and with R version 2.5.1 Patched (2007-07-19 r42284):

[1] FALSE
[1] FALSE
[1] TRUE
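
One way to narrow down where two serializations diverge, without assuming anything about the cause, is to compare the raw vectors byte by byte. The sketch below is only an illustration: diffRaw() is a hypothetical helper, not part of R or of digest, and whether any bytes differ at all will depend on the R version in use.

```r
# Hypothetical helper (not from the thread): report the positions at which
# two raw vectors differ, comparing only their common prefix.
diffRaw <- function(a, b) {
  stopifnot(is.raw(a), is.raw(b))
  n <- min(length(a), length(b))
  idx <- which(a[seq_len(n)] != b[seq_len(n)])
  list(
    sameLength = (length(a) == length(b)),
    positions  = idx,     # 1-based byte offsets that differ
    a          = a[idx],  # bytes of 'a' at those offsets
    b          = b[idx]   # bytes of 'b' at those offsets
  )
}

# Two strings that are identical() at the R level but built differently,
# mirroring cases 1 and 3 of the example in the thread.
x <- "abc"
y <- rawToChar(charToRaw("abc"))
stopifnot(identical(x, y))

sx <- serialize(x, connection = NULL, ascii = FALSE)
sy <- serialize(y, connection = NULL, ascii = FALSE)

# Inspect the differing offsets, if there are any in this R version.
str(diffRaw(sx, sy))
```

If positions comes back non-empty, the listed offsets show which bytes of the serialized stream carry the extra information; relating those bytes to a particular field would still require the serialization format documentation or the R sources.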