Found the reason for the bug. A patch is available online:
source("http://www.braju.com/R/patches/digest.R")
In digest(), the .Call() statement takes the serialized object,
converted to a string, as its second argument:
val <- .Call("digest", as.character(object),
as.integer(algoint),
as.integer(length), PACKAGE = "digest")
This relies on 'object' being a single character string, not a vector
of several strings. Try object <- "a" and object <- c("a", "b"), and
you'll get the same result, so apparently only the first element is
used.
To generate the 'object' string, digest() calls serialize() beforehand.
Now, in R v2.3.1 serialize(object, connection=NULL, ascii=TRUE) returns
a single string, but in R v2.4.0 it returns a raw vector. This is [of
course ;)] documented:
From ?serialize in R v2.3.1:
For 'serialize', 'NULL' unless 'connection=NULL', when the result is
stored in the first element of a character vector (but is not a normal
character string unless 'ascii = TRUE' [...])
From ?serialize in R v2.4.0:
For serialize, NULL unless connection=NULL, when the result is stored
in a raw vector.
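
To illustrate why this matters, here is a small sketch for R v2.4.0 or later (the variable names x1 and x2 are mine, not taken from digest()). It assumes, as the observations above suggest, that the C code only looks at the first element of the character vector it is given:

```r
# Under R >= 2.4.0, serialize() returns a raw vector, so as.character()
# gives one two-digit hex string per byte instead of one long string.
x1 <- as.character(serialize(1, connection = NULL, ascii = TRUE))
x2 <- as.character(serialize(2, connection = NULL, ascii = TRUE))

length(x1) > 1             # TRUE: a vector, not a single string
identical(x1[1], x2[1])    # TRUE: both start with the same header byte
identical(x1, x2)          # FALSE: the full serializations do differ
```

If only x1[1] and x2[1] ever reach the hashing code, every input would hash to the same value, which matches what digest(1), digest(2) and digest(rnorm(10)) show.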
So the quick and dirty fix of digest() is to do:
object <- serialize(object, connection=NULL, ascii=TRUE)
object <- paste(object, collapse="")
This should work in either R version. I've made this patch available
online. Just call:
source("http://www.braju.com/R/patches/digest.R")
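
As a rough check of the idea, the two lines of the fix can be wrapped in a helper and tried on different inputs. Note that serializeToString() is a hypothetical name used only for this sketch; it is not part of the digest package or of the patch:

```r
# Hypothetical helper (not in digest): serialize an object and collapse
# the result into exactly one character string, whether serialize()
# returns a single string (R v2.3.1) or a raw vector (R v2.4.0).
serializeToString <- function(object) {
  s <- serialize(object, connection = NULL, ascii = TRUE)
  paste(s, collapse = "")
}

s1 <- serializeToString(1)
s2 <- serializeToString(2)

length(s1)           # 1: a single string, as .Call("digest", ...) expects
identical(s1, s2)    # FALSE: different objects give different strings
```

With a raw vector, paste() concatenates the hex representation of each byte, so the string that gets hashed need not be the same as under R v2.3.1; I have not checked whether the resulting digests agree between the two versions.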
It's possible that it is faster to serialize to a 'textConnection'.
However, it might be even faster if your internal code, i.e.
.Call("digest", ...), accepted vectors, so that this would not have to
be done at the R level?
Cheers
Henrik
On 7/27/06, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> [cc:ing to the maintainer of digest]
>
> FYI, package 'digest' (v0.2.1 2005/11/04 04:45:53) generates the same
> output regardless of input with R v2.4.0 devel (2006-07-25 r38698).
> Starting a vanilla R session you get:
>
> > library(digest)
> > digest(1)
> [1] "3416a75f4cea9109507cacd8e2f2aefc"
> > digest(2)
> [1] "3416a75f4cea9109507cacd8e2f2aefc"
> > digest(rnorm(10))
> [1] "3416a75f4cea9109507cacd8e2f2aefc"
>
> It works as expected with R v2.3.1 patched (2006-07-25 r38698):
> > library(digest)
> > digest(1)
> [1] "577e0eb2f3253fc5a8c4a287f7c10e7f"
> > digest(2)
> [1] "75eb91f4559682af50c21212d0dc013b"
>
> digest() uses serialize() internally, but it has nothing to do with
> that. I managed to track it down to the call to
> .Call("digest", ...).
>
> BTW, thanks for a very useful package.
>
> Henrik
>