On Sep 5, 2013, at 12:31 AM, Peter Meilstrup wrote:
> Some experimentation with the function below should convince you that the
> runtime of the part inside system.time is proportional to size*number*times. I
> think it should only be proportional to number*times. The function is only
> manipulating a list of references to vectors and not trying to make changes
> to the vectors themselves.
>
> overcopying <- function(size, number, times) {
> #Make a list of NUMBER vectors of SIZE, then reorder the list. The
> #vectors themselves are never touched, only their references are
> #moved around. If R implements "copy on write" correctly the
> #elapsed time should be ~number*times.
> L <- replicate(number, list(vector("numeric", size)),
>                simplify = FALSE)
> system.time(for (i in 1:times) {
> L[sample(number)] <- L
> })
> }
>
> I see that duplicate.c makes a recursive copy of each element when it
> encounters a VECSXP or a LISTSXP, which seems unnecessary (it should be
> sufficient to make a shallow copy and ensure NAMED is set on the
> elements).
>
> Why is R making apparently unnecessary deep copies?
>
Because reference counting is not recursive. Changing that would break quite a
few packages that abuse the fact that copies are currently deep. The main
problem is that such abuse is pretty much impossible to detect ahead of time.
That said, there are efforts underway by Luke Tierney and Michael Lawrence to
see how far we can go without breaking everything.
Cheers,
Simon
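The copy-on-write behavior the question assumes can be sketched in a few lines of R. This is a minimal illustration (the names `x`, `L1`, and `L2` are invented for the example): assigning a list does not copy its elements, and a write to a shared element copies only that one vector, leaving the original list untouched.

```r
# Copy-on-write at the list level: the assignment L2 <- L1 shares the
# inner vector; duplication happens only when a shared element is written.
x  <- c(1, 2, 3)
L1 <- list(x)
L2 <- L1            # no element is duplicated here
L2[[1]][1] <- 99    # this write copies only the one affected vector
L1[[1]][1]          # -> 1: the original element is untouched
```

The question in the thread is why, in the R of that era, an operation like `L[sample(number)] <- L` did not stop at this element-level sharing but instead duplicated the element vectors themselves.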