Here's a basic question that doesn't seem to be completely answered in the docs, and which unfortunately I've not had time to figure out by wading through the R source code:

In a vector (or array) element assignment such as

   z[3] <- 8

is there in actuality a full rewriting of the entire vector pointed to by z, as implied by

   z <- "[<-"(z,3,value=8)

Assume that an element of z has already been changed previously, so that copy-on-change issues don't apply, with z being reassigned back to the same memory address.

I seem to recall reading somewhere that recent R versions make some attempt to avoid rewriting the entire vector, and my timing experiments seem to suggest that it's true.

So, is a full rewrite avoided? And where in the source code is this done?

Thanks.

Norm Matloff
On 03/04/2010 6:34 PM, Norm Matloff wrote:
> Here's a basic question that doesn't seem to be completely answered in
> the docs, and which unfortunately I've not had time to figure out by
> wading through the R source code:
>
> In a vector (or array) element assignment such as
>
>    z[3] <- 8
>
> is there in actuality a full rewriting of the entire vector pointed to
> by z, as implied by
>
>    z <- "[<-"(z,3,value=8)
>
> Assume that an element of z has already been changed previously, so
> that copy-on-change issues don't apply, with z being reassigned back to
> the same memory address.
>
> I seem to recall reading somewhere that recent R versions make some
> attempt to avoid rewriting the entire vector, and my timing experiments
> seem to suggest that it's true.
>
> So, is a full rewrite avoided? And where in the source code is this
> done?

It depends. User-written assignment functions can't avoid the copy. They act like the expansion

   z <- "[<-"(z,3,value=8)

and in that, R can't tell that the newly created result of "[<-"(z,3,value=8) will later overwrite z.

However, if z is a regular vector without a class and you're using the built-in version of z[3] <- 8, it can take some shortcuts. This happens in multiple places; one is around line 488 of subassign.c, and another is around line 1336. In each of these places copies are made in some circumstances, but not in general.

Duncan Murdoch
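The distinction drawn above, between the built-in z[3] <- 8 and a user-written assignment function, can be probed with tracemem(), which prints a message whenever the traced object is duplicated. This is only a rough sketch: it assumes an R build with memory profiling enabled (checked via capabilities("profmem")), the function name `second<-` is hypothetical and exists only for illustration, and whether a copy is reported can vary with the R version and the front end in use.

```r
# Sketch only: `second<-` is a made-up replacement function for illustration.
z <- c(1, 2, 3, 4, 5)   # plain double vector, no class attribute
z[1] <- 0               # an earlier change, as in the original question

# tracemem() needs an R build with memory profiling support.
can_trace <- capabilities("profmem")
if (can_trace) tracemem(z)

# Built-in subassignment on an unclassed vector: the primitive may be
# able to modify z in place, in which case tracemem prints nothing.
z[3] <- 8

# User-written replacement function: behaves like
#   z <- `second<-`(z, value = 9)
# so a duplication is typically reported when x is modified inside it.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
second(z) <- 9

if (can_trace) untracemem(z)
print(z)
```

Any "tracemem[...]" lines printed during the run mark the points where a duplicate was made; comparing the two assignments gives a rough picture of when the copy is avoided.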
Thanks, Martin and Duncan, for the quick, clear replies.

Norm
Thanks very much.

By the way, I tried setting a GDB breakpoint at duplicate1(), with the following:

   > x <- 1:10000000
   > x[3] <- 8
   > x[33] <- 88

I found that duplicate1() was called on both of the latter two lines. I was a bit surprised, since copy-on-change would seem to imply that copying would be done on the second line but NOT on the third. Moreover, system.time() gave 0.284 user time for the second and 0 for the third. YET duplicate1() WAS called on the third, and in stepping through the code, there didn't seem to be an immediate exit.

Thanks to both John and Duncan for their comments on the fact that using [<- directly is a very different situation. That's not what I asked, but the comments are useful to me for other reasons.

Norm

> Message: 4
> Date: Sat, 03 Apr 2010 17:54:58 -0700
> From: John Chambers <jmc at r-project.org>
> To: r-devel at r-project.org
> Subject: Re: [Rd] full copy on assignment?
...
> How often does y get duplicated? Hopefully not a million times. One can
> look at this in gdb, by trapping calls to duplicate1. The answer is:
> just once, to ensure that the object is local. Then the duplicated
> version has only one reference and the primitive replacement doesn't
> copy it.
...
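One detail of the three-line experiment above that may contribute to the timing difference: 1:10000000 creates an integer vector, while 8 is a double, so the first assignment has to produce a new double vector, whereas the second assignment finds x already of type double. The sketch below only checks the types and timings; it does not show whether or why duplicate1() is entered, and the variable names type_before and type_after are introduced here purely for illustration.

```r
# Sketch: repeat the experiment and record the storage type of x.
x <- 1:10000000
type_before <- typeof(x)        # the colon operator yields an integer vector

# Assigning the double 8 forces x to be converted to a double vector.
t_first <- system.time(x[3] <- 8)
type_after <- typeof(x)

# x is already double here, so no type conversion is needed.
t_second <- system.time(x[33] <- 88)

print(c(before = type_before, after = type_after))
print(rbind(first = t_first, second = t_second))
```

Starting instead from a double vector, for example x <- as.numeric(1:10000000), removes the type conversion from the first assignment and makes the two timings more directly comparable.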