Radford Neal
2010-Aug-25 00:23 UTC
[Rd] Correction to section 1.1.2 of R Internals doc, on NAMED
I think the explanation of the NAMED field in the R Internals document is incorrect. In Section 1.1.2, it says:

  The named field is set and accessed by the SET_NAMED and NAMED macros, and take values 0, 1 and 2. R has a `call by value' illusion, so an assignment like

    b <- a

  appears to make a copy of a and refer to it as b. However, if neither a nor b are subsequently altered there is no need to copy. What really happens is that a new symbol b is bound to the same value as a and the named field on the value object is set (in this case to 2). When an object is about to be altered, the named field is consulted. A value of 2 means that the object must be duplicated before being changed. (Note that this does not say that it is necessary to duplicate, only that it should be duplicated whether necessary or not.) A value of 0 means that it is known that no other SEXP shares data with this object, and so it may safely be altered. A value of 1 is used for situations like

    dim(a) <- c(7, 2)

  where in principle two copies of a exist for the duration of the computation as (in principle)

    a <- `dim<-`(a, c(7, 2))

  but for no longer, and so some primitive functions can be optimized to avoid a copy in this case.

The implication of this somewhat confusing explanation is that values of variables may have NAMED of 0, and that NAMED will be 1 only briefly, during a few operations like dim(a) <- c(7,2).

But from my reading of the R source, this is wrong. It seems to me that NAMED will quite often be 1 for extended periods of time. For instance, after a <- c(7,2), the value stored in a will have NAMED of 1. If at this point a[2] <- 0 is executed, no copy is made, because NAMED is 1. If b <- a is then executed, the same value will be in both a and b, and to reflect this, NAMED is incremented to 2. If a[2] <- 0 is executed at this point, a copy is made, since NAMED is 2.

Essentially, NAMED is a count of how many variables reference a value, except it's not necessarily accurate.
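
To make the a <- c(7,2) sequence concrete, here is a minimal sketch in C of the behaviour as I read it. It is a toy model under stated assumptions, not R's actual code: the Value and Variable structs and the functions new_value, free_value, assign, set_element and replay_post_sequence are all hypothetical names invented for illustration. Here assign stands in for `<-`, and set_element stands in for a[i] <- x (with a zero-based index).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R vector value; NOT R's real SEXPREC layout. */
typedef struct {
    int named;       /* 0, 1 or 2, mimicking the NAMED field */
    size_t length;
    double *data;
} Value;

/* Hypothetical variable binding: just a pointer to the bound value. */
typedef struct {
    Value *value;
} Variable;

/* Allocate a fresh value holding a copy of src; NAMED starts at 0. */
Value *new_value(const double *src, size_t n) {
    Value *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->data = malloc(n * sizeof *v->data);
    if (v->data == NULL) abort();
    memcpy(v->data, src, n * sizeof *v->data);
    v->named = 0;
    v->length = n;
    return v;
}

void free_value(Value *v) {
    free(v->data);
    free(v);
}

/* Model of `var <- v`: bind var to v and bump NAMED, saturating at 2.
   The value previously bound to var (if any) is NOT decremented. */
void assign(Variable *var, Value *v) {
    if (v->named < 2) v->named++;
    var->value = v;
}

/* Model of `var[i] <- d` (zero-based i). Duplicates first when NAMED is 2.
   Returns 1 if a copy was made, 0 if the value was altered in place. */
int set_element(Variable *var, size_t i, double d) {
    int copied = 0;
    assert(i < var->value->length);
    if (var->value->named == 2) {
        Value *copy = new_value(var->value->data, var->value->length);
        copy->named = 1;          /* the copy is referenced only by var */
        var->value = copy;        /* the old value keeps NAMED of 2 */
        copied = 1;
    }
    var->value->data[i] = d;
    return copied;
}

/* Replays the sequence described above; returns the number of copies made. */
int replay_post_sequence(void) {
    const double init[] = {7, 2};
    Variable a = {NULL}, b = {NULL};
    int copies = 0;

    assign(&a, new_value(init, 2));     /* a <- c(7,2): NAMED becomes 1 */
    copies += set_element(&a, 1, 0.0);  /* a[2] <- 0: NAMED is 1, no copy */
    assign(&b, a.value);                /* b <- a: shared, NAMED becomes 2 */
    copies += set_element(&a, 1, 0.0);  /* a[2] <- 0: NAMED is 2, so copy */

    free_value(a.value);                /* a now holds the copy */
    free_value(b.value);                /* b still holds the original */
    return copies;
}
```

Calling replay_post_sequence() from a main function should report exactly one copy, made by the second a[2] <- 0. The model deliberately never decrements named, which is why a value that has once reached 2 is always copied before being changed.
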
First, once NAMED reaches 2, it doesn't get incremented any higher. Second, no attempt is made to decrement NAMED when a variable ceases to refer to a value. So the end result is that a copy needs to be made when changing a variable whose value has NAMED of 2, since it's possible that some other variable references the same value.

There seems to be some confusion in the R source on this. In the do_for procedure, the value for the for loop variable is set up with NAMED being 0, though according to my explanation above, it ought to be set up with NAMED of 1. A bug is avoided here only because the procedures for getting values from variables check if NAMED is 0, and if so fix it up to being 1, which is the minimum that it ought to be for a value that's stored in a variable.

Is my understanding of this correct? Or have I missed something?

   Radford Neal