thr3ads.net - R devel - [Rd] Assign in place and reference counting [Jan 2026]


Ivan Krylov

2026-Jan-06 10:39 UTC

[Rd] Assign in place and reference counting

On Mon, 5 Jan 2026 16:30:43 +1300,
Jiří Moravec <jiri.c.moravec at gmail.com> wrote:
> 1. Is there documentation of `reference counting`?
There is a short description at
<https://developer.r-project.org/Refcnt.html>. The general rule for
package developers is "Except in very special and well understood
circumstances, an argument passed down to C code should not be modified
if it has a positive reference count, even if that count is equal to
one".

For an example of when a reference count of 1 is not safe, consider:

foo <- bar <- baz <- list(x = 42+0) # make a fresh numeric vector
.Call(modify_me, foo$x)

foo$x has a reference count of only 1, so NOT_SHARED() is true. On the
other hand, since the bindings 'foo', 'bar', 'baz' all share the same
list (whose reference count is 3), altering foo$x by reference from C
code would also change the values of 'bar' and 'baz', which violates
the value semantics of lists in R.
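
The value semantics at stake can be seen from R alone. A minimal
sketch, reusing the names from the example above: a plain R-level
modification of foo$x leaves 'bar' and 'baz' untouched, which is the
guarantee an in-place modification from C would break.

```r
foo <- bar <- baz <- list(x = 42 + 0)  # three bindings, one shared list

foo$x[1] <- 0   # R-level modification: R copies whatever it needs to

foo$x           # now 0
bar$x           # still 42
baz$x           # still 42
```
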
> 2. Is the demonstrated behaviour a bug?
In this particular case, you've shown the duplication could have been
avoided, so at the very least you've got a feature request to make
complex assignment more efficient. Now the question is: why does the
duplication happen, and how hard is it to avoid without breaking
anything?

The complex assignment rules are described here:

https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment-1
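
Following the pattern given in that section of the manual,
`env$vec[1] <- 5` can be read as roughly the sequence below. This is a
hand-written sketch of the documented rewriting, not the code the
interpreter actually runs, and the two set-up lines are assumptions
made for illustration.

```r
env <- new.env()
env$vec <- numeric(8)

# Approximate expansion of: env$vec[1] <- 5
`*tmp*` <- env
env <- `$<-`(`*tmp*`, "vec", value = `[<-`(`*tmp*`$vec, 1, value = 5))
rm(`*tmp*`)

env$vec[1]
```
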

If you call tracemem(env$vec) and set a breakpoint in
memtrace_report(), you can see that env$vec is duplicated in eval.c,
function evalseq():

(gdb) l evalseq
(gdb) b 3201
Breakpoint 2 at 0x55555569dacb: file eval.c, line 3201.
(gdb) commands 2
>call Rf_PrintValue(nexpr)
>c
>end
(gdb) b 3209
Breakpoint 3 at 0x55555569dad3: file eval.c, line 3209.
(gdb) commands 3
>call R_inspect(nval)
>call R_inspect(val)
>c
>end

In both cases, the expression being evaluated is `*tmp*`$vec, with
`*tmp*` aliased to `env` without incrementing its reference count. When
evaluating the first assignment, `env$vec[1] <- 5`, `nval` is the
vector being updated, and `val` is a special, non-reference-counting
pairlist containing `env` and `as.name("env")`:

Breakpoint 3, evalseq <...> at eval.c:3209
# first the 'nval', note REF(1)
@55555615f588 14 REALSXP g0c4 [REF(1)] (len=8, tl=0) 5,0,0,0,0,...
# next the 'val', note REF(1) for its first element
@555557df5fb0 02 LISTSXP g0c0 [STP]
  @555557d0a3b8 04 ENVSXP g0c0 [REF(1)] <0x555557d0a3b8>
<...>
  @555555a2ae88 01 SYMSXP g0c0 [MARK,REF(1785)] "env"

Next, after `env2 <- env`, we attempt an assignment again:

Breakpoint 3, evalseq <...> at eval.c:3209
# again, 'nval' has a reference count of 1
@55555615f588 14 REALSXP g0c4 [REF(1)] (len=8, tl=0) 5,0,0,0,0,...
# but now 'env' has a reference count of 2
@555557dfd108 02 LISTSXP g0c0 [STP]
  @555557d0a3b8 04 ENVSXP g0c0 [REF(2)] <0x555557d0a3b8> # <-- here
<...>
  @555555a2ae88 01 SYMSXP g0c0 [MARK,REF(1787)] "env"

Since `env` is referenced twice, it's MAYBE_SHARED, so the condition

if (MAYBE_REFERENCED(nval) &&
    (MAYBE_SHARED(nval) || MAYBE_SHARED(CAR(val))))

is true, and `nval` (env$vec) is duplicated before the assignment.

This would have been necessary if 'env' were a list (or another
value-semantics object; see the first example above). Because 'env' is
an environment, which has reference semantics, the copy is not needed
here.

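
For reference, the sequence being traced can be reproduced with
something like the following. It is a sketch reconstructed from the
description above: the second assignment in particular is an assumed
example. Whether tracemem() reports a copy depends on the R version and
build, so no output is shown.

```r
env <- new.env()
env$vec <- vector("numeric", 8)

# tracemem() is only available when R was built with memory profiling
if (capabilities("profmem")) tracemem(env$vec)

env$vec[1] <- 5   # 'env' has a single reference at this point
env2 <- env       # second binding to the same environment
env$vec[2] <- 6   # the assignment at which the duplication is observed

env2$vec[1:2]     # both names refer to the same environment
```
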
> 3. I would guess that assign in place in this case is 
> implementation-specific detail and not specified behaviour, so one 
> shouldn't rely on it.
True. R's copy-on-write is an optimisation, although a very useful one.
> 4. Is there way how to do this (i.e., fixed buffer) in base R without 
> relying on C with .Call?
This is a kludge, but if you allow your environment to be enclosed by
the base environment, you can perform the sub-assignment directly
inside it, without invoking complex assignment:

env3 <- new.env(parent = baseenv())
env3$vec <- vector("numeric", 8)
tracemem(env3$vec)
eval(substitute(vec[i] <- v, list(i = 1, v = 5)), env3)
env4 <- env3
eval(substitute(vec[i] <- v, list(i = 2, v = 6)), env3)
# still not duplicated

(I've also tried substitute(..., list(`<-` = base::`<-`)) for use in an
empty environment, but that breaks when it tries to invoke `[<-`.)
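
If this is needed more than once, the kludge can be wrapped in a small
function. The helper below, `buffer_set`, is a hypothetical name, and it
assumes the buffer is always stored under the name `vec`. Whether a copy
is avoided remains an implementation detail, as noted above.

```r
# Hypothetical helper: perform vec[i] <- v by evaluating the
# sub-assignment inside the environment 'buf' itself
buffer_set <- function(buf, i, v) {
    eval(substitute(vec[i] <- v, list(i = i, v = v)), buf)
    invisible(buf)
}

env3 <- new.env(parent = baseenv())
env3$vec <- vector("numeric", 8)
env4 <- env3              # second reference to the same environment
buffer_set(env3, 1, 5)
buffer_set(env3, 2, 6)
env4$vec[1:2]             # updates are visible through 'env4' as well
```
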

What is the overall problem you would like to solve?

-- 
Best regards,
Ivan

Jiří Moravec

2026-Jan-06 20:29 UTC


Hi Ivan,

I can't say that I fully understand the described mechanism yet,
particularly in light of what you described at the end, which is
something I had also stumbled upon myself:

    env = new.env() # doesn't work with parent = emptyenv()
    env$vec = c("a","b")
    .Internal(address(env$vec))
    env2 = env
    with(env, {vec[1] = "foo"})

where `with` runs `eval(substitute())` internally.
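
To make the equivalence concrete: the default method of `with` is
essentially `eval(substitute(expr), data, enclos = parent.frame())`, so
the two assignments in this small sketch take the same route. The
variable names and values are only for illustration.

```r
env <- new.env()
env$vec <- c("a", "b")

with(env, vec[1] <- "foo")           # via with()
eval(quote(vec[2] <- "bar"), env)    # roughly what with() does internally

env$vec
```
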

---

I am just playing with fixed buffers and stacks. Previously, I was able
to implement a stack with:

    new_stack = function(){
        size = 0
        items = vector("character", 8)

        add = function(x){
            size <<- size + 1
            items[size] <<- x
        }

        get = function(){
            items[seq_len(size)]
        }

        environment()
    }

    stack = new_stack()
    tracemem(stack$items)
    stack2 = stack
    .Internal(address(stack$items))
    stack$add("foo")
    stack$add("bar")
    # Memory is the same
    .Internal(address(stack$items))
    # stack2 is the same as stack
    stack2$get() # [1] "foo" "bar"

This works, is really cool, and allows memory-efficient (or so I hope)
shared resources with reference-like semantics for types other than
environments.
I just hoped that further simplification would be possible.

I believe this works because the function "add" is evaluated in the
same environment (as with `with`), but I don't fully get _why_.

I will spend some time reading the subset assignment section.


