Hi Ivan,
I can't say that I fully understand the described mechanism yet,
particularly in light of what you describe at the end, which is
something I had also arrived at myself:
    env = new.env() # doesn't work with emptyenv()
    env$vec = c("a","b")
    .Internal(address(env$vec))
    env2 = env
    with(env, {vec[1] = "foo"})
where `with` runs `eval(substitute())` internally.
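As a sanity check of that claim, here is a minimal sketch of my own,
assuming an environment created by `new.env()` with its default
enclosure, so that `[<-` and `{` can be found. It performs one
sub-assignment through `with` and one through a direct
`eval(quote(...), env)`, then reads the result through a second name
for the same environment. The name `alias` is only for illustration.

```r
env <- new.env()              # default enclosure, so `[<-` and `{` resolve
env$vec <- c("a", "b")
alias <- env                  # second name for the same environment

with(env, {vec[1] <- "foo"})        # sub-assignment evaluated inside env
eval(quote(vec[2] <- "bar"), env)   # the same thing without `with`

alias$vec                     # both updates are visible through the alias
```

This only checks where the assignment ends up, not whether the vector
was copied along the way.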
---
I am just playing with fixed buffers or stacks. I was previously able
to implement a stack with:
    new_stack = function(){
        size = 0
        items = vector("character", 8)
        add = function(x){
            size <<- size + 1
            items[size] <<- x
            }
        get = function(){
            items[seq_len(size)]
            }
        environment()
        }
    stack = new_stack()
    tracemem(stack$items)
    stack2 = stack
    .Internal(address(stack$items))
    stack$add("foo")
    stack$add("bar")
    # Memory is the same
    .Internal(address(stack$items))
    # stack2 is the same as stack
    stack2$get() # [1] "foo" "bar"
This works, is really cool, and allows memory-efficient (or so I hope)
shared resources with reference-like semantics for types other than
environments.
I just hoped that further simplification would be possible.
I believe this works because the function `add` is evaluated in the
same environment (as with `with`), but I don't fully get _why_.
I will spend some time reading the subset assignment section.
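For my own reference: the R Language Definition describes
`x[i] <<- v` as roughly fetching `x` from the enclosing environments
into a temporary, calling `[<-` on it, and super-assigning the result
back. Below is a sketch of `add` with that expansion written out by
hand. The names `new_stack_expanded`, `get_items` and `tmp` are made up
for this sketch, and the real evaluator uses an internal `*tmp*`
binding rather than an ordinary local variable, so this says nothing
about when copies actually happen.

```r
new_stack_expanded <- function() {
    size <- 0
    items <- vector("character", 8)
    add <- function(x) {
        size <<- size + 1
        # Hand-written version of: items[size] <<- x
        # base::get is spelled out so a closure-level function named
        # `get` (as in the stack above) could not shadow it.
        tmp <- base::get("items", envir = parent.env(environment()),
                         inherits = TRUE)
        items <<- `[<-`(tmp, size, value = x)
        invisible(NULL)
    }
    get_items <- function() items[seq_len(size)]
    environment()
}

s <- new_stack_expanded()
s2 <- s                 # second name for the same environment
s$add("foo")
s$add("bar")
s2$get_items()          # the items added through `s`
```

Behaviour-wise this matches the stack above; whether it is as cheap in
memory is a separate question that I would check with tracemem().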
On 6/01/26 23:39, Ivan Krylov wrote:
> On Mon, 5 Jan 2026 16:30:43 +1300
> Jiří Moravec <jiri.c.moravec at gmail.com> wrote:
>
>> 1. Is there documentation of `reference counting`?
> There is a short description at
> <https://developer.r-project.org/Refcnt.html>. The general rule for
> package developers is "Except in very special and well understood
> circumstances, an argument passed down to C code should not be modified
> if it has a positive reference count, even if that count is equal to
> one".
>
> For an example of when a reference count of 1 is not safe, consider:
>
> foo <- bar <- baz <- list(x = 42+0) # make a fresh numeric vector
> .Call(modify_me, foo$x)
>
> foo$x has a reference count of only 1, so NOT_SHARED() is true. On the
> other hand, since the bindings 'foo', 'bar', 'baz' all share the same
> list (whose reference count is 3), altering foo$x by reference from C
> code would also change the values of 'bar' and 'baz', which violates
> the value semantics of lists in R.
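
To convince myself of the value-semantics part from plain R (no C
involved), a small sketch: a sub-assignment to `foo$x` at the R level
must leave `bar` and `baz` alone, which is exactly what an in-place
modification from C would break.

```r
foo <- bar <- baz <- list(x = 42 + 0)  # three bindings, one list

foo$x[1] <- 0   # R-level modification: only foo may change

foo$x           # 0
bar$x           # still 42
baz$x           # still 42
```
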
>
>> 2. Is the demonstrated behaviour a bug?
> In this particular case, you've shown the duplication could have been
> avoided, so at the very least you've got a feature request to make
> complex assignment more efficient. Now the question is: why does the
> duplication happen, and how hard is it to avoid without
> breaking anything?
>
> The complex assignment rules are described here:
>
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment-1
>
> If you call tracemem(env$vec) and set a breakpoint in
> memtrace_report(), you can see that env$vec is duplicated in eval.c,
> function evalseq():
>
> (gdb) l evalseq
> (gdb) b 3201
> Breakpoint 2 at 0x55555569dacb: file eval.c, line 3201.
> (gdb) commands 2
>> call Rf_PrintValue(nexpr)
>> c
>> end
> (gdb) b 3209
> Breakpoint 3 at 0x55555569dad3: file eval.c, line 3209.
> (gdb) commands 3
>> call R_inspect(nval)
>> call R_inspect(val)
>> c
>> end
> In both cases, the expression being evaluated is `*tmp*`$vec, with
> `*tmp*` aliased to `env` without incrementing its reference count. When
> evaluating the first assignment, `env$vec[1] <- 5`, `nval` is the
> vector being updated, and `val` is a special, non-reference-counting
> pairlist containing `env` and `as.name("env")`:
>
> Breakpoint 3, evalseq <...> at eval.c:3209
> # first the 'nval', note REF(1)
> @55555615f588 14 REALSXP g0c4 [REF(1)] (len=8, tl=0) 5,0,0,0,0,...
> # next the 'val', note REF(1) for its first element
> @555557df5fb0 02 LISTSXP g0c0 [STP]
> @555557d0a3b8 04 ENVSXP g0c0 [REF(1)] <0x555557d0a3b8>
> <...>
> @555555a2ae88 01 SYMSXP g0c0 [MARK,REF(1785)] "env"
>
> Next, after `env2 <- env`, we attempt an assignment again:
>
> Breakpoint 3, evalseq <...> at eval.c:3209
> # again, 'nval' has a reference count of 1
> @55555615f588 14 REALSXP g0c4 [REF(1)] (len=8, tl=0) 5,0,0,0,0,...
> # but now 'env' has a reference count of 2
> @555557dfd108 02 LISTSXP g0c0 [STP]
> @555557d0a3b8 04 ENVSXP g0c0 [REF(2)] <0x555557d0a3b8> # <-- here
> <...>
> @555555a2ae88 01 SYMSXP g0c0 [MARK,REF(1787)] "env"
>
> Since `env` is referenced twice, it's MAYBE_SHARED, so the condition
>
> if (MAYBE_REFERENCED(nval) &&
> (MAYBE_SHARED(nval) || MAYBE_SHARED(CAR(val))))
>
> is true, and `nval` (env$vec) is duplicated before the assignment.
>
> This would've been necessary if 'env' was a list (or another
> value-semantics object; see the first example above).
>
>> 3. I would guess that assign in place in this case is
>> implementation-specific detail and not specified behaviour, so one
>> shouldn't rely on it.
> True. R's copy-on-write is an optimisation, although a very useful one.
>
>> 4. Is there way how to do this (i.e., fixed buffer) in base R without
>> relying on C with .Call?
> This is a kludge, but if you allow your environment to be enclosed by
> the base environment, you can perform the sub-assignment directly
> inside it, without invoking complex assignment:
>
> env3 <- new.env(parent = baseenv())
> env3$vec <- vector("numeric", 8)
> tracemem(env3$vec)
> eval(substitute(vec[i] <- v, list(i = 1, v = 5)), env3)
> env4 <- env3
> eval(substitute(vec[i] <- v, list(i = 2, v = 6)), env3)
> # still not duplicated
>
> (I've also tried substitute(..., list(`<-` = base::`<-`)) for use in an
> empty environment, but that breaks when it tries to invoke `[<-`.)
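
If I go this way, I would probably wrap the eval(substitute()) call in
a small helper. A minimal sketch, assuming the buffer always lives in a
binding called `vec`; the names `new_buffer` and `buffer_set` are
hypothetical, and whether the copy is really avoided stays an
implementation detail that should be checked with tracemem() rather
than relied upon.

```r
# Hypothetical helpers wrapping the eval(substitute()) approach
new_buffer <- function(n) {
    env <- new.env(parent = baseenv())  # base supplies `<-` and `[<-`
    env$vec <- vector("numeric", n)
    env
}

buffer_set <- function(env, i, v) {
    stopifnot(i >= 1, i <= length(env$vec))  # keep the buffer fixed-size
    # Build the call vec[<i>] <- <v> and evaluate it directly inside env
    eval(substitute(vec[i] <- v, list(i = i, v = v)), env)
    invisible(env)
}

buf <- new_buffer(8)
alias <- buf              # second reference to the same environment
buffer_set(buf, 1, 5)
buffer_set(buf, 2, 6)
alias$vec[1:2]            # values written through `buf`
```

The bounds check is my own addition so that an out-of-range index
cannot silently grow the vector.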
>
> What is the overall problem you would like to solve?
>