Winston Chang
2016-Aug-05 16:14 UTC
[Rd] Extra copies of objects in environments when using $ operator?
My understanding is that R will not make copies of lists if there is only one reference to the object. However, I've encountered a case where R does make copies, even though (I think) there should be only one reference to the object. I hope that someone could shed some light on why this is happening.

I'll start with a simple example. Below, x is a list with one element, and changing that element doesn't result in a copy. (We know this because nothing is printed when we do the assignment after the tracemem call.) This is as expected.

x <- list(1)
tracemem(x)
# [1] "<0x1149e08f8>"
x[[1]] <- 2
# (No output)

Similarly, modifying a list contained in a list doesn't result in a copy:

e <- list(x = list(1))
tracemem(e$x)
# [1] "<0x11b3a4b38>"
e$x[[1]] <- 2
# (No output)

However, modifying a list contained in an environment *does* result in a copy -- tracemem prints out some info when we do the assignment:

e <- new.env(parent = emptyenv())
e$x <- list(1)
tracemem(e$x)
# [1] "<0x1148c1708>"
e$x[[1]] <- 2
# tracemem[0x1148c1708 -> 0x11b2fc1b8]:

This is surprising to me. Why is a copy made in this case? It also results in slower performance for these situations.

The most that I've been able to figure out is that it probably has something to do with how the $ operator works with environments (but not with lists). If you do the same operations without the $ operator, by evaluating code in environment e, then no copy is made:

e <- new.env(parent = globalenv())
eval(quote({
  x <- list(1)
  tracemem(x)
  x[[1]] <- 2
}), envir = e)
# (No output)

I'd appreciate it if someone could shed light on this. And if it's a bug, that would be good to know too.

-Winston
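Winston's observation that evaluating the assignment inside the environment avoids the copy suggests wrapping that pattern in a small function. This is a minimal sketch, not a base R API: `assign_elt_in_env()` is a hypothetical name. It assumes the environment can reach base R (for example `parent = globalenv()`, as in the eval example above), because the assignment functions are looked up from that environment. It only reproduces the eval-in-environment pattern; whether a copy is avoided still depends on the R version.

```r
# Hypothetical helper (not part of base R): sets element `i` of the list bound
# to `name` in `env` by evaluating `name[[i]] <- value` inside `env` itself,
# mirroring the eval(quote(...), envir = e) pattern instead of going through
# `env$name[[i]] <- value`.
assign_elt_in_env <- function(env, name, i, value) {
  # Build the call `<name>[[<i>]] <- <value>` with the arguments spliced in.
  call <- bquote(.(as.name(name))[[.(i)]] <- .(value))
  # `<-` and `[[<-` are looked up starting from `env`, so `env` must be able
  # to reach base R (e.g. parent = globalenv()); emptyenv() would not work.
  eval(call, envir = env)
  invisible(env)
}

e <- new.env(parent = globalenv())
e$x <- list(1)
assign_elt_in_env(e, "x", 1, 2)
e$x[[1]]
# [1] 2
```

Because `value` is spliced into the call as a constant, a symbol or call passed as `value` would be evaluated rather than stored; plain data such as numbers, strings, and lists is stored as given.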
luke-tierney at uiowa.edu
2016-Aug-05 16:59 UTC
On Fri, 5 Aug 2016, Winston Chang wrote:

> My understanding is that R will not make copies of lists if there is
> only one reference to the object. However, I've encountered a case
> where R does make copies, even though (I think) there should be only
> one reference to the object. I hope that someone could shed some light
> on why this is happening.
>
> I'll start with a simple example. Below, x is a list with one element,
> and changing that element doesn't result in a copy. (We know this
> because nothing is printed when we do the assignment after the
> tracemem call.) This is as expected.
> x <- list(1)
> tracemem(x)
> # [1] "<0x1149e08f8>"
> x[[1]] <- 2
> # (No output)
>
> Similarly, modifying a list contained in a list doesn't result in a copy:
> e <- list(x = list(1))
> tracemem(e$x)
> # [1] "<0x11b3a4b38>"
> e$x[[1]] <- 2
> # (No output)
>
> However, modifying a list contained in an environment *does* result in
> a copy -- tracemem prints out some info when we do the assignment:
> e <- new.env(parent = emptyenv())
> e$x <- list(1)
> tracemem(e$x)
> # [1] "<0x1148c1708>"
> e$x[[1]] <- 2
> # tracemem[0x1148c1708 -> 0x11b2fc1b8]:

Currently e$x marks values as immutable if they have any references by
setting NAMED to 2. You can see this with

> e <- new.env(parent = emptyenv())
> e$x <- list(1)
> .Internal(inspect(e))
@30b2498 04 ENVSXP g0c0 [NAM(1)] <0x30b2498>
ENCLOS:
  @2600e98 04 ENVSXP g0c0 [MARK,NAM(2)] <R_EmptyEnv>
HASHTAB:
  @2e41540 19 VECSXP g0c7 [] (len=29, tl=1)
    @25c9628 00 NILSXP g0c0 [MARK,NAM(2)]
    @25c9628 00 NILSXP g0c0 [MARK,NAM(2)]
    @25c9628 00 NILSXP g0c0 [MARK,NAM(2)]
    @25c9628 00 NILSXP g0c0 [MARK,NAM(2)]
    @30b3370 02 LISTSXP g0c0 []
      TAG: @2637870 01 SYMSXP g0c0 [MARK,NAM(2)] "x"
      @3569488 19 VECSXP g0c1 [NAM(1)] (len=1, tl=0) ## <--- NAM = 1
        @35694e8 14 REALSXP g0c1 [NAM(2)] (len=1, tl=0) 1
    ...
> e$x
[[1]]
[1] 1

> .Internal(inspect(e))
@30b2498 04 ENVSXP g0c0 [NAM(1)] <0x30b2498>
ENCLOS:
  @2600e98 04 ENVSXP g0c0 [MARK,NAM(2)] <R_EmptyEnv>
HASHTAB:
  @2e41540 19 VECSXP g0c7 [] (len=29, tl=1)
    @25c9628 00 NILSXP g0c0 [MARK,NAM(2)]
    @25c9628 00 NILSXP g0c0 [MARK,NAM(2)]
    @25c9628 00 NILSXP g0c0 [MARK,NAM(2)]
    @25c9628 00 NILSXP g0c0 [MARK,NAM(2)]
    @30b3370 02 LISTSXP g0c0 []
      TAG: @2637870 01 SYMSXP g0c0 [MARK,NAM(2)] "x"
      @3569488 19 VECSXP g0c1 [NAM(2)] (len=1, tl=0) ## <--- NAM = 2
        @35694e8 14 REALSXP g0c1 [NAM(2)] (len=1, tl=0) 1
    ...

It is not clear if this is needed or just done in an abundance of
caution. If R is built to use reference counting for determining
sharing information this does not happen, so this is likely to change
and not force a copy by 3.4.0.

Best,

luke

> This is surprising to me. Why is a copy made in this case? It also
> results in slower performance for these situations.
>
> The most that I've been able to figure out is that it probably has
> something to do with how the $ operator works with environments (but
> not with lists). If you do the same operations without the $ operator,
> by evaluating code in environment e, then no copy is made:
>
> e <- new.env(parent = globalenv())
> eval(quote({
>   x <- list(1)
>   tracemem(x)
>   x[[1]] <- 2
> }), envir = e)
> # (No output)
>
>
> I'd appreciate it if someone could shed light on this. And if it's a
> bug, that would be good to know too.
>
> -Winston
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
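Since Luke expects this behaviour to change once R uses reference counting (the R 4.0.0 release notes describe replacing the NAMED mechanism with reference counting), it can help to check what a given build actually does. This is a hedged sketch: `env_dollar_assign_copies()` is a hypothetical helper, it assumes tracemem() copy reports are written to standard output where capture.output() can collect them, and it returns NA when R was built without memory profiling (`capabilities("profmem")` is FALSE).

```r
# Hypothetical helper (not part of base R): returns TRUE if modifying a list
# stored in an environment via `e$x[[1]] <- 2` makes tracemem() report a copy
# on the running R build, FALSE if no report appears, and NA if tracemem()
# is unavailable.
env_dollar_assign_copies <- function() {
  # tracemem() requires an R build with memory profiling enabled.
  if (!capabilities("profmem")) return(NA)
  e <- new.env(parent = emptyenv())
  e$x <- list(1)
  # Mark the list for tracing; the returned address string is discarded.
  tracemem(e$x)
  # Assumption: tracemem() copy reports are printed to standard output,
  # so capture.output() collects them as lines of text.
  out <- capture.output(e$x[[1]] <- 2)
  untracemem(e$x)
  length(out) > 0
}

getRversion()
env_dollar_assign_copies()
```

Returning NA instead of calling tracemem() unconditionally keeps the check usable on builds where tracemem() would signal an error.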
Winston Chang
2016-Aug-05 17:27 UTC
> It is not clear if this is needed or just done in an abundance of
> caution. If R is built to use reference counting for determining
> sharing information this does not happen, so this is likely to change
> and not force a copy by 3.4.0.

Excellent, that's great to hear!

-Winston