Tal Galili
2010-Sep-01 15:09 UTC
[R] Why is vector assignment in R recreates the entire vector ?
Hello all,

A friend recently brought to my attention that vector assignment actually recreates the entire vector on which the assignment is performed.

So, for example, the code:

    x[10] <- NA  # The original call (short version)

is really doing this:

    x <- replace(x, list=10, values=NA)  # The original call (long version)
                                         # assigning a whole new vector to x

which is actually doing this:

    x <- `[<-`(x, list=10, values=NA)  # The actual call

Assuming this can be explained reasonably to the layman, my question is: why is it done this way? Why won't it just change the relevant pointer in memory?

On small vectors it makes no difference, but on big vectors this might (so I suspect) be costly in terms of time.

I'm curious to hear your responses on the subject.

Best,
Tal

Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
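The three spellings in the question can be compared directly. The following is a minimal sketch, assuming only base R; the variable names `a`, `b`, and `c0` are made up for illustration. It checks only that the three forms produce the same result, and says nothing about whether any of them copies the vector.

```r
# Three ways of setting the 10th element to NA, starting from the same data.
a  <- as.numeric(1:20)
b  <- a
c0 <- a

a[10] <- NA                  # ordinary subassignment
b  <- replace(b, 10, NA)     # R-level helper: replace(x, list, values)
c0 <- `[<-`(c0, 10, NA)      # direct call to the primitive: `[<-`(x, i, value)

# All three vectors should now be identical.
same_result <- identical(a, b) && identical(a, c0)
print(same_result)
```

The positional form `[<-`(x, i, value) is used here instead of the `list=`/`values=` names from the question, since those names belong to replace().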
Bert Gunter
2010-Sep-01 15:35 UTC
[R] Why is vector assignment in R recreates the entire vector ?
On Wed, Sep 1, 2010 at 8:09 AM, Tal Galili <tal.galili at gmail.com> wrote:
> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed.
>
> So for example, the code:
>     x[10]<- NA # The original call (short version)
>
> Is really doing this:
>     x<- replace(x, list=10, values=NA) # The original call (long version)
>     # assigning a whole new vector to x

This has been much discussed on this list.

Short answer: R is a functional programming language that uses call by value, not call by reference.

Longer answer: it depends. R will not create a copy if it can avoid it (usually?). Search the list archives for "call by value", "copy arguments", etc. for authoritative answers.

-- 
Bert Gunter
Genentech Nonclinical Statistics
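The call-by-value behaviour mentioned above can be illustrated with a short sketch. The helper name `set_na` is hypothetical and exists only for this example; the point is that a function which modifies its argument changes its own local value, not the caller's variable.

```r
# Hypothetical helper: modifies its argument and returns the modified value.
set_na <- function(v, i) {
  v[i] <- NA   # changes the function's local v only
  v
}

x <- c(1, 2, 3, 4, 5)
y <- set_na(x, 2)

print(x)  # the caller's x is unchanged
print(y)  # the returned vector carries the change
```

To keep the change, the caller has to assign the result back, as in `x <- set_na(x, 2)`.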
Duncan Murdoch
2010-Sep-01 15:39 UTC
[R] Why is vector assignment in R recreates the entire vector ?
On 01/09/2010 11:09 AM, Tal Galili wrote:
> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed.
>
> Assuming this can be explained reasonably to the lay man, my question is,
> why is it done this way ?

Your friend misled you. The `[<-` function is primitive. It acts as though it does what you describe, but it is free to do internal optimizations, and in many cases it does. The replace() function is a regular R-level function, so it has much less freedom and is likely to be a lot less efficient.

For example, in evaluating the expression x[10] <- NA, in most cases R knows that the original vector x will never be needed again, so it won't be duplicated. But in evaluating

    replace(x, list=10, values=NA)

R can't be sure, so it would make a duplicate copy.

You can see the difference in the following code:

    > x <- 1:1000
    > tracemem(x)
    [1] "<0x0547a6c0>"
    > x[10] <- NA
    > x <- replace(x, list=10, values=NA)
    tracemem[0x0547a6c0 -> 0x0488a768]: replace

Only the second version caused x to be duplicated.

One example that looks as though it is doing unnecessary duplication is this:

    > x[10] <- 3
    tracemem[0x0488a768 -> 0x04881260]:
    tracemem[0x04881260 -> 0x05613368]:

I can see that one duplication is necessary (x is being changed from type integer to type double), but why two?

Duncan Murdoch
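The integer-to-double change behind the last example can be checked without tracemem(). This is a rough sketch, assuming base R only; the variable names are illustrative. The optional tracemem() part is guarded because it needs an R build with memory profiling, and the addresses and number of copies it reports are likely to differ between R versions, so no output is shown for it.

```r
x <- 1:1000
type_before <- typeof(x)     # "integer"

x[10] <- 3L                  # integer value: the storage type is unchanged
type_same <- typeof(x)

x[10] <- 3                   # double value: the whole vector becomes double
type_after <- typeof(x)

print(c(type_before, type_same, type_after))

# Optional: watch for duplications, if this R build supports it.
if (capabilities("profmem")) {
  z <- 1:1000
  tracemem(z)
  z[10] <- 3                 # may print one or more tracemem[...] lines
  untracemem(z)
}
```

Assigning `3L` instead of `3` keeps the vector as integer, which avoids the copy that the type change would otherwise require.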
Tal Galili
2010-Sep-01 16:09 UTC
[R] Why is vector assignment in R recreates the entire vector ?
Thank you for the explanation, Duncan - very interesting indeed!

I wonder if someone on the list might know the answer to your question regarding the double duplication.

Best,
Tal
Matt Shotwell
2010-Sep-01 16:19 UTC
[R] Why is vector assignment in R recreates the entire vector ?
Tal,

For your first example, x is not duplicated in memory. If you compile R with --enable-memory-profiling, you have access to the tracemem() function, which will report whether x is duplicate()d:

    > x <- rep(1,100)
    > tracemem(x)
    [1] "<0x8f71c38>"
    > x[10] <- NA

This does not result in duplication of x, nor does assignment of x to y:

    > y <- x

At this point, y internally references x. It's not until we modify y that x is duplicated, and y gets its own copy of the data:

    > y[10] <- NA
    tracemem[0x8f71c38 -> 0x91fff70]:

Likewise, no duplication occurs using `[<-`:

    > x <- rep(1,100)
    > tracemem(x)
    [1] "<0x8e44900>"
    > x <- `[<-`(x, list=10, values=NA)

But R is not yet smart enough to avoid a duplication here:

    > x <- rep(1,100)
    > tracemem(x)
    [1] "<0x915d580>"
    > x <- replace(x, list=10, values=NA)
    tracemem[0x915d580 -> 0x915e090]: replace

Beyond these simple tests, it's difficult to know when R copies memory. I mentioned in another post recently that subsetting a vector will copy memory, but this is not reported by tracemem(). For example:

    > tracemem(x)
    [1] "<0x915ed50>"
    > y <- x[1:100]
    > tracemem(y)
    [1] "<0x915f3f0>"
    > identical(x,y)
    [1] TRUE

Fortunately, memory is fairly cheap, and memory operations are pretty fast in modern operating systems, like GNU Linux. I mostly find that the rate-limiting steps in my code are computational routines, like exp().

-Matt

-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
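The visible effect of the behaviour described above (y shares x's data until y is modified, and a subset is its own vector) can be checked from the values alone. The sketch below assumes base R only and uses illustrative variable names; it shows that the results stay independent, not when or how R copies memory internally.

```r
x <- rep(1, 100)
y <- x            # x and y hold equal values at this point
y[10] <- NA       # modifying y must not affect x

x_unchanged <- all(x == 1)
y_changed   <- is.na(y[10])

# A subset is a separate vector: changing it leaves the original alone.
s <- x[1:100]
equal_before <- identical(s, x)
s[1] <- 0
x_first <- x[1]

print(c(x_unchanged, y_changed, equal_before))
print(x_first)
```

Whether a physical copy happens at `y <- x` or only at `y[10] <- NA` is an implementation detail; the values behave the same either way.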
Norm Matloff
2010-Sep-02 19:20 UTC
[R] Why is vector assignment in R recreates the entire vector ?
Tal wrote:
> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed.

...

I brought this up in r-devel a few months ago. You can read my posting, and the various replies, at

http://www.mail-archive.com/r-devel at r-project.org/msg20089.html

Some of the replies not only explain the process, but list lines in the source code where this takes place, enabling a closer look at how/when duplication occurs.

Norm Matloff
Martin Maechler
2010-Sep-03 07:38 UTC
[R] Why is vector assignment in R recreates the entire vector ?
>>>>> "NM" == Norm Matloff <matloff at cs.ucdavis.edu>
>>>>>     on Thu, 2 Sep 2010 12:20:44 -0700 writes:

    NM> I brought this up in r-devel a few months ago.

Yes, thank you, Norm, for the pointer. Indeed, this whole topic really belongs on R-devel, not R-help.

Martin Maechler