thr3ads.net - R help - [R] Why is vector assignment in R recreates the entire vector ? [Sep 2010]

Tal Galili

2010-Sep-01 15:09 UTC

[R] Why is vector assignment in R recreates the entire vector ?

Hello all,

A friend recently brought to my attention that vector assignment actually
recreates the entire vector on which the assignment is performed.

So for example, the code:
x[10]<- NA # The original call (short version)

Is really doing this:
x<- replace(x, list=10, values=NA) # The original call (long version)
# assigning a whole new vector to x

Which is actually doing this:
x<- `[<-`(x, list=10, values=NA) # The actual call


Assuming this can be explained reasonably to the lay man, my question is,
why is it done this way ?
Why won't it just change the relevant pointer in memory?

On small vectors it makes no difference.
But on big vectors this might be (so I suspect) costly (in terms of time).


I'm curious for your responses on the subject.

Best,
Tal



----------------Contact Details:-------------------------------------------------------
Contact me: Tal.Galili@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------


Bert Gunter

2010-Sep-01 15:35 UTC

On Wed, Sep 1, 2010 at 8:09 AM, Tal Galili <tal.galili at gmail.com> wrote:
> Hello all,
>
> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed.
>
> So for example, the code:
> x[10]<- NA # The original call (short version)
>
> Is really doing this:
> x<- replace(x, list=10, values=NA) # The original call (long version)
> # assigning a whole new vector to x

This has been much discussed on this list. Short answer: R is a
functional programming language that uses call by value, not
references.

Longer answer: It depends. R will not create a copy if it can avoid it
(usually?). Search the list archives for "call by value", "copy
arguments", etc. for authoritative answers.
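
The call-by-value behaviour described above can be seen in a small sketch. The function name set_tenth_na is hypothetical, chosen only for illustration; the point is that modifying an argument inside a function does not change the caller's vector.

```r
# Hypothetical helper (not from the thread): modifies its argument
# and returns the modified vector.
set_tenth_na <- function(v) {
  v[10] <- NA   # changes only the function's local v
  v
}

x <- as.numeric(1:20)
y <- set_tenth_na(x)

x[10]          # the caller's x still holds 10
is.na(y[10])   # the returned vector carries the NA
```

Whether R physically copies the data at the moment of the call or only when v is modified is an internal matter; the visible semantics are the same either way.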

-- 
Bert Gunter
Genentech Nonclinical Statistics

Duncan Murdoch

2010-Sep-01 15:39 UTC

On 01/09/2010 11:09 AM, Tal Galili wrote:
> Hello all,
>
> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed.
>
> So for example, the code:
> x[10]<- NA # The original call (short version)
>
> Is really doing this:
> x<- replace(x, list=10, values=NA) # The original call (long version)
> # assigning a whole new vector to x
>
> Which is actually doing this:
> x<- `[<-`(x, list=10, values=NA) # The actual call
>
>
> Assuming this can be explained reasonably to the lay man, my question is,
> why is it done this way ?
>   
Your friend misled you.  The `[<-` function is primitive.  It acts as 
though it does what you describe, but it is free to do internal 
optimizations, and in many cases it does.  The replace() function is a 
regular R-level function so it has much less freedom and is likely to be 
a lot less efficient.

For example, in evaluating the expression x[10] <- NA, in most cases R 
knows that the original vector x will never be needed again, so it won't 
be duplicated.  But in evaluating

replace(x, list=10, values=NA)

R can't be sure, so it would make a duplicate copy.

You can see the difference in the following code:

 > x <- 1:1000
 > tracemem(x)
[1] "<0x0547a6c0>"
 > x[10] <- NA
 > x <- replace(x, list=10, values=NA)
tracemem[0x0547a6c0 -> 0x0488a768]: replace

Only the second version caused x to be duplicated.
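
As a rough sketch of why replace() has less freedom: as far as I know, in base R it is an ordinary closure whose body performs the subassignment on its own local x and then returns it. The function my_replace below is a hypothetical stand-in written to mirror that idea, not a copy of the base source. All four forms give the same values; they differ only in whether R has to copy.

```r
# Hypothetical stand-in for replace(): subassign on the local x, return it.
my_replace <- function(x, list, values) {
  x[list] <- values
  x
}

x <- as.numeric(1:20)

via_bracket <- x
via_bracket[10] <- NA                               # ordinary subassignment

via_primitive <- `[<-`(x, 10, value = NA)           # calling the primitive directly
via_replace   <- replace(x, list = 10, values = NA) # base R-level function
via_sketch    <- my_replace(x, list = 10, values = NA)

identical(via_bracket, via_primitive)
identical(via_bracket, via_replace)
identical(via_bracket, via_sketch)
```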

One example that looks as though it is doing unnecessary duplication is 
this:

 > x[10] <- 3
tracemem[0x0488a768 -> 0x04881260]:
tracemem[0x04881260 -> 0x05613368]:

I can see that one duplication is necessary (x is being changed from 
type integer to type double), but why two?
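
The type change mentioned above can be checked directly. In this sketch, assigning the double constant 3 converts an integer vector to double, whereas the integer constant 3L leaves the type alone. It only illustrates the one necessary conversion; it does not explain the second duplication.

```r
x <- 1:1000
typeof(x)        # "integer"

x[10] <- NA      # NA is coerced to the integer NA, so x stays integer
typeof(x)        # "integer"

y <- x
y[10] <- 3       # 3 is a double constant, so y is converted to double
typeof(y)        # "double"

z <- x
z[10] <- 3L      # 3L is an integer constant, so z stays integer
typeof(z)        # "integer"
```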

Duncan Murdoch

Tal Galili

2010-Sep-01 16:09 UTC

Thank you for the explanation Duncan - very interesting indeed!

I wonder if someone on the list might know the answer to your question
regarding the double duplication.

Best,
Tal


Matt Shotwell

2010-Sep-01 16:19 UTC

Tal, 

For your first example, x is not duplicated in memory. If you compile R
with --enable-memory-profiling, you have access to the tracemem()
function, which will report whether x is duplicate()d:

> x <- rep(1,100)
> tracemem(x)
[1] "<0x8f71c38>"
> x[10] <- NA

This does not result in duplication of x, nor does assignment of x to y:

> y <- x

At this point, y internally references x. It's not until we modify y,
that x is duplicated, and y gets its own copy of the data:

> y[10] <- NA
tracemem[0x8f71c38 -> 0x91fff70]:
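
The copy-on-modify behaviour described above can also be confirmed from the values alone, without relying on tracemem(). A minimal sketch; the tracemem() calls are guarded because they only work when R was built with memory profiling, and the addresses they print vary from session to session.

```r
x <- rep(1, 100)
y <- x             # no data copied yet; y refers to the same vector as x

# Only trace when this build of R supports memory profiling.
if (capabilities("profmem")) tracemem(x)

y[10] <- NA        # y is modified, so it gets its own copy here

x[10]              # still 1: x was not affected by the change to y
is.na(y[10])       # TRUE

if (capabilities("profmem")) untracemem(x)
```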

Likewise, no duplication occurs using `[<-`:

> x <- rep(1,100)
> tracemem(x)
[1] "<0x8e44900>"
> x <- `[<-`(x, list=10, values=NA)

But, R is not yet smart enough to avoid a duplication here:

> x <- rep(1,100)
> tracemem(x)
[1] "<0x915d580>"
> x <- replace(x, list=10, values=NA)
tracemem[0x915d580 -> 0x915e090]: replace

Beyond these simple tests, it's difficult to know when R copies memory.
I mentioned in another post recently that subsetting a vector will copy
memory, but this is not reported by tracemem(). For example:

> tracemem(x)
[1] "<0x915ed50>"
> y <- x[1:100]
> tracemem(y)
[1] "<0x915f3f0>"
> identical(x,y)
[1] TRUE

Fortunately, memory is fairly cheap, and memory operations are pretty
fast in modern operating systems, like GNU Linux. I mostly find that the
rate limiting steps in my code are computational routines, like exp().
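
To check whether the copying matters for a particular workload, one can simply time both forms. A rough sketch, with the vector length and repetition count chosen arbitrarily; no timings are shown because they depend on the machine and the R version.

```r
n    <- 1e6    # vector length (arbitrary)
reps <- 100    # number of single-element assignments (arbitrary)

x1 <- numeric(n)
t_bracket <- system.time(
  for (i in seq_len(reps)) x1[i] <- NA
)

x2 <- numeric(n)
t_replace <- system.time(
  for (i in seq_len(reps)) x2 <- replace(x2, list = i, values = NA)
)

identical(x1, x2)        # both approaches produce the same vector
t_bracket["elapsed"]
t_replace["elapsed"]
```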

-Matt


-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina

Norm Matloff

2010-Sep-02 19:20 UTC

Tal wrote:
> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed....

I brought this up in r-devel a few months ago.  You can read my posting,
and the various replies, at

http://www.mail-archive.com/r-devel@r-project.org/msg20089.html

Some of the replies not only explain the process, but list lines in the
source code where this takes place, enabling a closer look at how/when
duplication occurs. 

Norm Matloff

Martin Maechler

2010-Sep-03 07:38 UTC

>>>>> "NM" == Norm Matloff <matloff at cs.ucdavis.edu>
>>>>>     on Thu, 2 Sep 2010 12:20:44 -0700 writes:

    NM> Tal wrote:

    >> A friend recently brought to my attention that vector assignment actually
    >> recreates the entire vector on which the assignment is performed.
    NM> ...

    NM> I brought this up in r-devel a few months ago.  

yes, thank you Norm, for the pointer.
Indeed this whole topic really belongs to R-devel not R-help.
Martin Maechler
