Jens Oehlschlägel
2014-Mar-02 17:37 UTC
[Rd] internal copying in R (soon to be released R-3.1.0)
Dear core group,

Which operation in R guarantees to get a true copy of an atomic vector, not just a second symbol pointing to the same shared memory?

y <- x[]
#?

y <- x
y[1] <- y[1]
#?

Is there any function that returns its argument as a non-shared atomic but only copies if the argument was shared?

Given an atomic vector x, what is the best official way to find out whether other symbols share the vector RAM? Querying NAMED() < 2 doesn't work because .Call sets sxpinfo_struct.named to 2. It even sets it to 2 if the argument to .Call was a never-named expression!?

> named(1:3)
[1] 2

And it seems to set it permanently, pure read-access can trigger copy-on-modify:

> x <- integer(1e8)
> system.time(x[1]<-1L)
   user  system elapsed
      0       0       0
> system.time(x[1]<-2L)
   user  system elapsed
      0       0       0

having called .Call now leads to an unnecessary copy on the next assignment

> named(x)
[1] 2
> system.time(x[1]<-3L)
   user  system elapsed
   0.14    0.07    0.20
> system.time(x[1]<-4L)
   user  system elapsed
      0       0       0

this not only happens with user written functions doing read-access

> is.unsorted(x)
[1] TRUE
> system.time(x[1]<-5L)
   user  system elapsed
   0.11    0.09    0.21

Why don't you simply give package authors read-access to sxpinfo_struct.named in .Call (without setting it to 2)? That would give us more control and also save some unnecessary copying. I guess once R switches to reference-counting preventive increasing in .Call could not be continued anyhow.

Kind regards

Jens Oehlschlägel

P.S. please cc me in answers as I am not member of r-devel

P.P.S. function named() was tentatively defined as follows:

named <- function(x)
  .Call("R_bit_named", x, PACKAGE="bit")

SEXP R_bit_named(SEXP x){
  SEXP ret_;
  PROTECT( ret_ = allocVector(INTSXP,1) );
  INTEGER(ret_)[0] = NAMED(x);
  UNPROTECT(1);
  return ret_;
}

> version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status         Under development (unstable)
major          3
minor          1.0
year           2014
month          02
day            28
svn rev        65091
language       R
version.string R Under development (unstable) (2014-02-28 r65091)
nickname       Unsuffered Consequences
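The timings above (a slow assignment right after the flag reads 2, then fast ones again) can be illustrated with a small self-contained model. This is a sketch in plain C and not R's implementation: `ToyVec`, `toy_alloc`, `toy_duplicate`, `toy_assign` and `toy_free` are hypothetical names, and the `named` field only imitates the 0/1/2 convention that NAMED() reports. The assumption is simply that an assignment copies whenever the flag says the vector may be shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy vector; NOT R's SEXP. It only models a NAMED-style flag. */
typedef struct {
    int named;      /* 0 = temporary, 1 = one binding, 2 = possibly shared */
    size_t length;
    int *data;
} ToyVec;

/* Allocate a zero-filled toy vector whose flag starts at 0. */
static ToyVec *toy_alloc(size_t n) {
    ToyVec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->named = 0;
    v->length = n;
    v->data = calloc(n ? n : 1, sizeof *v->data);
    assert(v->data != NULL);
    return v;
}

/* Full copy of the data; the copy starts with flag 0, like a fresh object. */
static ToyVec *toy_duplicate(const ToyVec *v) {
    ToyVec *copy = toy_alloc(v->length);
    memcpy(copy->data, v->data, v->length * sizeof *v->data);
    return copy;
}

/* Models `x[i] <- value` and returns the vector x is bound to afterwards.
   If the flag says "possibly shared", the write goes to a private copy
   and the original is left untouched for its other owners. */
static ToyVec *toy_assign(ToyVec *x, size_t i, int value, int *copied) {
    assert(i < x->length);
    *copied = 0;
    if (x->named >= 2) {
        x = toy_duplicate(x);   /* the expensive step seen in the timings */
        *copied = 1;
    }
    x->named = 1;               /* the result is bound to exactly one symbol */
    x->data[i] = value;
    return x;
}

/* Release a toy vector (the toy has no garbage collector). */
static void toy_free(ToyVec *v) {
    free(v->data);
    free(v);
}
```

In this model, one assignment after the flag has been raised to 2 pays for a copy, and the next assignment on the returned vector is in place again, which matches the pattern in the transcript. Whether and when the real flag is raised is exactly the question put to the list.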
Simon Urbanek
2014-Mar-03 18:37 UTC
[Rd] internal copying in R (soon to be released R-3.1.0)
On Mar 2, 2014, at 12:37 PM, Jens Oehlschlägel <jens.oehlschlaegel at truecluster.com> wrote:

> Dear core group,
>
> Which operation in R guarantees to get a true copy of an atomic vector, not just a second symbol pointing to the same shared memory?
>

None, there is no concept of "shared" memory at R level. You seem to be mixing C level API specifics and the R language. In the former duplicate() creates a new copy.

> y <- x[]
> #?
>
> y <- x
> y[1] <- y[1]
> #?
>
> Is there any function that returns its argument as a non-shared atomic but only copies if the argument was shared?
>
> Given an atomic vector x, what is the best official way to find out whether other symbols share the vector RAM? Querying NAMED() < 2 doesn't work because .Call sets sxpinfo_struct.named to 2. It even sets it to 2 if the argument to .Call was a never-named expression!?
>
> > named(1:3)
> [1] 2
>

Assuming that you are talking about the C API, please consider reading about the concepts involved. .Call() doesn't set named to 2 at all - it passes whatever object is passed so it is the C code's responsibility to handle incoming objects according to the desired semantics (see the previous post here).

> And it seems to set it permanently, pure read-access can trigger copy-on-modify:
>
> > x <- integer(1e8)
> > system.time(x[1]<-1L)
>    user  system elapsed
>       0       0       0
> > system.time(x[1]<-2L)
>    user  system elapsed
>       0       0       0
>
> having called .Call now leads to an unnecessary copy on the next assignment
>
> > named(x)
> [1] 2
> > system.time(x[1]<-3L)
>    user  system elapsed
>    0.14    0.07    0.20
> > system.time(x[1]<-4L)
>    user  system elapsed
>       0       0       0
>
> this not only happens with user written functions doing read-access
>
> > is.unsorted(x)
> [1] TRUE
> > system.time(x[1]<-5L)
>    user  system elapsed
>    0.11    0.09    0.21
>
> Why don't you simply give package authors read-access to sxpinfo_struct.named in .Call (without setting it to 2)? That would give us more control and also save some unnecessary copying.

Again, you're barking up the wrong tree - .Call() doesn't bump NAMED at all - it simply passes the object:

#include <Rinternals.h>
SEXP nam(SEXP x) { return ScalarInteger(NAMED(x)); }

> .Call("nam", 1+1)
[1] 0
> x=1+1
> .Call("nam", x)
[1] 1
> y=x
> .Call("nam", x)
[1] 2

Cheers,
Simon

> I guess once R switches to reference-counting preventive increasing in .Call could not be continued anyhow.
>
> Kind regards
>
> Jens Oehlschlägel
>
> P.S. please cc me in answers as I am not member of r-devel
>
> P.P.S. function named() was tentatively defined as follows:
>
> named <- function(x)
>   .Call("R_bit_named", x, PACKAGE="bit")
>
> SEXP R_bit_named(SEXP x){
>   SEXP ret_;
>   PROTECT( ret_ = allocVector(INTSXP,1) );
>   INTEGER(ret_)[0] = NAMED(x);
>   UNPROTECT(1);
>   return ret_;
> }
>
> > version
>                _
> platform       x86_64-w64-mingw32
> arch           x86_64
> os             mingw32
> system         x86_64, mingw32
> status         Under development (unstable)
> major          3
> minor          1.0
> year           2014
> month          02
> day            28
> svn rev        65091
> language       R
> version.string R Under development (unstable) (2014-02-28 r65091)
> nickname       Unsuffered Consequences
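The 0, 1, 2 sequence in the reply, together with the original question about copying only when the argument is shared, can be sketched as a stand-alone toy in plain C. Nothing here calls the R API: `ToyVec`, `toy_scalar`, `toy_bind` and `toy_unshared` are hypothetical names, and the saturating counter is an assumption modelled on the session output above. In real C code against R the copy itself would come from duplicate(), as the reply says.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object; NOT R's SEXP. */
typedef struct {
    int named;      /* 0 = temporary, 1 = one binding, 2 = possibly shared */
    size_t length;
    double *data;
} ToyVec;

/* A fresh length-one value such as the result of `1+1`: flag is 0. */
static ToyVec *toy_scalar(double value) {
    ToyVec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->named = 0;
    v->length = 1;
    v->data = malloc(sizeof *v->data);
    assert(v->data != NULL);
    v->data[0] = value;
    return v;
}

/* Models binding the object to one more symbol (`x = ...`, then `y = x`).
   The flag goes 0 -> 1 -> 2 and saturates at 2. */
static void toy_bind(ToyVec *v) {
    if (v->named < 2)
        v->named++;
}

/* Copy only if shared: returns v itself when the flag rules out sharing,
   otherwise a private copy whose flag starts again at 0. */
static ToyVec *toy_unshared(ToyVec *v) {
    if (v->named < 2)
        return v;
    ToyVec *copy = malloc(sizeof *copy);
    assert(copy != NULL);
    copy->named = 0;
    copy->length = v->length;
    copy->data = malloc(v->length * sizeof *copy->data);
    assert(copy->data != NULL);
    memcpy(copy->data, v->data, v->length * sizeof *copy->data);
    return copy;
}
```

The design choice in `toy_unshared` is to treat a flag of 2 as "may be shared" rather than "is shared": the counter saturates, so it can only ever err on the side of an extra copy, never on the side of modifying an object someone else still sees.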