Radford Neal
2011-Jul-25 15:53 UTC
[Rd] Best practices for writing R functions (really copying)
Gabriel Becker writes:

   AFAIK R does not automatically copy function arguments. R actually tries
   very hard to avoid copying while maintaining "pass by value" functionality.

   ... R only copies data when you modify an object, not
   when you simply pass it to a function.

This is a bit misleading. R tries to avoid copying by maintaining a
count of how many references there are to an object, so that x[i] <- 9
can be done without a copy if x is the only reference to the vector.
However, it never decrements such counts. As a result, simply passing
x to a function that accesses but does not change it will result in x
being copied if x[i] is changed after that function returns. An
exception is that this usually isn't the case if x is passed to a
primitive function. But note that not all standard functions are
technically "primitive".

The end result is that it's rather difficult to tell when copying will
be done. Try the following test, for example:

  cat("a: "); print(system.time( { A <- matrix(c(1.0,1.1),50000,1000); 0 } ))
  cat("b: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("c: "); print(system.time( { B <- sqrt(A); 0 } ))
  cat("d: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("e: "); print(system.time( { B <- t(A); 0 } ))
  cat("f: "); print(system.time( { A[1,1]<-7; 0 } ))
  cat("g: "); print(system.time( { A[1,1]<-7; 0 } ))

You'll find that the time printed after b:, d:, and g: is near zero,
but that there is non-negligible time for f:. This is because sqrt
is primitive but t is not, so the modification to A after the call
t(A) requires that a copy be made.

   Radford Neal
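A minimal sketch of the "pass by value" semantics under discussion, using only base R. The names first_to_99, x, and y are made up for illustration and do not appear in the thread. It shows that the caller's object is untouched when a function modifies its argument. The optional tracemem() calls only work when R was built with memory profiling, so they are guarded. Which of the later assignments trigger a copy depends on the R version, so no output is shown.

```r
# Hypothetical helper, not from the thread: modifies its argument locally.
first_to_99 <- function(v) {
  v[1] <- 99   # v shares storage with the caller's x until this line
  v
}

x <- c(1, 2, 3)

# tracemem() needs a build with memory profiling; skip it otherwise.
if (capabilities("profmem")) tracemem(x)

y <- first_to_99(x)   # the duplicate is made inside the function, on modification

x[1] <- 7             # whether this copies again depends on the R version

if (capabilities("profmem")) untracemem(x)

print(x)
print(y)
```

With tracing available, each duplicate of x is reported as a tracemem[old -> new] line, which is easier to read than comparing timings.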
Matt Shotwell
2011-Jul-25 16:44 UTC
[Rd] Best practices for writing R functions (really copying)
Also consider subsetting:

  cat("a: "); print(system.time( { A <- matrix(c(1.0,1.1),50000,1000); 0 } ))
  cat("h: "); print(system.time( { sum(A[1:50000,1:1000]) } ))
  cat("i: "); print(system.time( { sum(A[]) } ))
  cat("j: "); print(system.time( { sum(A) } ))

In contrast with Python's NumPy array, the R array type has no concept
of 'viewing' the array contents in different ways. Instead, the
contents are copied or adjusted. Subsetting and matrix transposition
are examples of transformations that might be considered alternate
'views' of an array.

This is especially painful in the example above, because
A[1:50000,1:1000], A[], and A evaluate to identical() arrays. In case
h: the array is copied element-wise. In i: A is duplicate()d. In case
j: A is not copied.

Matt
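A small sketch of the claim that the three expressions in the subsetting example yield identical() arrays. The matrix is shrunk to 500 x 100 to keep it quick, and full_index, empty_index, and same are illustrative names only. It checks values, not how many copies are made.

```r
# Smaller than the 50000 x 1000 matrix in the post (assumption, for speed).
A <- matrix(c(1.0, 1.1), 500, 100)

full_index  <- A[1:500, 1:100]  # explicit index over every row and column
empty_index <- A[]              # empty subscript

# Both subsetting forms should compare identical() to A itself.
same <- c(h = identical(full_index, A), i = identical(empty_index, A))
print(same)

# The three sums from cases h, i and j.
print(c(h = sum(full_index), i = sum(empty_index), j = sum(A)))
```

Since the results are the same, any difference in the timings for h:, i:, and j: reflects the cost of producing the intermediate array.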
Henrik Bengtsson
2011-Jul-25 18:11 UTC
[Rd] Best practices for writing R functions (really copying)
Use tracemem() instead, i.e.

> A <- matrix(c(1.0,1.1), nrow=5, ncol=10);
> tracemem(A);
[1] "<0x00000000047ab170"
> A[1,1] <- 7;
> B <- sqrt(A);
tracemem[0x00000000047ab170 -> 0x000000000552f338]:
> A[1,1] <- 7;
> B <- t(A);
> A[1,1] <- 7;
tracemem[0x00000000047ab170 -> 0x00000000057ba588]:
> A[1,1] <- 7;
> A[1,1] <- 7;

It looks like sqrt() creates the copy internally, which explains the
difference. However, it is true that even if a new copy is not
needed/created inside a function call, a function "touching" the
object would trigger downstream copies, e.g.

# Not touching the object:
> foo <- function(X) { 0 }
> B <- foo(A);
> A[1,1] <- 7;
> A[1,1] <- 7;

# Touching the object:
> bar <- function(X) { Y <- X; 0 }
> B <- bar(A);
> A[1,1] <- 7;
tracemem[0x00000000039b5538 -> 0x000000000402c448]:
> A[1,1] <- 7;

However however, try doing the same with a vector instead of matrix,
e.g. A <- 1:10, and/or assignment with A[1] <- 7 and you get a
different behavior. The source code should explain why. I leave it
at this.

My $.02

/Henrik
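A hedged sketch of the vector variant suggested above, with A <- 1:10 and A[1] <- 7. One thing that is certain is that 1:10 is an integer vector while 7 is a double, so the first assignment changes the storage type and must allocate a new vector. Whether that is the difference the post had in mind is an assumption. The names type_before and type_after are illustrative, and the tracemem() calls are guarded because they require a build with memory profiling. No trace output is shown, since it varies by R version.

```r
A <- 1:10
type_before <- typeof(A)

# tracemem() needs a build with memory profiling; skip it otherwise.
if (capabilities("profmem")) tracemem(A)

A[1] <- 7    # 7 is a double, so A is converted from integer to double
A[1] <- 7    # same assignment again, now on a double vector

type_after <- typeof(A)
if (capabilities("profmem")) untracemem(A)

print(c(before = type_before, after = type_after))
```

Replacing 7 with 7L keeps the vector as integer, which makes it possible to separate the effect of coercion from the effect of reference counting.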