Henrik Bengtsson
2010-Nov-23 14:12 UTC
[Rd] Possibility for memory improvement: x <- as.vector(x) always(?) duplicates
Hi,

I've noticed that as.vector() always allocates a new object, e.g.

> x <- 1:10;
> x <- as.vector(x);
> tracemem(x);
[1] "<0x0000000005622db8"
> x <- as.vector(x);
tracemem[0x0000000005622db8 -> 0x0000000005622ec0]: as.vector
> x <- as.vector(x);
tracemem[0x0000000005622ec0 -> 0x0000000005622f18]: as.vector
> x <- as.vector(x);
tracemem[0x0000000005622f18 -> 0x000000000561c388]: as.vector
> x <- as.vector(x);
tracemem[0x000000000561c388 -> 0x000000000561c3e0]: as.vector
>

and so on.

This also seems to be the reason for the extra copy that is created when
turning a vector into a matrix or an array, e.g.

> x <- 1:10;
> tracemem(x);
[1] "<0x000000000561c750"
> x <- matrix(x, nrow=5, ncol=2);
tracemem[0x000000000561c750 -> 0x000000000561c7a8]: as.vector matrix
>

Example of how it could work (not sure if the test with is.vector() is enough):

as.vector <- function(x, mode="any") {
  if (is.vector(x)) return(x);
  base::as.vector(x);
} # as.vector()

matrix <- base::matrix;
environment(matrix) <- globalenv();

> x <- 1:10;
> tracemem(x);
[1] "<0x0000000003965488"
> x <- matrix(x, nrow=5, ncol=2);
>

Could this be a generic improvement?  Some years ago similar
improvements were made for as.integer(), as.numeric() etc.

This is on R v2.12.0 patched (2010-11-09 r53543) and R v2.13.0 devel
(2010-11-20 r53645) on Windows 7 Ultimate.

/Henrik
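On the open question of whether the is.vector() test is enough: probably not quite, as far as I can tell. is.vector() returns TRUE for an atomic vector that carries names, whereas base::as.vector() strips names from atomic vectors, and the wrapper also drops the mode argument. The following is only a sketch of a stricter shortcut; as_vector_naive and as_vector_nodup are hypothetical names made up for this illustration, not anything in base R.

```r
# The wrapper from the message, renamed (hypothetical name) so that it
# does not mask base::as.vector().
as_vector_naive <- function(x, mode = "any") {
  if (is.vector(x)) return(x)
  base::as.vector(x)
}

# Hypothetical stricter variant: return x untouched only when no mode
# conversion is requested and there are no attributes at all to remove.
# Everything else is delegated to base::as.vector().
as_vector_nodup <- function(x, mode = "any") {
  if (identical(mode, "any") && is.null(attributes(x)) && is.vector(x)) {
    return(x)
  }
  base::as.vector(x, mode = mode)
}

# A named numeric vector: base::as.vector() drops the names.
x <- c(a = 1, b = 2)
identical(as_vector_naive(x), base::as.vector(x))  # FALSE: names are kept
identical(as_vector_nodup(x), base::as.vector(x))  # TRUE

# A requested mode change: the naive wrapper ignores it.
identical(as_vector_naive(1:3, mode = "character"),
          base::as.vector(1:3, mode = "character"))  # FALSE
identical(as_vector_nodup(1:3, mode = "character"),
          base::as.vector(1:3, mode = "character"))  # TRUE
```

The sketch only checks that the results agree with base::as.vector(); whether a copy is actually avoided still has to be observed with tracemem() on the R build at hand.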
Henrik Bengtsson
2010-Nov-30 06:40 UTC
[Rd] Possibility for memory improvement: x <- as.vector(x) always(?) duplicates
FYI, from the recent R devel NEWS file:

  as.vector() and as.double() etc duplicate less when they leave the
  mode unchanged but remove attributes.

  as.vector(mode = "any") no longer duplicates when it does not remove
  attributes.  This helps memory usage in matrix() and array().

This improvement will cut down the memory allocation/garbage collection
for most of us (I would like to add "a lot").  For instance, the most
common matrix() use cases no longer create duplicate copies.

% Rerm
R version 2.13.0 Under development (unstable) (2010-11-26 r53672)
[...]

> x <- 1:10;
> x <- as.vector(x);
> tracemem(x);
[1] "<0x00000000038e1538"
> x <- as.vector(x);
> x <- as.vector(x);
> x <- as.vector(x);

and so on.

> x <- 1:10;
> tracemem(x);
[1] "<0x00000000038e1590"
> x <- matrix(x, nrow=5, ncol=2);
> x <- as.matrix(x);
> x <- as.matrix(x);
> x <- as.matrix(x);

and so on.  Compare this with what I reported in my previous message
(below).

Browsing the SVN logs for R devel, I see that Brian Ripley is the one who
has done all the great work related to this.  Thank you!

/Henrik

On Tue, Nov 23, 2010 at 6:12 AM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
> Hi,
>
> I've noticed that as.vector() always allocates a new object, e.g.
>
>> x <- 1:10;
>> x <- as.vector(x);
>> tracemem(x);
> [1] "<0x0000000005622db8"
>> x <- as.vector(x);
> tracemem[0x0000000005622db8 -> 0x0000000005622ec0]: as.vector
>> x <- as.vector(x);
> tracemem[0x0000000005622ec0 -> 0x0000000005622f18]: as.vector
>> x <- as.vector(x);
> tracemem[0x0000000005622f18 -> 0x000000000561c388]: as.vector
>> x <- as.vector(x);
> tracemem[0x000000000561c388 -> 0x000000000561c3e0]: as.vector
>>
>
> and so on.
>
> This also seems to be the reason for the extra copy that is created when
> turning a vector into a matrix or an array, e.g.
>
>> x <- 1:10;
>> tracemem(x);
> [1] "<0x000000000561c750"
>> x <- matrix(x, nrow=5, ncol=2);
> tracemem[0x000000000561c750 -> 0x000000000561c7a8]: as.vector matrix
>>
>
> Example of how it could work (not sure if the test with is.vector() is enough):
>
> as.vector <- function(x, mode="any") {
>   if (is.vector(x)) return(x);
>   base::as.vector(x);
> } # as.vector()
>
> matrix <- base::matrix;
> environment(matrix) <- globalenv();
>
>> x <- 1:10;
>> tracemem(x);
> [1] "<0x0000000003965488"
>> x <- matrix(x, nrow=5, ncol=2);
>>
>
> Could this be a generic improvement?  Some years ago similar
> improvements were made for as.integer(), as.numeric() etc.
>
> This is on R v2.12.0 patched (2010-11-09 r53543) and R v2.13.0 devel
> (2010-11-20 r53645) on Windows 7 Ultimate.
>
> /Henrik
>