MRipley
2013-Aug-16 16:16 UTC
[R] Is it possible to avoid copying arrays when calling list()?
Usually R is pretty good about not copying objects when it doesn't need to. However, the list() function seems to make unnecessary copies. For example:

> system.time(x<-double(10^9))
   user  system elapsed
  1.772   4.280   7.017
> system.time(y<-double(10^9))
   user  system elapsed
  2.564   3.368   5.943
> system.time(z<-list(x,y))
   user  system elapsed
  5.520   6.748  12.304

I have a function where I create two large arrays, manipulate them in certain ways, and then return both as a list. I'm optimizing the function, so I'd like to be able to build the return list quickly. The two large arrays drop out of scope immediately after I make the list and return it, so copying them is completely unnecessary.

Is there some way to do this? I'm not familiar with manipulating lists through the .Call interface, and haven't been able to find much about this in the documentation. Might it be possible to write a fast (but possibly unsafe) list function using .Call that doesn't make copies of the arguments?

PS A few things I've tried. First, this is not due to triggering garbage collection -- even if I call gc() before list(x,y), it still takes a long time.

Also, I've tried rewriting the function by creating the list at the beginning, as in:

result <- list(x=double(10^9),y=double(10^9))

and then manipulating result$x and result$y, but this made my code run slower, as R seemed to be making other unnecessary copies while manipulating elements of a list like this.

I've considered (though not implemented) creating an environment rather than a list, and returning the environment, but I'd rather find a simple way of creating a list without making copies if possible.
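One way to check whether list() really duplicates its arguments, rather than inferring it from timings, is tracemem(). The following is only a sketch: it uses small vectors in place of the 10^9-element arrays, it assumes an R build with memory profiling enabled, and whether a copy is reported may depend on the R version in use.

```r
# Small vectors stand in for the 10^9-element arrays in the post.
x <- double(1e6)
y <- double(1e6)

# tracemem() marks an object so that R prints a "tracemem[...]" line
# whenever that object is duplicated. It is only available when R was
# built with memory profiling, so check before calling it.
if (capabilities("profmem")) {
  tracemem(x)
  tracemem(y)
}

# If list() copies its arguments, tracemem lines are printed here.
z <- list(x, y)

if (capabilities("profmem")) {
  untracemem(x)
  untracemem(y)
}
```

If no tracemem line appears when z is built, the time is being spent somewhere other than in duplicating x and y.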
Gang Peng
2013-Aug-16 21:23 UTC
[R] Is it possible to avoid copying arrays when calling list()?
If you don't want to copy the data, you can use environments. First define x and y in the global environment; then, inside the function, use get() to access them. When the function changes x and y there (for example with assign() or <<-), x and y also change in the global environment.

Best,
Gang
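A variation on the environment idea, closer to what the original post describes, is to return a fresh environment from the function instead of using the global one. This is a minimal sketch, not a benchmark: make_arrays() and fill_in_place() are hypothetical names chosen for illustration, and the tiny array size is only there to keep the example quick.

```r
# Hypothetical helper: builds two arrays inside a new environment and
# returns the environment itself rather than a list.
make_arrays <- function(n) {
  res <- new.env()
  res$x <- double(n)
  res$y <- double(n)
  res$x[1] <- 1      # stand-in for the real manipulation of x
  res$y[n] <- 2      # stand-in for the real manipulation of y
  res                # an environment is passed by reference
}

# Hypothetical helper: changes made through the environment are visible
# to the caller, because no copy of the environment is made.
fill_in_place <- function(env) {
  env$x[2] <- 3
  invisible(env)
}

res <- make_arrays(10)
fill_in_place(res)
```

The caller then reads the arrays as res$x and res$y, much as it would with a named list; as.list(res) converts back to a list if one is needed later.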