Henrik Bengtsson
2007-Mar-28 21:25 UTC
[Rd] Suggestion for memory optimization and as.double() with friends
Hi,

when doing as.double() on an object that is already a double, the
object seems to be copied internally, doubling the memory requirement.
See example below. Same for as.character() etc. Is this intended?

Example:

% R --vanilla
> x <- double(1e7)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   234019  6.3     467875  12.5   350000   9.4
Vcells 10103774 77.1   11476770  87.6 10104223  77.1
> x <- as.double(x)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   234113  6.3     467875  12.5   350000   9.4
Vcells 10103790 77.1   21354156 163.0 20103818 153.4

However, couldn't this easily be avoided by letting as.double() return
the object as is if already a double?

Example:

% R --vanilla
> as.double.double <- function(x, ...) x
> x <- double(1e7)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   234019  6.3     467875  12.5   350000   9.4
Vcells 10103774 77.1   11476770  87.6 10104223  77.1
> x <- as.double(x)
> gc()
           used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells   234028  6.3     467875  12.5   350000   9.4
Vcells 10103779 77.1   12130608  92.6 10104223  77.1

What's the catch?

The reason why I bring it up, is because many (most?) methods are
using as.double() etc "just in case" when passing arguments to
.Call(), .Fortran() etc, e.g. stats::smooth.spline():

fit <- .Fortran(R_qsbart, as.double(penalty), as.double(dofoff),
  x = as.double(xbar), y = as.double(ybar), w = as.double(wbar), <etc>)

Your memory usage is peaking in the actual call and the garbage
collector cannot clean it up until after the call. This seems to be
waste of memory, especially when the objects are large (100-1000MBs).

Cheers

Henrik
Henrik Bengtsson
2007-Mar-28 21:59 UTC
[Rd] Suggestion for memory optimization and as.double() with friends
On 3/28/07, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> [...]
> However, couldn't this easily be avoided by letting as.double() return
> the object as is if already a double?
> [...]
> What's the catch?

Ok, one catch that my example didn't illustrate is:

"'as.double' attempts to coerce its argument to be of double type:
like 'as.vector' it strips attributes including names." (from
?as.double).

So, answering my own question, I can see how stripping the attributes
"requires" an internal copy. Anyhow, when there are no attributes to
strip, the same idea still applies, with a more clever as.double()
function. In the case when one wants to coerce to a double and keep
existing attributes, one could extend as.double() with:

as.double(x, stripAttributes=FALSE)

and that code could be clever enough not to create an internal copy.

/Henrik
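The attribute-keeping variant proposed above can be sketched in plain R. This is only an illustration: `as_double_keep_attrs()` is a hypothetical name, not part of base R, and the proposed `stripAttributes` argument does not exist in `as.double()` as far as this thread shows. The sketch relies on `storage.mode<-`, which changes the storage type while leaving attributes in place.

```r
# Hypothetical helper, not part of base R: coerce to double but keep
# attributes such as dim and dimnames.
as_double_keep_attrs <- function(x) {
  if (is.double(x)) return(x)    # already double: hand the object back as is
  storage.mode(x) <- "double"    # change the storage type, keep attributes
  x
}

m <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), NULL))
d <- as_double_keep_attrs(m)     # double matrix, dim and dimnames retained
s <- as.double(m)                # plain double vector, attributes stripped
```

Whether the early return actually avoids a copy at the C level is left to R's internals; the sketch only shows that no coercion is requested when none is needed.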
Duncan Murdoch
2007-Mar-28 22:04 UTC
[Rd] Suggestion for memory optimization and as.double() with friends
On 3/28/2007 5:25 PM, Henrik Bengtsson wrote:
> [...]
> However, couldn't this easily be avoided by letting as.double() return
> the object as is if already a double?

as.double calls the internal as.vector, which also strips off
attributes. But in the case where the output is identical to the input,
this does seem like an easy optimization. I don't know if it would help
most people, but it might help in the kinds of cases you mention.

Duncan Murdoch

> [...]
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
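The case described in the reply, where the output would be identical to the input, can be expressed as a user-level guard. This is a minimal sketch under stated assumptions: `as_double_if_needed()` is a hypothetical name, not an R function, and it treats "identical to the input" as "already of type double and carrying no attributes".

```r
# Hypothetical helper, not part of base R: skip the coercion when the
# result of as.double() would be identical to the input.
as_double_if_needed <- function(x) {
  if (is.double(x) && is.null(attributes(x))) {
    x               # bare double vector: nothing to coerce or strip
  } else {
    as.double(x)    # otherwise fall back to the regular coercion
  }
}

x <- double(10)
y <- as_double_if_needed(x)                # returned unchanged
z <- as_double_if_needed(1:3)              # integer input is coerced
n <- as_double_if_needed(c(a = 1, b = 2))  # names stripped, as with as.double()
```

A guard like this could wrap the "just in case" arguments to .Fortran() or .Call(); how much memory it saves depends on how the R version in use handles the unguarded call.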