Martin Maechler
1998-Nov-26 13:51 UTC
Saving memory usage -- .C(....., DUP = FALSE) danger?
Just found out [R 0.63, standard -v -n]:

  > rm(list=ls())
  > gc()
           free  total
  Ncells  96538 200000
  Vcells 214008 250000
  > hist(runif(50000))
  Error: heap memory (1953 Kb) exhausted [needed 390 Kb more]

which is a bit astonishing given that I still have room for 214000
doubles:

  > u1 <- runif(50000)
  > u2 <- runif(50000)
  > gc()
           free  total
  Ncells  96534 200000
  Vcells 114006 250000

debug(hist.default) quickly revealed that the error was produced when
.C("bincount",....) was called.

Looking at the help, help(.C), and then at the "DUP = TRUE" default
argument to .C(.), I was reminded that every argument is first copied
before being passed to bincount(). Setting the "DUP = FALSE" argument
in hist.default made it work with the above 50000 doubles.

But then I wondered ``more generally'':

What exactly happens / can happen when calling, e.g.,

    r <- .C("foo", x=x, y=as.double(y), DUP = FALSE)

Will 'x' be altered after the call to .C(*) if in C's
    foo(double *x, double *y)
x is altered?
Will 'y' be unaltered anyway, since "as.double(y)" produces a
different object than 'y' anyway?

I know that I could run experiments and find out, but hopefully one of
you will know much better and explain to all R-develers.

Really useful might be a comprehensive list of recommendations
on when "DUP = FALSE" is useful / advisable / detestable ...

Thank you!
Martin

Martin Maechler <maechler@stat.math.ethz.ch>  http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum SOL G1;   Sonneggstr.33
ETH (Federal Inst. Technology)  8092 Zurich   SWITZERLAND
phone: x-41-1-632-3408    fax: ...-1086       <><
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
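A note on the figures in the error message above. They are consistent
with each vector cell (Vcell) occupying 8 bytes, the size of one double.
That cell size is an assumption, not something stated in the post, and
the helper name vcells_to_kb is hypothetical. Under that assumption the
arithmetic can be sketched in C as:

```c
#include <assert.h>

/* Assumption: one vector cell (Vcell) occupies 8 bytes, the size of a double. */
#define BYTES_PER_VCELL 8

/* Kilobytes (1 Kb = 1024 bytes) occupied by n_cells vector cells. */
double vcells_to_kb(long n_cells)
{
    assert(n_cells >= 0);
    return (double)n_cells * BYTES_PER_VCELL / 1024.0;
}
```

vcells_to_kb(250000) is 1953.125 and vcells_to_kb(50000) is 390.625,
which line up with the "1953 Kb" heap size and the "390 Kb more" in the
message. In other words, the allocation that failed was about the size
of one more copy of the 50000-element vector.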
On Thu, 26 Nov 1998, Martin Maechler wrote:

> But then I wondered ``more generally'':
>
> What exactly happens / can happen when calling, e.g.,
>
>     r <- .C("foo", x=x, y=as.double(y), DUP = FALSE)
>
> Will 'x' be altered after the call to .C(*) if in C's
>     foo(double *x, double *y)
> x is altered?
> Will 'y' be unaltered anyway, since "as.double(y)" produces a
> different object than 'y' anyway?

x will be altered, y will not. If you want y altered then you have to
give it storage mode "double" earlier, before the call.

> Really useful might be a comprehensive list of recommendations
> on when "DUP = FALSE" is useful / advisable / detestable ...

Here's a start.

DUP=FALSE is dangerous. There are two important dangers with DUP=FALSE.
The first is that garbage collection may move the object, leaving the
pointers pointing nowhere useful and causing hard-to-reproduce bugs.
The second is that if you pass a formal parameter of the calling
function to .C/.Fortran with DUP=FALSE, I don't think it is necessarily
copied. You may be able to change not only the local variable but also
the variable one level up. This will also be very hard to trace.

1) If your C/Fortran routine calls back any R function, including
   S_alloc/R_alloc, then do not use DUP=FALSE. Don't even think about
   it. Calling almost any R function could trigger garbage collection.

2) If you don't trigger garbage collection, it is safe and useful to
   set DUP=FALSE if you don't change any of the variables that might be
   affected, e.g.
       .C("Cfunction",input=x,output=numeric(10))
   In this case the output variable didn't exist before the call, so it
   can't cause trouble. If the input variable is not changed in
   Cfunction you are safe.

I've commented before (but never actually done anything) that it would
be a useful intermediate step to have analogues of the Fortran 90
INTENT IN and INTENT OUT declarations for these functions.
In the example above there is no need to copy the input back after
calling Cfunction and no need to copy the output before calling (just
to allocate the space). Something like

    .C("Cfunction",input=x,output=numeric(10),IN=c(T,F),OUT=c(F,T))

might then say to copy x and allocate uninitialised space for
numeric(10), call the function, and then copy output back again. The
first component of the result would then be NULL, saving space in the
local environment as well.

These would be less efficient but less dangerous than DUP=FALSE, as you
couldn't mess up R's internal structures by getting the declarations
wrong.

Thomas Lumley
Assistant Professor, Biostatistics
University of Washington, Seattle
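To make the difference between DUP=FALSE and the proposed IN/OUT
declarations concrete, here is a minimal sketch in plain C. It only
models the copying behaviour described in this thread and is not how R
implements .C. The names cfunction, call_nodup and call_intent are
hypothetical, and the routine overwrites its input on purpose so that
the aliasing becomes visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical routine in the style .C expects: every argument is a
   pointer. It writes running sums of input to output and, as a
   deliberate bug, also overwrites input[0]. */
void cfunction(double *input, double *output, int *n)
{
    double total = 0.0;
    for (int i = 0; i < *n; i++) {
        total += input[i];
        output[i] = total;
    }
    if (*n > 0)
        input[0] = -1.0;              /* side effect on the input argument */
}

/* Model of DUP = FALSE: the routine works on the caller's own storage,
   so the side effect reaches the caller's x. */
void call_nodup(double *x, double *out, int n)
{
    assert(x != NULL && out != NULL);
    cfunction(x, out, &n);
}

/* Model of the proposed IN/OUT declarations: x is copied in but never
   copied back; the output buffer is allocated uninitialised and only
   copied back. Returns 0 on success, -1 if allocation fails. */
int call_intent(const double *x, double *out, int n)
{
    if (n <= 0)
        return 0;
    size_t bytes = (size_t)n * sizeof(double);
    double *in_copy = malloc(bytes);
    double *out_tmp = malloc(bytes);
    if (in_copy == NULL || out_tmp == NULL) {
        free(in_copy);
        free(out_tmp);
        return -1;
    }
    memcpy(in_copy, x, bytes);        /* IN: copy before the call only */
    cfunction(in_copy, out_tmp, &n);
    memcpy(out, out_tmp, bytes);      /* OUT: copy after the call only */
    free(in_copy);
    free(out_tmp);
    return 0;
}
```

With x = {1, 2, 3}, call_nodup leaves x[0] changed to -1 in the caller,
while call_intent leaves x untouched; both fill the output with the
running sums 1, 3, 6. The intent version pays for one copy in and one
copy out, which matches the "less efficient but less dangerous"
trade-off described above.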