Hello all-

I'm having some problems with memory consumption under R. I've tried increasing the appropriate memory values, but it keeps asking for more; I've even upped the heap size to 600M, significantly eating into swap (256M real, 500+M swap). So, performance slows to a crawl.

What I'm trying to do is run isoMDS on a 4000x4000 matrix.

My first question is, how much memory should this matrix occupy? Is it ~4000^2 * sizeof(double)?

I have an idea about what's going on, but I'm not sure; perhaps someone could correct me if this interpretation is wrong. Since R uses call-by-value, all data structures are first duplicated, and then handed off to the function called. I was getting out of memory errors in calling isoMDS; once I got around that, and waited to see what would happen next (after about 5 minutes of swapping), I got an error saying cmdscale not found. Since isoMDS begins:

isoMDS <- function(d, y=cmdscale(d, 2), maxit=50, trace=TRUE)

this means that the whole time I was waiting, was spent in copying d. Is this correct?

Since the global environment is actually accessible from within functions, could I modify this to use call-by-reference? Are there any pitfalls I should watch out for?

In case this is relevant, here are my system particulars:

PII, 256M ram + ~500M swap
RedHat 6.1 Linux
R 0.65.1

Thanks in advance for your helpful replies!

-John Barnett

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
"John D. Barnett" <jbarnett at wi.mit.edu> writes:> isoMDS <- function(d, y=cmdscale(d, 2), maxit=50, trace=TRUE) > > this means that the whole time I was waiting, was spent in copying d. > Is this correct?No. R has lazy evaluation, so it means that it didn't actually need the value y before the point where it stopped. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Fri, 28 Jan 2000, John D. Barnett wrote:

> Hello all-
>
> I'm having some problems with memory consumption under R. I've tried
> increasing the appropriate memory values, but it keeps asking for more;
> I've even upped the heap size to 600M, significantly eating into swap
> (256M real, 500+M swap). So, performance slows to a crawl.
>
> What I'm trying to do is run isoMDS on a 4000x4000 matrix.

Oops. That is an O(n^4) algorithm in n points. Forget it for more than a few hundred points.

> My first question is, how much memory should this matrix occupy? Is it
> ~4000^2 * sizeof(double)?

Yes, approximately 128Mb.

> I have an idea about what's going on, but I'm not sure; perhaps someone
> could correct me if this interpretation is wrong. Since R uses
> call-by-value, all data structures are first duplicated, and then handed
> off to the function called. I was getting out of memory errors in
> calling isoMDS; once I got around that, and waited to see what would
> happen next (after about 5 minutes of swapping), I got an error saying
> cmdscale not found. Since isoMDS begins:
>
> isoMDS <- function(d, y=cmdscale(d, 2), maxit=50, trace=TRUE)
>
> this means that the whole time I was waiting, was spent in copying d.
> Is this correct?

No. It was also testing for any identical pairs of rows.

> Since the global environment is actually accessible from within
> functions, could I modify this to use call-by-reference? Are there any
> pitfalls I should watch out for?

You could do that (just leave it out from the argument list), and you could simplify the code, but the .C call would take for ever (not literally). I suspect you would never get that far (as cmdscale is also slow: it needs an eigendecomposition, and that is O(n^3) and by then you would have several versions of your 128Mb matrix around).

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
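The figures in this reply can be checked with a little arithmetic. The sketch below assumes 8-byte doubles, and the reference size n_ref = 300 is a hypothetical value chosen only to stand for "a few hundred points"; the ratios ignore constant factors and say nothing about actual run times.

```r
n <- 4000
bytes_per_double <- 8                      # assumes 8-byte doubles
matrix_bytes <- n^2 * bytes_per_double
print(matrix_bytes / 1e6)                  # 128 (units of 10^6 bytes)

# Sanity check on a smaller matrix: object.size reports the data plus
# a small header, so it is at least 400^2 * 8 bytes.
small <- matrix(0, 400, 400)
small_bytes <- as.numeric(object.size(small))
print(small_bytes >= 400^2 * bytes_per_double)

# Relative cost of going from n_ref points to n points.
n_ref <- 300                               # hypothetical "few hundred"
ratio_n4 <- (n / n_ref)^4                  # O(n^4) term quoted for isoMDS
ratio_n3 <- (n / n_ref)^3                  # O(n^3) eigendecomposition in cmdscale
print(c(n4 = ratio_n4, n3 = ratio_n3))
```

Under these assumptions the O(n^4) term grows by a factor in the tens of thousands, and two copies of the matrix already reach the 256M of real memory, which is consistent with the advice to stay at a few hundred points.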