I recently started using R 2.14.0 on a new machine and I am experiencing what seems like unusually greedy memory use. It happens all the time, but to give a specific example, let's say I run the following code

--------
for(j in 1:length(files)){
  load(file.path(dump.dir, files[j]))
  mat.data[[j]]<-data
}
save(abind(mat.data, along=2), file.path(dump.dir, filename))
---------

It loads parts of a multidimensional matrix into a list, then binds them along the second dimension and saves the result to disk. The code works, although slowly, but what's strange is the amount of memory it uses. In particular, each chunk of data is between 50M and 100M, and altogether the bound matrix is 1.3G. One would expect R to use roughly double that memory, about 2.6G, to keep mat.data and its bound version separately. I could imagine that it might somehow use 3 times the size of the matrix. But in fact it uses more than 5.5 times (almost all of my physical memory) and I think it is swapping a lot to disk. For this particular task, top shows R eating more than 7G of resident memory and using 11G of virtual memory as well

$top
 PID  USER  PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+     COMMAND
8823  user  25   0  11g   7.2g  10m   R  99.7  92.9   5:55.05  R
8590  root  15   0  154m  16m   5948  S   0.5   0.2  23:22.40  Xorg

I have a strong suspicion that something is off with my R binary; I don't think I have experienced things like that in a long time. Is this in line with what I am supposed to experience? Are there any ideas for diagnosing what is going on?
Would appreciate any suggestions

Thanks
Andre

=================================

Here is what I am running on:

CentOS release 5.5 (Final)

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] abind_1.4-0       rJava_0.9-3       R.utils_1.12.1    R.oo_1.9.3        R.methodsS3_1.2.2

loaded via a namespace (and not attached):
[1] codetools_0.2-8 tcltk_2.14.0    tools_2.14.0

I configured R for compilation as follows

/configure --prefix=/usr/local/R --enable-byte-compiled-packages=no --with-tcltk --enable-R-shlib=yes
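On the question of how to diagnose this from inside R, one possible starting point is to compare object sizes with the peak reported by gc(). The following is only a minimal sketch on toy data: the matrices are made up, and base cbind() stands in for abind(..., along=2) so that the example needs no extra packages. It does not reproduce the poster's setup.

```r
# Toy stand-in for the loaded chunks: five 1000 x 200 numeric matrices
# (hypothetical data, not the poster's files).
chunks <- lapply(1:5, function(j) matrix(rnorm(1000 * 200), nrow = 1000))

# Bytes held by the list of chunks.
chunk.bytes <- as.numeric(object.size(chunks))

# Reset the "max used" statistics so the peak covers only the bind step.
invisible(gc(reset = TRUE))
bound <- do.call(cbind, chunks)   # cbind() stands in for abind(..., along=2)
mem <- gc()

bound.bytes <- as.numeric(object.size(bound))
# Vcells are 8 bytes each; convert the peak vector heap use to megabytes.
peak.mb <- mem["Vcells", "max used"] * 8 / 2^20

cat("chunks:", round(chunk.bytes / 2^20, 1), "MB\n")
cat("bound: ", round(bound.bytes / 2^20, 1), "MB\n")
cat("peak vector heap during bind:", round(peak.mb, 1), "MB\n")
```

If the peak is much larger than the sum of the two object sizes, the extra memory is likely going to temporary copies made during the bind step rather than to the objects that are kept.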
On Apr 12, 2012, at 00:53 , andre zege wrote:

> I recently started using R 2.14.0 on a new machine and I am experiencing
> what seems like unusually greedy memory use. It happens all the time, but
> to give a specific example, let's say I run the following code
>
> --------
>
> for(j in 1:length(files)){
>   load(file.path(dump.dir, files[j]))
>   mat.data[[j]]<-data
> }
> save(abind(mat.data, along=2), file.path(dump.dir, filename))

Hmm, did you preallocate mat.data? If not, you will be copying it repeatedly, and I'm not sure that this can be done by copying pointers only. Does it work better with

mat.data <- lapply(files, function(name) {load(file.path(dump.dir, name)); data})

?

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
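The lapply() suggestion can be tried on toy data. This is a sketch under stated assumptions: the temporary directory, the file names and the array dimensions are invented for illustration, and each file is assumed to store a single object called `data`, as in the original loop. Loading into an explicit environment is a small variation on the one-liner above; it keeps the loaded object out of the workspace.

```r
# Hypothetical setup: write three small chunks to a temporary directory.
dump.dir <- tempfile("dump")
dir.create(dump.dir)
files <- sprintf("chunk%02d.RData", 1:3)
for (f in files) {
  data <- array(rnorm(2 * 3 * 4), dim = c(2, 3, 4))
  save(data, file = file.path(dump.dir, f))
}
rm(data)

# Load one file into a private environment and return its `data` object,
# so nothing is left behind in the calling workspace.
load.chunk <- function(name) {
  env <- new.env()
  load(file.path(dump.dir, name), envir = env)
  get("data", envir = env)
}

# lapply() builds the list in one go; no repeated growing of mat.data.
mat.data <- lapply(files, load.chunk)
str(mat.data)
```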
Leaving aside what's going on inside abind::abind(), maybe the following sheds some light on what is being wasted:

# Preallocate (probably doesn't make a difference because it's a list)
mat.data <- vector("list", length=length(files));
for (j in 1:length(files)){
  vars <- load(file.path(dump.dir, files[j]))
  mat.data[[j]]<-data;
  # Not needed anymore/remove everything loaded
  rm(list=vars);
}

data <- abind(mat.data, along=2);
# Not needed anymore
rm(mat.data);

save(data, file=file.path(dump.dir, filename))

My $.02

/Henrik

On Wed, Apr 11, 2012 at 3:53 PM, andre zege <andre.zege at gmail.com> wrote:
> I recently started using R 2.14.0 on a new machine and I am experiencing
> what seems like unusually greedy memory use.
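A different technique, not proposed in the thread, is to allocate the result once and fill it slice by slice, so that the list of chunks and the bound copy never have to coexist. The sketch below assumes the extent of every chunk along the second dimension is known in advance; `get.chunk()` and `widths` are hypothetical stand-ins for loading one file and for those known extents. R may still make temporary copies during the assignment, so this should be read as something to measure, not a guaranteed saving.

```r
# Hypothetical extents of each chunk along the second dimension.
widths <- c(2, 3, 4)

# Stand-in for loading chunk j from disk: a 5 x widths[j] x 6 array.
get.chunk <- function(j) {
  set.seed(j)  # deterministic toy data
  array(rnorm(5 * widths[j] * 6), dim = c(5, widths[j], 6))
}

# Allocate the full result once.
result <- array(NA_real_, dim = c(5, sum(widths), 6))

# Copy each chunk into its block of columns, then let it go.
offset <- 0
for (j in seq_along(widths)) {
  chunk <- get.chunk(j)
  w <- dim(chunk)[2]
  result[, offset + seq_len(w), ] <- chunk
  offset <- offset + w
  rm(chunk)
}

dim(result)
```

Only one chunk is alive at a time here, so the expected footprint is roughly the result plus one chunk, which could be compared against the list-then-bind approach using gc().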