maddox
2010-Dec-23 13:13 UTC
[R] speed issues? read R_inferno by Patrick Burns: & a memory query
Hi,

I'm just starting out with R and came across R_inferno.pdf by Patrick Burns just yesterday - I recommend it!

His description of how 'growing' objects (e.g. obj <- c(obj, additionalValue)) eats up memory prompted me to rewrite a function (which made such calls ~210 times) so that it used indexing into a dimensioned object instead (i.e. obj[i, ] <- additionalValue).

This transformed the process from

old version:
   user  system elapsed
133.436  14.257 155.807

new version:
   user  system elapsed
 16.041   1.180  18.535

To say I'm delighted is an understatement. Thanks for putting the Inferno together, Patrick.

However, I'm misunderstanding the effect this has on memory use (or misunderstanding the code I've hijacked to look at memory use). To look at virtual memory use I'm using the code below from this forum:

    cmd <- paste("ps -o vsz", Sys.getpid())
    cat("\nVirtual size: ", system(cmd, intern = TRUE)[2], "\n", sep = "")

I did three runs of the old version and three of the new, preceding each with gc(), and got the outputs below. In summary, the runs of the old method required 17712, 17780 & 17744, and the runs of the new method required 13788, 15140 & 13656.

Two questions:
1. Why does each run of the same process not make the same demand on memory? They're doing exactly the same work & creating exactly the same new objects.
2. Is the modest decrease in memory consumed by the new method expected? (Having read R_Inferno I was, perhaps naively, expecting more of an improvement.)

Or am I missing something (more than likely!)?

Thanks

M

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 786300 21.0    1265230 33.8  1166886 31.2
Vcells 948412  7.3    3244126 24.8  3766604 28.8
> cat("old version")
Virtual size before call: 881692
   user  system elapsed
131.872  14.417 159.653
Virtual size after call: 899404
> 899404-881692
[1] 17712

##################

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 786294 21.0    1265230 33.8  1166886 31.2
Vcells 948407  7.3    3244126 24.8  3766604 28.8
> cat("old version")
Virtual size before call: 881660
   user  system elapsed
133.281  14.473 159.661
Virtual size after call: 899440
> 899440-881660
[1] 17780

##################

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 786294 21.0    1265230 33.8  1166886 31.2
Vcells 948407  7.3    3244126 24.8  3766604 28.8
> cat("old version")
Virtual size before call: 881696
   user  system elapsed
133.436  14.257 155.807
Virtual size after call: 899440
> 899440-881696
[1] 17744

################## ##################

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 786413 21.0    1265230 33.8  1166886 31.2
Vcells 948460  7.3    3244126 24.8  3766604 28.8
> cat("new version")
Virtual size before call: 881696
   user  system elapsed
 16.041   1.180  18.535
Virtual size after call: 895484
> 895484-881696
[1] 13788

##################

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 786441 21.1    1265230 33.8  1166886 31.2
Vcells 948480  7.3    3244126 24.8  3766604 28.8
> cat("new version")
Virtual size before call: 882648
   user  system elapsed
 16.321   1.068  18.136
Virtual size after call: 897788
> 897788-882648
[1] 15140

##################

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 786441 21.1    1265230 33.8  1166886 31.2
Vcells 948480  7.3    3244126 24.8  3766604 28.8
> cat("new version")
Virtual size before call: 882648
   user  system elapsed
 16.581   0.992  19.351
Virtual size after call: 896304
> 896304-882648
[1] 13656

--
View this message in context: http://r.789695.n4.nabble.com/speed-issues-read-R-inferno-by-Patrick-Burns-a-memory-query-tp3162032p3162032.html
Sent from the R help mailing list archive at Nabble.com.
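The `ps -o vsz` figure used above is the virtual size of the whole R process, so it probably includes allocator and operating-system overhead and need not be identical from run to run. As a complement, here is a minimal sketch that measures from inside R instead, using the "max used" column that gc() already prints in the output above. The helper name `peak_vcells_mb` and the test function are hypothetical, not part of the original post, and only the vector heap (Vcells, 8 bytes each according to ?gc) is counted.

```r
# Hypothetical helper (not from the original post): estimate the peak extra
# vector-heap use, in Mb, while running f().
peak_vcells_mb <- function(f) {
  gc(reset = TRUE)                        # collect garbage and reset the "max used" statistics
  before <- gc()["Vcells", "used"]        # Vcells in use before the call
  res <- f()                              # keep the result alive until we have measured
  after <- gc()["Vcells", "max used"]     # high-water mark since the reset
  (after - before) * 8 / 1024^2           # Vcells are 8 bytes each; convert to Mb
}

# Illustrative call: allocate one million doubles (about 8 Mb).
mb <- peak_vcells_mb(function() numeric(1e6))
cat("Peak extra vector heap (Mb):", round(mb, 1), "\n")
```

Because this reads R's own counters rather than the process size, it may give steadier numbers across repeated runs, although it still says nothing about memory held outside R's vector heap.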
Uwe Ligges
2010-Dec-23 16:23 UTC
[R] speed issues? read R_inferno by Patrick Burns: & a memory query
Actually, the issue is not the amount of memory consumed, but that in each iteration of the "bad" loop you describe new memory is allocated and the whole object is copied. The second loop does not need this: R can allocate the memory once and never has to copy the object around.

Uwe Ligges
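The difference between the two loops can be sketched as follows. This is only an illustration under stated assumptions: the functions `grow` and `prealloc`, the row contents, and the sizes `n` and `p` are hypothetical stand-ins, since the poster's actual function was not shown.

```r
# Hypothetical stand-ins for the poster's function: build an n-by-p matrix
# one row at a time, first by growing it, then by filling a preallocated one.
n <- 2000   # illustrative number of rows, not the poster's
p <- 10     # illustrative number of columns

grow <- function(n, p) {
  obj <- NULL
  for (i in seq_len(n)) {
    # rbind() builds a new, larger matrix and copies all existing rows into it
    obj <- rbind(obj, rep(as.numeric(i), p))
  }
  obj
}

prealloc <- function(n, p) {
  obj <- matrix(NA_real_, nrow = n, ncol = p)  # single allocation up front
  for (i in seq_len(n)) {
    obj[i, ] <- rep(as.numeric(i), p)          # assign into the existing row i
  }
  obj
}

t_grow <- system.time(a <- grow(n, p))
t_pre  <- system.time(b <- prealloc(n, p))
print(t_grow)
print(t_pre)

# Both approaches produce the same values; only the allocation pattern differs.
stopifnot(isTRUE(all.equal(a, b, check.attributes = FALSE)))
```

Both versions end up holding an object of the same final size, which fits the point above: the saving comes mainly from avoiding repeated allocation and copying, so a large drop in run time alongside only a modest change in the memory figure is plausible.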