Hello everybody,

if I try to (r)bind a number of large data frames I run out of memory, because
R wastes memory and seems to "forget" to release it.

For example, I have 10 files. Each file contains a large data frame "ds" (3500
columns by 800 rows) which needs ~20 MB RAM if it is loaded as the only object.
Now I try to bind all data frames into one large data frame and need more than
1165 MB (!) RAM (to simplify the R code, I use the same file ten times):

________ start example 1 __________
load(myFile)
ds.tmp <- ds
for (Cycle in 1:10) {
    ds.tmp <- rbind(ds.tmp, ds)
}
________ end example 1 __________

Stepping into the details I found the following (the comment shows RAM usage
after each line was executed):

load(myFile)            # 40 MB (19 MB for R itself)
ds.tmp <- ds            # 40 MB; => only a pointer seems to be copied
x <- rbind(ds.tmp, ds)  # 198 MB
x <- rbind(ds.tmp, ds)  # 233 MB; the same instruction a second time needs
                        # 35 MB more RAM - why?

Now I played around, but I couldn't find a solution. For example, I bound the
data frames step by step, removed the variables and cleared memory, but I still
need 1140 MB (!) RAM:

________ start example 2 __________
tmpFile <- paste(myFile, '.tmp', sep = "")
load(myFile)
ds.tmp <- ds
save(ds.tmp, file = tmpFile, compress = T)

for (Cycle in 1:10) {
    ds <- NULL
    ds.tmp <- NULL
    rm(ds, ds.tmp)
    gc()
    load(tmpFile)
    load(myFile)
    ds.tmp <- rbind(ds.tmp, ds)
    save(ds.tmp, file = tmpFile, compress = T)
    cat(Cycle, ': ', object.size(ds), object.size(ds.tmp), '\n')
}
________ end example 2 __________

My R version:

platform  i386-pc-solaris2.8
arch      i386
os        solaris2.8
system    i386, solaris2.8
major     1
minor     9.1
year      2004
month     06
day       21
language  R

How can I avoid running into this memory problem? Any ideas are very much
appreciated. Thank you in advance & kind regards,

Lutz Thieme
AMD Saxony / Product Engineering     AMD Saxony Limited Liability Company & Co. KG
phone: +49-351-277-4269              M/S E22-PE, Wilschdorfer Landstr. 101
fax:   +49-351-277-9-4269            D-01109 Dresden, Germany
lutz.thieme at amd.com wrote:
> if I try to (r)bind a number of large data frames I run out of memory,
> because R wastes memory and seems to "forget" to release it.
> [...]
> How can I avoid running into this memory problem? Any ideas are very much
> appreciated.

If you are going to look at the memory usage you should use gc(), and perhaps
repeated calls to gc(), before checking the memory footprint. This will force
a garbage collection.

Also, you will probably save memory by treating your data frames as lists and
concatenating them, then converting the result to a data frame.
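For illustration, a minimal sketch of the list-based approach described above.
It assumes (hypothetically) that myFiles is a character vector of the ten file
names, that each file contains a data frame "ds" with identical column names,
and that all columns are numeric (unlist() would lose factor levels):

## Read each file's "ds" into one element of a list.
ds.list <- lapply(myFiles, function(f) { load(f); ds })

## Concatenate column by column: for each column name, join the corresponding
## vectors from every data frame (assumes numeric columns).
cols <- lapply(names(ds.list[[1]]),
               function(nm) unlist(lapply(ds.list, function(d) d[[nm]]),
                                   use.names = FALSE))
names(cols) <- names(ds.list[[1]])

## Only at the very end convert the list of columns into a data frame.
big <- as.data.frame(cols)

gc()    # force a garbage collection before looking at the memory footprint

This keeps the intermediate objects as plain lists and vectors, so the more
expensive data frame machinery is used only once.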
lutz.thieme at amd.com wrote:
> Stepping into the details I found the following (the comment shows RAM usage
> after each line was executed):
>
> load(myFile)            # 40 MB (19 MB for R itself)
> ds.tmp <- ds            # 40 MB; => only a pointer seems to be copied
> x <- rbind(ds.tmp, ds)  # 198 MB
> x <- rbind(ds.tmp, ds)  # 233 MB; the same instruction a second time needs
>                         # 35 MB more RAM - why?

I'm guessing your problem is fragmented memory. You are creating big objects,
then making them bigger. This means R needs to go looking for large
allocations for the replacements, but they won't fit in the spots left by the
things you've deleted, so those spots are left empty.

A solution to this is to use two passes: first figure out how much space you
need, then allocate it and fill it. E.g.

for (Cycle in 1:10) {
    rows[Cycle] <- .... some calculation based on the data ...
}

ds.tmp <- data.frame(x = double(sum(rows)), y = double(sum(rows)), ...)

for (Cycle in 1:10) {
    ds.tmp[ appropriate rows, ] <- new data
}

Duncan Murdoch
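A rough, runnable version of the two-pass idea above, assuming (hypothetically)
that myFiles holds the ten file names, that each file contains a data frame
"ds", and that all files have identical columns:

## Pass 1: find out how many rows each file contributes.
rows <- integer(length(myFiles))
for (Cycle in seq(along = myFiles)) {
    load(myFiles[Cycle])                 # loads "ds"
    rows[Cycle] <- nrow(ds)
}

## Pass 2: allocate the full-sized result once, then fill it block by block.
load(myFiles[1])                         # template for column names and types
ds.tmp <- ds[rep(1, sum(rows)), ]        # one big allocation, no growing
row.names(ds.tmp) <- 1:nrow(ds.tmp)      # replace the duplicated row names
offset <- 0
for (Cycle in seq(along = myFiles)) {
    load(myFiles[Cycle])
    ds.tmp[offset + seq(length = nrow(ds)), ] <- ds
    offset <- offset + nrow(ds)
}

Because ds.tmp is allocated at its final size before the second loop, the
repeated copy-and-grow of rbind() in a loop is avoided.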
Rather than 'rbind' in a loop, try putting your data frames in a list and then
doing something like 'do.call("rbind", list.of.data.frames)'.

-roger

lutz.thieme at amd.com wrote:
> Now I try to bind all data frames into one large data frame and need more
> than 1165 MB (!) RAM (to simplify the R code, I use the same file ten times):
> [...]
> How can I avoid running into this memory problem? Any ideas are very much
> appreciated.

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
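A minimal sketch of this do.call() approach, again assuming a hypothetical
myFiles vector naming the ten files, each containing a data frame "ds":

ds.list <- vector("list", length(myFiles))
for (Cycle in seq(along = myFiles)) {
    load(myFiles[Cycle])          # loads "ds" into the current workspace
    ds.list[[Cycle]] <- ds
}

## One rbind() over the whole list instead of one copy-and-grow per iteration.
big <- do.call("rbind", ds.list)

At its peak this still needs roughly the list plus the finished result in
memory, but it avoids re-copying the ever-growing data frame on every
iteration of the loop.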