How do people deal with R and memory issues?

I have tried using gc() to see how much memory is used at each step.
Scanned Crawley R-Book and all other R books I have available and the FAQ
on-line but no help really found.

Running WinXP Pro (32 bit) with 4 GB RAM.
One SATA drive pair is in RAID 0 configuration with 10000 MB allocated as
virtual memory.
I do have another machine set up with Ubuntu but it only has 2 GB RAM and
have not been able to get R installed on that system.

I can run smaller sample data sets w/o problems and everything plots as
needed. However I need to review large data sets.

Using latest R version 2.9.0 (2009-04-17).

My data is in CSV format with a header row and is a big data set with
1,200,240 rows! E.g. below:

Dur,TBC,Fmax,Fmin,Fmean,Fc,S1,Sc,
9.81,0,28.78,24.54,26.49,25.81,48.84,14.78,
4.79,1838.47,37.21,29.41,31.76,29.52,241.77,62.83,
4.21,5.42,28.99,26.23,27.53,27.4,76.03,11.44,
10.69,193.48,30.53,25.4,27.69,25.4,-208.19,26.05,
15.5,248.18,30.77,24.32,26.57,24.92,-202.76,18.64,
14.85,217.47,31.25,24.62,26.93,25.56,-88.4,10.32,
11.86,158.01,33.61,25.24,27.66,25.32,83.32,17.62,
14.05,229.74,30.65,24.24,26.76,25.24,61.87,14.06,
8.71,264.02,31.01,25.72,27.56,25.72,253.18,19.2,
3.91,10.3,25.32,24.02,24.55,24.02,-71.67,16.83,
16.11,242.21,29.85,24.02,26.07,24.62,79.45,19.11,
16.81,246.48,28.57,23.05,25.46,23.81,-179.82,15.95,
16.93,255.09,28.78,23.19,25.75,24.1,-112.21,16.38,
5.12,107.16,32,29.41,30.46,29.41,134.45,20.88,
16.7,150.49,27.97,22.92,24.91,23.95,42.96,16.81
.... etc

I am getting the following warning/error message:

Error: cannot allocate vector of size 228.9 Mb

Complete listing from R console below:

> library(batcalls)
Loading required package: ggplot2
Loading required package: proto
Loading required package: grid
Loading required package: reshape
Loading required package: plyr
Attaching package: 'ggplot2'

        The following object(s) are masked from package:grid :

         nullGrob

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 186251  5.0     407500 10.9   350000  9.4
Vcells  98245  0.8     786432  6.0   358194  2.8
> BR <- read.csv ("C:/R-Stats/Bat calls/Reduced bats.csv")
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  188034  5.1     667722  17.9   378266  10.2
Vcells 9733249 74.3   20547202 156.8 20535538 156.7
> attach(BR)
> library(ggplot2)
> library(MASS)
> library(batcalls)
> BRC<-kde2d(Sc,Fc)
Error: cannot allocate vector of size 228.9 Mb
> gc()
            used  (Mb) gc trigger  (Mb)  max used  (Mb)
Ncells    198547   5.4     667722  17.9    378266  10.2
Vcells  19339695 147.6  106768803 814.6 124960863 953.4
>

Tnx for any insight,
Bruce
On Apr 26, 2009, at 11:20 AM, Neotropical bat risk assessments wrote:

> How do people deal with R and memory issues?

They should read the R-FAQ and the Windows FAQ, as you say you have:

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021

> I have tried using gc() to see how much memory is used at each step.
> Scanned Crawley R-Book and all other R books I have available and the FAQ
> on-line but no help really found.
> Running WinXP Pro (32 bit) with 4 GB RAM.
> One SATA drive pair is in RAID 0 configuration with 10000 MB allocated as
> virtual memory.

On the basis of my Windows experience this may not be enough information.
(The drive information is fairly irrelevant.) The R-Win-FAQ suggests:

?Memory
?memory.size    # "for information about memory usage. The limit can be
                #  raised by calling memory.limit"

Although you read the FAQs, have you zeroed in on the relevant sections?
What does memory.size report? And what happens when you run R "alone" in
WinXP and alter the default settings with memory.limit?

> I do have another machine set up with Ubuntu but it only has 2 GB RAM and
> have not been able to get R installed on that system.
> I can run smaller sample data sets w/o problems and everything plots as
> needed.
> However I need to review large data sets.
> Using latest R version 2.9.0 (2009-04-17)
> My data is in CSV format with a header row and is a big data set with
> 1,200,240 rows!

It's long, but not particularly wide. Last year I was getting satisfactory
work done on a 990K-row by 50-60 column dataset in a memory constraint of
4 GB on a different OS. Your constraint is in the 2.5-3.0 GB area, but your
dataframe is only a third of that size.

> E.g. below:
> Dur,TBC,Fmax,Fmin,Fmean,Fc,S1,Sc,
> 9.81,0,28.78,24.54,26.49,25.81,48.84,14.78,
> [...]
> I am getting the following warning/error message:
> Error: cannot allocate vector of size 228.9 Mb

So you got the data into memory. That does not appear to exceed the
capacity of your hardware setup, if you address the options offered above.

> Complete listing from R console below:
>> BR <- read.csv ("C:/R-Stats/Bat calls/Reduced bats.csv")
>> gc()
>           used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells  188034  5.1     667722  17.9   378266  10.2
> Vcells 9733249 74.3   20547202 156.8 20535538 156.7

Looks like you need to use memory.limit(<some bigger number>).

>> attach(BR)
>> library(ggplot2)
>> library(MASS)
>> library(batcalls)
>> BRC<-kde2d(Sc,Fc)
> Error: cannot allocate vector of size 228.9 Mb
>> gc()
>             used  (Mb) gc trigger  (Mb)  max used  (Mb)
> Ncells    198547   5.4     667722  17.9    378266  10.2
> Vcells  19339695 147.6  106768803 814.6 124960863 953.4

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
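A concrete sketch of those checks (these are the Windows-only base-R calls;
the 3000 below is an arbitrary illustrative value, and on 32-bit Windows the
usable ceiling stays in the roughly 2.5-3 GB range regardless of what you
request):

memory.size()              # MB currently allocated to this R session
memory.limit()             # the current ceiling, in MB
memory.limit(size = 3000)  # ask for a ~3 GB ceiling for this session

## The same ceiling can be requested at startup, e.g.:
##   Rgui.exe --max-mem-size=3000M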
On Sun, 26 Apr 2009 09:20:12 -0600
Neotropical bat risk assessments <neotropical.bats at gmail.com> wrote:

NBRA> How do people deal with R and memory issues?
NBRA> I have tried using gc() to see how much memory is used at each
NBRA> step. Scanned Crawley R-Book and all other R books I have
NBRA> available and the FAQ on-line but no help really found.
NBRA> Running WinXP Pro (32 bit) with 4 GB RAM.

There is a limit on Windows; read the FAQ:

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021

So either you use a (64-bit) Linux with enough memory, or you use packages
or an SQL solution that is able to deal with huge datasets (biglm, for
example).

Stefan
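A minimal sketch of the chunked idea with biglm (illustrative only: it fits
a linear model of Fc on Sc, which is not the kde2d computation the original
post attempts; the column names come from the sample data, the chunk size is
arbitrary, and the dummy ninth column only absorbs the trailing comma on each
CSV line):

library(biglm)

cols  <- c("Dur", "TBC", "Fmax", "Fmin", "Fmean", "Fc", "S1", "Sc", "X9")
chunk <- 100000   # rows per chunk
skip  <- 1        # skip the header line
fit   <- NULL

repeat {
  d <- tryCatch(read.csv("C:/R-Stats/Bat calls/Reduced bats.csv",
                         header = FALSE, col.names = cols,
                         skip = skip, nrows = chunk),
                error = function(e) NULL)   # read past end of file
  if (is.null(d) || nrow(d) == 0) break
  fit  <- if (is.null(fit)) biglm(Fc ~ Sc, data = d) else update(fit, d)
  skip <- skip + nrow(d)
  if (nrow(d) < chunk) break
}
summary(fit)

The point is only that the full 1.2M-row file is never held in memory at
once; each chunk is read, folded into the fit, and discarded.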
Then post the material that would make sense for Windows. What _does_
memory.limit() return? This _was_ asked and you did not answer.

How many other objects do you have in your workspace? How big are they?
Jim Holtman offered this function that displays memory occupation by object
and in total:

my.ls <- function (pos = 1, sorted = FALSE) {
    .result <- sapply(ls(pos = pos, all.names = TRUE),
                      function(..x) object.size(eval(as.symbol(..x))))
    if (sorted) {
        .result <- rev(sort(.result))
    }
    .ls <- as.data.frame(rbind(as.matrix(.result),
                               `**Total` = sum(.result)))
    names(.ls) <- "Size"
    .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0, format = "f")
    .ls$Mode <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
                                function(x) mode(eval(as.symbol(x))))),
                  "-------")
    .ls
}

On Apr 26, 2009, at 12:19 PM, Neotropical bat risk assessments wrote:

> Thanks for the comments,
>
> I did read the FAQ and that link you sent the first time. No help
> and very general.
>
> I did set memory.size(max = TRUE) but still get same warning-error
> message.
>
> Bruce
>
> At 09:58 AM 4/26/2009, you wrote:
>
>> On Apr 26, 2009, at 11:20 AM, Neotropical bat risk assessments wrote:
>>
>>> How do people deal with R and memory issues?
>>
>> They should read the R-FAQ and the Windows FAQ as you say you have.
>>
>> http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
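For illustration only (output will of course depend on what is actually in
the workspace), the diagnostics being asked for here look like this:

BR <- read.csv("C:/R-Stats/Bat calls/Reduced bats.csv")
my.ls(sorted = TRUE)      # every object in the workspace with its size, plus a total
memory.size(max = TRUE)   # maximum MB obtained from Windows so far -- this
                          # is a report, it does not raise the limit
memory.limit()            # the current ceiling in MB; memory.limit(size = ...)
                          # is the call that raises it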
Neotropical bat risk assessments wrote:

> How do people deal with R and memory issues?
> [...]
> However I need to review large data sets.
> Using latest R version 2.9.0 (2009-04-17)
> My data is in CSV format with a header row and is a big data set with
> 1,200,240 rows!

Maybe not the general solution you're looking for, but would you get
reasonable results by either (1) subsampling data or (2) reading the data
file in chunks and averaging the kernel densities you get from each chunk?
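Both suggestions can be sketched in terms of the objects from the original
post (BR, Sc, Fc); the subsample size, grid size and number of chunks below
are arbitrary choices, and the chunks are equal-sized so a plain mean of the
z matrices is a fair approximation. For scale: kde2d builds a grid-points by
data-points matrix, so the default 25-point grid against 1.2 million rows is
the ~229 Mb allocation that failed, while ~100,000 rows at a time needs well
under 100 Mb.

library(MASS)

## Shared bandwidths so the per-chunk estimates are comparable.
h <- c(bandwidth.nrd(BR$Sc), bandwidth.nrd(BR$Fc))

## (1) Subsample: a 2D density estimate from 100,000 random rows is usually
## very close to the full-data estimate.
set.seed(1)
idx <- sample(nrow(BR), 1e5)
BRC <- kde2d(BR$Sc[idx], BR$Fc[idx], h = h, n = 100)

## (2) Chunks: evaluate kde2d on the same grid for each chunk, then average.
lims   <- c(range(BR$Sc), range(BR$Fc))   # fixed limits so the grids line up
chunks <- split(seq_len(nrow(BR)), cut(seq_len(nrow(BR)), 12))
zs     <- lapply(chunks, function(i)
              kde2d(BR$Sc[i], BR$Fc[i], h = h, n = 100, lims = lims)$z)
BRC2   <- list(x = seq(lims[1], lims[2], length.out = 100),
               y = seq(lims[3], lims[4], length.out = 100),
               z = Reduce(`+`, zs) / length(zs))

image(BRC2)   # or contour(BRC2), persp(BRC2$x, BRC2$y, BRC2$z), ...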
Others may have mentioned this, but you might try loading your data into a
small database like MySQL and then pulling smaller portions of your data in
via a package like RMySQL or RODBC.

One approach might be to split the data file into smaller pieces outside of
R, then read the smaller pieces into R one at a time, subsequently creating
aggregations (counts and sums of your data fields). From these aggregations
you can create an "aggregated" dataset that is smaller and more pithy, and
that you ultimately may graph with ggplot2 or other libraries of your choice.

-Avram

On Apr 26, 2009, at 8:20 AM, Neotropical bat risk assessments wrote:

> How do people deal with R and memory issues?
> [...]
> However I need to review large data sets.
> Using latest R version 2.9.0 (2009-04-17)
> My data is in CSV format with a header row and is a big data set with
> 1,200,240 rows!
> [...]
> I am getting the following warning/error message:
> Error: cannot allocate vector of size 228.9 Mb
> [...]
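A sketch of the database route. Everything here -- the database name, table
name and credentials -- is hypothetical and would need to match however the
CSV gets loaded into MySQL (e.g. with LOAD DATA INFILE); RODBC would follow
the same pattern through an ODBC DSN:

library(RMySQL)   # brings in DBI as well

## Hypothetical connection; adjust to your own MySQL setup.
con <- dbConnect(MySQL(), dbname = "bats", host = "localhost",
                 user = "bruce", password = "...")

## Pull only the two columns the density plot needs
## (~20 MB for 1.2 million rows)...
BR2 <- dbGetQuery(con, "SELECT Sc, Fc FROM reduced_bats")

## ...or let the database do the aggregation and bring back something small.
agg <- dbGetQuery(con,
  "SELECT ROUND(Sc) AS Sc, ROUND(Fc) AS Fc, COUNT(*) AS n
   FROM reduced_bats GROUP BY ROUND(Sc), ROUND(Fc)")

dbDisconnect(con)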
If by "review" you mean read in summary information, then sqldf can do that
using the sqlite database in two lines of code. You don't have to install,
set up or define the database at all; sqldf and the underlying RSQLite will
do all that for you. See example 6b on the home page:

http://code.google.com/p/sqldf/#Example_6._File_Input

On Sun, Apr 26, 2009 at 11:20 AM, Neotropical bat risk assessments
<neotropical.bats at gmail.com> wrote:

> How do people deal with R and memory issues?
> [...]
> However I need to review large data sets.
> [...]
>> BRC<-kde2d(Sc,Fc)
> Error: cannot allocate vector of size 228.9 Mb
> [...]
> Tnx for any insight,
> Bruce
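What that can look like (a sketch only: read.csv.sql is sqldf's file-input
helper -- check the sqldf page for the exact idiom in your installed version
-- and the WHERE clause and summary columns are just illustrations):

library(sqldf)

## Reads the file through SQLite without loading all 1.2M rows into R;
## only the selected rows/columns come back as a data frame.  Inside the
## SQL statement the file is referred to as "file".
dat <- read.csv.sql("C:/R-Stats/Bat calls/Reduced bats.csv",
                    sql = "select Sc, Fc from file where Dur > 5")

## Or pull back only summary information:
summ <- read.csv.sql("C:/R-Stats/Bat calls/Reduced bats.csv",
                     sql = "select count(*) as n, avg(Fc) as meanFc,
                            min(Fc) as minFc, max(Fc) as maxFc from file")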