I've seen several posts over the past 2-3 weeks about memory issues. I've tried to follow the suggestions carefully, but remain baffled as to why I can't load data into R. I hope that in revisiting this issue I don't exasperate the list.

The setting:
1 gig RAM, Linux machine
10 Stata files of approximately 14 megs each
File contents appear at the end of this boorishly long email.

Purpose:
load and combine in R for further analysis

Question:

1) I've placed memory queries in the command file to see what is going on. It appears that loading a 14 meg file consumes approximately 5 times this amount of memory, i.e. available memory declines by 70 megs when a 14 meg dataset is loaded. (Seen in Method 2 below.)

2) Ultimately I would like to replace Stata with R, but the Stata datasets I frequently use are in the 100s of megs, which work fine on this machine. Is R capable of this?

The command files:
I've attempted the process in two ways, each time both as a regular user (ulimit=unlimited) and as root on the system to avoid OS restrictions.
The first method is as follows:

METHOD ONE

R --no-save --max-vsize=800M < QuickLook.R > QuickLook.log

======== QuickLook.log follows ===============
> library(foreign)
> a <- Sys.time()
> full <- read.dta('../off/off10yr1.dta')
> gc()
          used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1018821 27.3    1166886 31.2         NA
Vcells 4456284 34.0    5070089 38.7        800
> system('cat /proc/meminfo')
        total:     used:     free:  shared: buffers:  cached:
Mem:  1073303552 696303616 376999936        0 21487616 36982784
Swap: 271392768 263294976   8097792
MemTotal:      1048148 kB
MemFree:        368164 kB
MemShared:           0 kB
Buffers:         20984 kB
Cached:          36116 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:         7908 kB
> n <- 2
> while (n<=3) {
+ fname1 <- paste('../off/off10yr',n,'.dta', sep="")
+ full <- rbind(read.dta(fname1), full)
+ gc()
+ system('cat /proc/meminfo')
+ n
+ n <- n+1}
        total:     used:     free:  shared: buffers:  cached:
Mem:  1073303552 780275712 293027840        0 21487616 51609600
Swap: 271392768 263294976   8097792
MemTotal:      1048148 kB
MemFree:        286160 kB
MemShared:           0 kB
Buffers:         20984 kB
Cached:          50400 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:         7908 kB
Error: cannot allocate vector of size 3291 Kb
Execution halted

SECOND METHOD

> library(foreign)
> system('cat /proc/meminfo')
        total:     used:     free:  shared: buffers:  cached:
Mem:  1073303552 637681664 435621888        0 21753856 31592448
Swap: 271392768 261148672  10244096
MemTotal:      1048148 kB
MemFree:        425412 kB
MemShared:           0 kB
Buffers:         21244 kB
Cached:          30852 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:        10004 kB
> a <- Sys.time()
> full1 <- read.dta('../off/off10yr1.dta')
> gc()
          used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1018825 27.3    1166886 31.2         NA
Vcells 4456285 34.0    5070086 38.7        800
> system('cat /proc/meminfo')
        total:     used:     free:  shared: buffers:  cached:
Mem:  1073303552 707162112 366141440        0 21757952 45498368
Swap: 271392768 261148672  10244096
MemTotal:      1048148 kB
MemFree:        357560 kB
MemShared:           0 kB
Buffers:         21248 kB
Cached:          44432 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:        10004 kB
> full2 <- read.dta('../off/off10yr2.dta')
> gc()
          used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1861390 49.8    2105982 56.3         NA
Vcells 8879476 67.8    9315972 71.1        800
> system('cat /proc/meminfo')
        total:     used:     free:  shared: buffers:  cached:
Mem:  1073303552 777375744 295927808        0 21757952 59826176
Swap: 271392768 261148672  10244096
MemTotal:      1048148 kB
MemFree:        288992 kB
MemShared:           0 kB
Buffers:         21248 kB
Cached:          58424 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:        10004 kB
> full3 <- read.dta('../off/off10yr3.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  2703952  72.3    3708127  99.1         NA
Vcells 13302667 101.5   14190661 108.3        800
> system('cat /proc/meminfo')
        total:     used:     free:  shared: buffers:  cached:
Mem:  1073303552 847650816 225652736        0 21757952 74153984
Swap: 271392768 261148672  10244096
MemTotal:      1048148 kB
MemFree:        220364 kB
MemShared:           0 kB
Buffers:         21248 kB
Cached:          72416 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:        10004 kB
> full4 <- read.dta('../off/off10yr4.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  3546514  94.8    4953636 132.3         NA
Vcells 17725858 135.3   18735437 143.0        800
> system('cat /proc/meminfo')
        total:     used:     free:  shared: buffers:  cached:
Mem:  1073303552 917798912 155504640        0 21762048 88481792
Swap: 271392768 261148672  10244096
MemTotal:      1048148 kB
MemFree:        151860 kB
MemShared:           0 kB
Buffers:         21252 kB
Cached:          86408 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:        10004 kB
> full5 <- read.dta('../off/off10yr5.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  4389076 117.3    6193578 165.4         NA
Vcells 22149049 169.0   23279670 177.7        800
> system('cat /proc/meminfo')
        total:     used:     free:  shared: buffers:   cached:
Mem:  1073303552 988033024  85270528        0 21770240 102809600
Swap: 271392768 261148672  10244096
MemTotal:      1048148 kB
MemFree:         83272 kB
MemShared:           0 kB
Buffers:         21260 kB
Cached:         100400 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:        10004 kB
> full6 <- read.dta('../off/off10yr6.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  5231638 139.7    7700734 205.7         NA
Vcells 26572240 202.8   27312192 208.4        800
> system('cat /proc/meminfo')
        total:      used:     free:  shared: buffers:   cached:
Mem:  1073303552 1058263040  15040512        0 21774336 117137408
Swap: 271392768  261148672  10244096
MemTotal:      1048148 kB
MemFree:         14688 kB
MemShared:           0 kB
Buffers:         21264 kB
Cached:         114392 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:        10004 kB
> full7 <- read.dta('../off/off10yr7.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  6074200 162.2    8572058 228.9         NA
Vcells 30995431 236.5   31726362 242.1        800
> system('cat /proc/meminfo')
        total:      used:     free:  shared: buffers:  cached:
Mem:  1073303552 1069006848   4296704        0 21471232 72318976
Swap: 271392768  261148672  10244096
MemTotal:      1048148 kB
MemFree:          4196 kB
MemShared:           0 kB
Buffers:         20968 kB
Cached:          70624 kB
BigTotal:       131064 kB
BigFree:             0 kB
SwapTotal:      265032 kB
SwapFree:        10004 kB
> full8 <- read.dta('../off/off10yr8.dta')
Error: cannot allocate vector of size 1645 Kb
Execution halted

THIRD METHOD

I combined the Stata files in Stata (same machine) and saved them as a single file, thinking there could be an inefficiency with rbind(). Same error.
TO ASSURE YOU THAT I AM NOT CRAZY, THE FOLLOWING IS A SAMPLE DIRECTORY LISTING OF THE FILES OF INTEREST

-rw-r--r--  1 ctaylor  econ  14M  Jun 27 16:15  off10yr5.dta
-rw-r--r--  1 ctaylor  econ  14M  Jun 27 17:53  off10yr6.dta
-rw-r--r--  1 ctaylor  econ  14M  Jun 27 19:30  off10yr7.dta
-rw-r--r--  1 ctaylor  econ  14M  Jun 27 21:08  off10yr8.dta
-rw-r--r--  1 ctaylor  econ  14M  Jun 27 23:02  off10yr9.dt

DATA CONTENTS (IN TEXT FORM OF COURSE)

head off10yr1.out

scenario  metcode  yr    ginv  cons     gocc  abs      dvac   gmre  gmer
1         "AA"     2001  .04   3384000  .047  3641000  -.006  .025  .028
1         "AA"     2002  .042  3657000  .046  3716000  -.004  .034  .035
1         "AA"     2003  .031  2816000  .047  3972000  -.015  .051  .056
1         "AA"     2004  .035  3271000  .046  4064000  -.01   .075  .078
1         "AA"     2005  .037  3636000  .037  3444000  0      .084  .084
1         "AA"     2006  .041  4183000  .035  3315000  .006   .118  .116
1         "AA"     2007  .043  4513000  .019  1915000  .021   .094  .086
1         "AA"     2008  .039  4320000  .034  3431000  .005   .068  .066
1         "AA"     2009  .034  3848000  .05   5262000  -.015  .057  .063

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
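One rough way to look into the "14 meg file, 70 megs of memory" question above is to build a small data frame with the same column layout as the listing and measure it with object.size(). The sketch below is only an illustration under stated assumptions: the column names are taken from the listing, but the values are made up, the row count n is arbitrary, and every numeric column is stored as a double (read.dta may well choose other storage types for the real files). Nine double columns at 8 bytes each already account for 72 bytes per row before the character column is counted.

```r
# Hypothetical stand-in for one of the Stata files: same ten columns as the
# listing above, filled with made-up values (an assumption, not the real data).
n <- 10000
d <- data.frame(
  scenario = rep(1, n),
  metcode  = rep("AA", n),
  yr       = rep(2001, n),
  ginv     = runif(n),
  cons     = runif(n) * 1e6,
  gocc     = runif(n),
  abs      = runif(n) * 1e6,
  dvac     = runif(n),
  gmre     = runif(n),
  gmer     = runif(n),
  stringsAsFactors = FALSE
)

# In-memory size of the finished object, expressed per row.
bytes_per_row <- as.numeric(object.size(d)) / n
bytes_per_row

# Number of such rows that would account for roughly 70 Mb.
rows_in_70mb <- 70 * 2^20 / bytes_per_row
rows_in_70mb
```

This measures only the finished object. It says nothing about temporary copies made while a file is being read, so it gives a lower bound on what loading costs, not the peak.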
On Tue, 24 Jul 2001, Micheall Taylor wrote:

> I've seen several posts over the past 2-3 weeks about memory issues. I've
> tried to carefully follow the suggestions, but remain baffled as to why I
> can't load data into R. I hope that in revisiting this issue that I don't
> exasperate the list.
>
> The setting:
> 1 gig RAM , Linux machine
> 10 Stata files of approximately 14megs each
> File contents appear at the end of this boorishly long email.
>
> Purpose:
> load and combine in R for further analysis
>
> Question:
>
> 1) I've placed memory queries in the command file to see what is going on.
> It appears that loading a 14meg file consumes approx 5 times this amount of
> memory - i.e. available memory declines by 70megs when a 14 meg dataset is
> loaded. (Seen in Method 2 below)

That's quite possible. A `14Mb dataset' is not too helpful to us. You seem to have one char (ca 2 chars) and 9 numeric variables per record. That's ca 75 bytes per record. An actual experiment and using object.size gives 88 (there are row names too). So at 70Mb, that is about 0.8M rows. If that's not right, the data are not being read in correctly.

The main problem I see is that your machine seems unable to allocate more than about 450Mb to R, and it has surprisingly little swap space. (This 512Mb Linux machine has 1Gb of swap allocated, and happily allocates 800Mb to R when needed.)

> 2) Ultimately I would like to replace Stata with R, but the Stata datasets
> I frequently use are in the 100s of megs, which work fine on this machine.
> Is R capable of this?

Probably not. R does require objects to be stored in memory.

As a serious statistical question: what can you usefully do with 8M rows on 9 continuous variables? Why would a 1% sample not be already far more than enough?

My group regularly works with datasets in the 100s of Mb, but normally we either sample or we summarize in groups for further analysis.
Our latest dataset is a 1.2Gb Oracle table, but it has structure (it's 60 experiments for a start).

[...]

BTW, rbind is inefficient, but adding a piece at a time is the least efficient way to use it. rbind(full1, full2, ..., full10) would be better. Allocating full and assigning to sub-sections would be better still.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
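The two rbind suggestions in the message above can be sketched as follows. This is a minimal illustration, not the poster's script: make_piece() is a hypothetical helper that fabricates small data frames standing in for full1 ... full10, and the three column names are borrowed from the data listing purely for flavour.

```r
# Hypothetical pieces standing in for full1 ... full10 (made-up data).
make_piece <- function(i, n = 1000) {
  data.frame(scenario = rep(as.numeric(i), n),
             yr       = rep(2001, n),
             ginv     = runif(n))
}
pieces <- lapply(1:10, make_piece)

# (1) A single rbind call over all pieces, instead of growing `full`
#     one piece at a time inside a loop.
full_a <- do.call(rbind, pieces)

# (2) Allocate the full result first, then assign each piece to its rows.
sizes  <- sapply(pieces, nrow)
ends   <- cumsum(sizes)
starts <- ends - sizes + 1
total  <- sum(sizes)

full_b <- data.frame(scenario = numeric(total),
                     yr       = numeric(total),
                     ginv     = numeric(total))
for (i in seq_along(pieces)) {
  rows <- starts[i]:ends[i]
  full_b[rows, ] <- pieces[[i]]
}

dim(full_a)
dim(full_b)
```

With real files, the list of pieces would come from something like lapply() over the file names with read.dta(). How much either variant saves over the loop is not measured here and is likely to depend on the R version and the data.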
>> dim(a[a[,1]<1,drop=F])
>NULL

You forgot a comma:

> dim(a[a[,1]<1,, drop=F])
[1] 1 3

Paul Gilbert
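To spell out why the comma matters in the exchange above: with a single index, R subscripts a matrix as a plain vector, so the result has no dim attribute. With both the row and column positions present (the column one left empty) and drop = FALSE, the result stays a matrix. The sketch below uses a made-up 4 x 3 matrix, not the poster's data.

```r
# Made-up 4 x 3 matrix; only the first row has a first-column value below 1.
a <- matrix(c(0.5, 2, 3,
              1.5, 2, 3,
              1.2, 2, 3,
              1.8, 2, 3), nrow = 4, byrow = TRUE)

keep <- a[, 1] < 1

# One index only: `a` is subscripted as a vector, so dim() is NULL.
v <- a[keep, drop = FALSE]
dim(v)

# Row index, empty column index, then drop = FALSE: still a matrix.
m <- a[keep, , drop = FALSE]
dim(m)
```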
Making a typo in the last help response I noticed:

a <- t(matrix(c(0.3249816, 1.184596, 1.0408749,
                1.4722996, 1.408512, 0.3768964,
                1.2737683, 1.811588, 1.9108336,
                1.8235127, 1.260909, 1.5995097), 3,4))

> dim(a[a[,1]<0,, drop=F])
[1] 0 3

Is that right? In Splus 3.3 I get

> dim(a[a[,1]<0,, drop=F])
NULL

Paul Gilbert

using
> version
         _
platform sparc-sun-solaris2.6
arch     sparc
os       solaris2.6
system   sparc, solaris2.6
status   Patched
major    1
minor    3.0
year     2001
month    07
day      22
language R

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
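The R result reported in the message above can be reproduced with the same matrix. The sketch below only shows what R returns and one way downstream code might use it. It takes no position on which behaviour is "right", and it makes no claim about S-PLUS.

```r
# Matrix from the message above.
a <- t(matrix(c(0.3249816, 1.184596, 1.0408749,
                1.4722996, 1.408512, 0.3768964,
                1.2737683, 1.811588, 1.9108336,
                1.8235127, 1.260909, 1.5995097), 3, 4))

# No first-column value is negative, so no rows are selected.
none <- a[a[, 1] < 0, , drop = FALSE]
dim(none)

# Because the result still has dimensions, code that walks over its rows
# simply does nothing instead of needing a special case.
rows_seen <- 0
for (i in seq_len(nrow(none))) {
  rows_seen <- rows_seen + 1
}
rows_seen
```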