thr3ads.net - R help - [R] Memory/data -last time I promise [Jul 2001]

Micheall Taylor

2001-Jul-24 12:00 UTC

[R] Memory/data -last time I promise

I've seen several posts over the past 2-3 weeks about memory issues. I've
tried to carefully follow the suggestions, but remain baffled as to why I
can't load data into R.  I hope that in revisiting this issue I don't
exasperate the list.

The setting: 
1 gig RAM , Linux machine
10 Stata files of approximately 14megs each
File contents appear at the end of this boorishly long email.

Purpose: 
load and combine in R for further analysis

Question:

1) I've placed memory queries in the command file to see what is going on. 
It appears that loading a 14meg file consumes approx 5 times this amount of
memory - i.e. available memory declines by 70megs when a 14 meg dataset is
loaded. (Seen in Method 2 below)
2) Ultimately I would like to replace Stata with R, but the Stata datasets
I frequently use are in the 100s of megs, which work fine on this machine.
Is R capable of this?


The command files:

I've attempted the process in two ways, each time both as a regular user
(ulimit=unlimited) and as root on the system, to avoid OS restrictions.
The first method is as follows:

METHOD ONE

R --no-save --max-vsize=800M < QuickLook.R > QuickLook.log
========   QuickLook.log follows ===============>
> library(foreign)
> a <- Sys.time()
> full <- read.dta('../off/off10yr1.dta')
> gc()
         used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1018821 27.3    1166886 31.2         NA
Vcells 4456284 34.0    5070089 38.7        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 696303616 376999936        0 21487616 36982784
Swap: 271392768 263294976  8097792
MemTotal:   1048148 kB
MemFree:     368164 kB
MemShared:        0 kB
Buffers:      20984 kB
Cached:       36116 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:      7908 kB
> n <- 2
> while (n<=3) {
+       fname1 <- paste('../off/off10yr',n,'.dta', sep="")
+       full <- rbind(read.dta(fname1),  full)
+       gc()
+       system('cat /proc/meminfo')
+       n
+       n <- n+1}
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 780275712 293027840        0 21487616 51609600
Swap: 271392768 263294976  8097792
MemTotal:   1048148 kB
MemFree:     286160 kB
MemShared:        0 kB
Buffers:      20984 kB
Cached:       50400 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:      7908 kB
Error: cannot allocate vector of size 3291 Kb
Execution halted



SECOND METHOD

> library(foreign)
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 637681664 435621888        0 21753856 31592448
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     425412 kB
MemShared:        0 kB
Buffers:      21244 kB
Cached:       30852 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> a <- Sys.time()
> full1 <- read.dta('../off/off10yr1.dta')
> gc()
         used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1018825 27.3    1166886 31.2         NA
Vcells 4456285 34.0    5070086 38.7        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 707162112 366141440        0 21757952 45498368
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     357560 kB
MemShared:        0 kB
Buffers:      21248 kB
Cached:       44432 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full2 <- read.dta('../off/off10yr2.dta')
> gc()
         used (Mb) gc trigger (Mb) limit (Mb)
Ncells 1861390 49.8    2105982 56.3         NA
Vcells 8879476 67.8    9315972 71.1        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 777375744 295927808        0 21757952 59826176
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     288992 kB
MemShared:        0 kB
Buffers:      21248 kB
Cached:       58424 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full3 <- read.dta('../off/off10yr3.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  2703952  72.3    3708127  99.1         NA
Vcells 13302667 101.5   14190661 108.3        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 847650816 225652736        0 21757952 74153984
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     220364 kB
MemShared:        0 kB
Buffers:      21248 kB
Cached:       72416 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full4 <- read.dta('../off/off10yr4.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  3546514  94.8    4953636 132.3         NA
Vcells 17725858 135.3   18735437 143.0        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 917798912 155504640        0 21762048 88481792
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:     151860 kB
MemShared:        0 kB
Buffers:      21252 kB
Cached:       86408 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full5 <- read.dta('../off/off10yr5.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  4389076 117.3    6193578 165.4         NA
Vcells 22149049 169.0   23279670 177.7        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 988033024 85270528        0 21770240 102809600
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:      83272 kB
MemShared:        0 kB
Buffers:      21260 kB
Cached:      100400 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full6 <- read.dta('../off/off10yr6.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  5231638 139.7    7700734 205.7         NA
Vcells 26572240 202.8   27312192 208.4        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 1058263040 15040512        0 21774336 117137408
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:      14688 kB
MemShared:        0 kB
Buffers:      21264 kB
Cached:      114392 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full7 <- read.dta('../off/off10yr7.dta')
> gc()
           used  (Mb) gc trigger  (Mb) limit (Mb)
Ncells  6074200 162.2    8572058 228.9         NA
Vcells 30995431 236.5   31726362 242.1        800
> system('cat /proc/meminfo')
        total:    used:    free:  shared: buffers:  cached:
Mem:  1073303552 1069006848  4296704        0 21471232 72318976
Swap: 271392768 261148672 10244096
MemTotal:   1048148 kB
MemFree:       4196 kB
MemShared:        0 kB
Buffers:      20968 kB
Cached:       70624 kB
BigTotal:    131064 kB
BigFree:          0 kB
SwapTotal:   265032 kB
SwapFree:     10004 kB
> full8 <- read.dta('../off/off10yr8.dta')
Error: cannot allocate vector of size 1645 Kb
Execution halted


THIRD METHOD

I combined the Stata files in Stata (same machine) and saved them as a
single file, thinking there could be an inefficiency with rbind(). Same
error.


TO ASSURE YOU THAT I AM NOT CRAZY, THE FOLLOWING IS A SAMPLE DIRECTORY
LISTING OF THE FILES OF INTEREST

-rw-r--r--    1 ctaylor  econ          14M Jun 27 16:15 off10yr5.dta
-rw-r--r--    1 ctaylor  econ          14M Jun 27 17:53 off10yr6.dta
-rw-r--r--    1 ctaylor  econ          14M Jun 27 19:30 off10yr7.dta
-rw-r--r--    1 ctaylor  econ          14M Jun 27 21:08 off10yr8.dta
-rw-r--r--    1 ctaylor  econ          14M Jun 27 23:02 off10yr9.dt


DATA CONTENTS (IN TEXT FORM OF COURSE)

head off10yr1.out
scenario  metcode  yr    ginv  cons     gocc  abs      dvac   gmre  gmer
1         "AA"     2001  .04   3384000  .047  3641000  -.006  .025  .028
1         "AA"     2002  .042  3657000  .046  3716000  -.004  .034  .035
1         "AA"     2003  .031  2816000  .047  3972000  -.015  .051  .056
1         "AA"     2004  .035  3271000  .046  4064000  -.01   .075  .078
1         "AA"     2005  .037  3636000  .037  3444000  0      .084  .084
1         "AA"     2006  .041  4183000  .035  3315000  .006   .118  .116
1         "AA"     2007  .043  4513000  .019  1915000  .021   .094  .086
1         "AA"     2008  .039  4320000  .034  3431000  .005   .068  .066
1         "AA"     2009  .034  3848000  .05   5262000  -.015  .057  .063




-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Prof Brian Ripley

2001-Jul-24 15:58 UTC

[R] Memory/data -last time I promise

On Tue, 24 Jul 2001, Micheall Taylor wrote:
> I've seen several posts over the past 2-3 weeks about memory issues. I've
> tried to carefully follow the suggestions, but remain baffled as to why I
> can't load data into R.  I hope that in revisiting this issue that I don't
> exasperate the list.
>
> The setting:
> 1 gig RAM , Linux machine
> 10 Stata files of approximately 14megs each
> File contents appear at the end of this boorishly long email.
>
> Purpose:
> load and combine in R for further analysis
>
> Question:
>
> 1) I've placed memory queries in the command file to see what is going on.
> It appears that loading a 14meg file consumes approx 5 times this amount of
> memory - i.e. available memory declines by 70megs when a 14 meg dataset is
> loaded. (Seen in Method 2 below)
That's quite possible.  A `14Mb dataset' is not too helpful to us.  You
seem to have one character variable (ca 2 chars) and 9 numeric variables per
record.  That's ca 75 bytes per record.  An actual experiment using
object.size gives 88 (there are row names too).  So at 70Mb, that is about
0.8M rows.  If that's not right, the data are not being read in correctly.
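
The per-record estimate above can be reproduced with a quick calculation. This is only a sketch: 8 bytes per numeric value is how R stores doubles, but the 3-byte allowance for the short string and the synthetic data frame `d` are assumptions for illustration, and object.size results differ between R versions and platforms.

```r
# Back-of-envelope: 9 numeric columns stored as 8-byte doubles,
# plus a small allowance (assumed: 3 bytes) for the short character code.
bytes_per_record <- 9 * 8 + 3
print(bytes_per_record)

# Rows implied by a 70Mb footprint at the measured 88 bytes per record.
rows_at_70mb <- 70e6 / 88
print(round(rows_at_70mb / 1e6, 1))   # in millions of rows

# Empirical check on a synthetic data frame (hypothetical stand-in for the data).
n <- 10000
d <- data.frame(scenario = rep(1, n), metcode = rep("AA", n),
                yr = rep(2001, n), matrix(runif(7 * n), n, 7),
                stringsAsFactors = FALSE)
print(as.numeric(object.size(d)) / n) # bytes per row; varies by R version
```

If the row count computed this way is far from the number of records Stata reports, that would point to a problem in how the file is being read.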

The main problem I see is that your machine seems unable to allocate more
than about 450Mb to R, and it has surprisingly little swap space.  (This
512Mb Linux machine has 1Gb of swap allocated, and happily allocates 800Mb
to R when needed.)
> 2) Ultimately I would like to replace Stata with R, but the Stata datasets
> I frequently use are in the 100s of megs, which work fine on this machine.
> Is R capable of this?
Probably not.  R does require objects to be stored in memory.

As a serious statistical question: what can you usefully do with 8M rows
on 9 continuous variables?  Why would a 1% sample not be already far more
than enough?  My group regularly works with datasets in the 100s of Mb,
but normally we either sample or we summarize in groups for further
analysis.  Our latest dataset is a 1.2Gb Oracle table, but it has
structure (it's 60 experiments for a start).
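
The two approaches mentioned above, sampling and summarizing in groups, might look like the following. This is a minimal sketch on synthetic data: the data frame `big`, its size, and its column values are made up for illustration and are not the poster's data.

```r
set.seed(1)                             # reproducible sample
n <- 100000                             # stand-in size; the real data are far larger
big <- data.frame(yr = sample(2001:2010, n, replace = TRUE),
                  ginv = rnorm(n, mean = 0.04, sd = 0.01))

# Draw a 1% simple random sample of rows.
idx <- sample(nrow(big), size = nrow(big) %/% 100)
small <- big[idx, , drop = FALSE]

# Alternatively, summarize in groups and analyze the summaries.
by_year <- aggregate(ginv ~ yr, data = big, FUN = mean)

print(dim(small))
print(by_year)
```

Either result is small enough to work with comfortably in memory, which is the point of the suggestion.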

[...]

BTW, rbind is inefficient, but adding a piece at a time is the least
efficient way to use it.  rbind(full1, full2, ..., full10) would be
better.  Allocating full and assigning to sub-sections would be better
still.
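
The three ways of combining pieces described above can be compared side by side. The following is a sketch only: `make_piece` and the synthetic pieces are hypothetical stand-ins for the ten files, which in practice would come from read.dta.

```r
# Synthetic pieces standing in for the ten files (hypothetical data).
make_piece <- function(i, n = 1000) {
  data.frame(scenario = rep(i, n),
             yr = rep(2001:2010, length.out = n),
             ginv = runif(n))
}
pieces <- lapply(1:10, make_piece)

# Least efficient: grow the result one piece at a time (copies `full_grow` each pass).
full_grow <- pieces[[1]]
for (p in pieces[-1]) full_grow <- rbind(full_grow, p)

# Better: a single rbind call over all the pieces.
full_once <- do.call(rbind, pieces)

# Better still: allocate the full result, then assign into sub-sections.
sizes <- vapply(pieces, nrow, integer(1))
ends <- cumsum(sizes)
starts <- ends - sizes + 1
total <- sum(sizes)
full_pre <- data.frame(scenario = integer(total),
                       yr = integer(total),
                       ginv = numeric(total))
for (i in seq_along(pieces)) {
  full_pre[starts[i]:ends[i], ] <- pieces[[i]]
}

print(dim(full_pre))
```

All three produce the same rows; they differ in how many intermediate copies are made along the way, which matters when memory is already tight.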

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Paul Gilbert

2001-Jul-25 15:00 UTC

[R] drop=F in [

>> dim(a[a[,1]<1,drop=F])
>NULL
You forgot a comma:
> dim(a[a[,1]<1,, drop=F])
[1] 1 3
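
To see why the comma matters, here is a small sketch. It assumes `a` is the 4 x 3 matrix given in the follow-up post in this thread; the names `rows`, `v` and `m` are introduced only for illustration.

```r
# Assumed matrix (4 rows, 3 columns), taken from the follow-up post.
a <- t(matrix(c(0.3249816, 1.184596, 1.0408749,
                1.4722996, 1.408512, 0.3768964,
                1.2737683, 1.811588, 1.9108336,
                1.8235127, 1.260909, 1.5995097), 3, 4))

rows <- a[, 1] < 1            # logical row selector

# One index before drop: `a` is indexed as a plain vector, so there is no dim.
v <- a[rows, drop = FALSE]
print(dim(v))

# Two indices (note the empty column slot): the result stays a matrix.
m <- a[rows, , drop = FALSE]
print(dim(m))
```

With a single index, the logical vector is recycled over all twelve elements, so the result is an ordinary vector and `drop` has nothing to act on.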

Paul Gilbert

Paul Gilbert

2001-Jul-25 15:08 UTC

[Rd] empty drop=F in [

Making a typo in the last help response I noticed:

a <- t(matrix(c(0.3249816, 1.184596, 1.0408749,
   1.4722996, 1.408512, 0.3768964,
   1.2737683, 1.811588, 1.9108336,
   1.8235127, 1.260909, 1.5995097 ), 3,4))
> dim(a[a[,1]<0,, drop=F])
[1] 0 3

Is that right? In Splus 3.3 I get
> dim(a[a[,1]<0,, drop=F])
NULL
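
The R behaviour reported above can be checked with any matrix. This is a small sketch using made-up values; `none` is a name introduced here for illustration, and nothing is claimed about S-PLUS.

```r
a <- matrix(1:12, nrow = 4, ncol = 3)   # hypothetical 4 x 3 matrix
none <- a[a[, 1] < 0, , drop = FALSE]   # no row satisfies the condition

print(dim(none))        # the result is still a matrix, with zero rows
print(is.matrix(none))
print(length(none))
```

Keeping the column dimension means code that later calls ncol() or rbind() on the result still works when the selection happens to be empty.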

Paul Gilbert
using
> version
         _
platform sparc-sun-solaris2.6
arch     sparc
os       solaris2.6
system   sparc, solaris2.6
status   Patched
major    1
minor    3.0
year     2001
month    07
day      22
language R


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
