Hello,

For my present project I need to use the data stored in a ca. 100 MB Stata dataset. When I import the data in R using:

library("foreign")
x <- read.dta("mydata.dta")

I find that R needs a startling 665 MB of memory! (In Stata I can simply allocate, say, 128 MB of memory and go ahead.)

Is there any way around this, or should I forget R for analysis of datasets of this magnitude?

Thanks for your help with this,
Edwin
Edwin Leuven <e.leuven at uva.nl> writes:

> For my present project I need to use the data stored in a ca. 100 MB
> Stata dataset.
>
> When I import the data in R using:
>
> library("foreign")
> x <- read.dta("mydata.dta")
>
> I find that R needs a startling 665 MB of memory!
>
> (In Stata I can simply allocate, say, 128 MB of memory and go ahead.)
>
> Is there any way around this, or should I forget R for analysis of
> datasets of this magnitude?

What does the 665 MB represent? Did you try doing a garbage collection after you had done the import? I would suggest

library("foreign")
x <- read.dta("mydata.dta")
gc()            # possibly repeat gc() to lower the thresholds
object.size(x)  # the actual storage (in bytes) allocated to this object
save(x, file = "mydata.rda", compress = TRUE)

After that you can start a new session and use

load("mydata.rda")

to obtain a copy of the data set without the storage overhead incurred by the Stata -> R conversion.

P.S. As described in the help page for object.size, the returned value is more properly described as an estimate of the object size, because sometimes it is difficult to determine the object size accurately.
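To make the round trip concrete, here is a minimal sketch of the fresh-session step (the file and object names are taken from the thread; the units argument to format() assumes a reasonably recent version of R):

## In a new R session: reload the saved copy and check its size.
load("mydata.rda")                    # restores the object 'x'
gc()                                  # report memory use after loading
format(object.size(x), units = "MB")  # size estimate for the data frame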
Is the ca. 100 MB the size of the .dta file, or the size of the data when loaded into Stata? Or is there no difference? Have you checked the size of the .rda file created as Doug suggested? I'd be curious to see what that is...

Andy

> From: Edwin Leuven
>
> > What does the 665 MB represent? Did you try doing a garbage
> > collection after you had done the import?
>
> I didn't (sorry, R beginner).
>
> I followed your example and things look much better now;
> object.size(x) returns:
>
> 219,167,604
>
> which is about double the size of the same object in Stata, where it
> is:
>
> 104,882,604
>
> This leaves quite some room for improvement, but at least I can now
> handle the data on my laptop...
>
> Thanks for your quick response!
> Edwin
>
> > I would suggest
> >
> > library("foreign")
> > x <- read.dta("mydata.dta")
> > gc()            # possibly repeat gc() to lower the thresholds
> > object.size(x)  # the actual storage (in bytes) allocated to this object
> > save(x, file = "mydata.rda", compress = TRUE)
> >
> > After that you can start a new session and use
> >
> > load("mydata.rda")
> >
> > to obtain a copy of the data set without the storage overhead
> > incurred by the Stata -> R conversion.
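A rough sketch of one way to claw back part of that factor of two (assuming x is the data frame from the thread): find columns that read.dta promoted to double but that hold only whole numbers, and store them as integer, at 4 bytes per element instead of 8.

sizes <- sapply(x, object.size)       # bytes used by each column
head(sort(sizes, decreasing = TRUE))  # the largest columns
## Columns holding only whole numbers within integer range can be
## stored as integer without losing information.
whole <- sapply(x, function(v)
    is.double(v) &&
    all(v == trunc(v), na.rm = TRUE) &&
    all(abs(v) < .Machine$integer.max, na.rm = TRUE))
x[whole] <- lapply(x[whole], as.integer)
object.size(x)                        # should now be noticeably smaller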
On Mon, 29 Mar 2004, Edwin Leuven wrote:

> > Stata tends to store data as float, integer, or even byte where
> > appropriate for the precision. This is one source of space saving. A
> > factor of 2 is not atypical.
>
> I was suspecting something like this.
>
> What does R do? Default to double always (or something like that)?
>
> Is this a deliberate design choice made by the R people, or just a
> convenience of not having to worry about data types?

R has only numeric (C double) and integer (C int) types for storing numeric data. I think the reason for not having single precision is insufficient accuracy in computation. Shorter integers might be useful sometimes.

-thomas
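A self-contained illustration of what that means for storage (the printed sizes are approximate and include a small per-object overhead):

n <- 1e6
object.size(integer(n))  # about 4 MB: an R integer takes 4 bytes
object.size(double(n))   # about 8 MB: an R double takes 8 bytes
## Stata's 4-byte float and 1-byte byte types have no R counterpart,
## so read.dta stores such columns as 8-byte doubles -- which is why
## a factor of 2 is not atypical.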