thr3ads.net - R help - [R] big panel: filehash, bigmemory or other [Feb 2010]


Eric Fail

2010-Feb-22 22:13 UTC

[R] big panel: filehash, bigmemory or other

Dear R-list

I'm about to start a new project on a rather big panel, consisting
of approximately 8 million observations in 30 waves of data and about
15 variables. I have a similar data set that is approximately 7
gigabytes in size.

Until now I have done my data management in SAS and Stata, mostly
identifying spells, counting events in intervals, and the like, but I
would like to do the data management, and fit my models, in R.

However, R can't handle the data in the normal R way; it's simply too
big. So I thought of trying either filehash, bigmemory, or some other
similar package I haven't heard of (yet). The documentation for
'bigmemory' says that the package is capable of "basic manipulation"
on "manageable subsets of the data", but what does that actually mean?

Since learning this in R is a rather time-consuming process, and I
know SAS is capable of doing the data management and has the PROC
MIXED procedure, I wanted to ask on the list before I set out on this
odyssey.

Does anyone out there have any practical experience with data sets
(panels) of that size, and maybe some experience fitting a model,
presumably using lmer (from the lme4 package) or the like, together
with filehash or bigmemory, that they would be willing to share?

Thanks in advance,
Eric
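
[Editorial sketch] One possible reading of "basic manipulation" on "manageable subsets of the data" is that the full data set stays on disk and only a slice is brought into memory at a time. The sketch below illustrates that idea with base R only; it does not use bigmemory or filehash, and it is not taken from either package's documentation. The toy file, the column names (`id`, `wave`, `event`), and the function name `count_events_chunked` are all assumptions made for illustration.

```r
# Hypothetical toy panel written to a temporary CSV file: id, wave, event (0/1)
path <- tempfile(fileext = ".csv")
toy <- data.frame(id    = rep(1:4, each = 3),
                  wave  = rep(1:3, times = 4),
                  event = c(0, 1, 1,  0, 0, 0,  1, 1, 1,  0, 1, 0))
write.csv(toy, path, row.names = FALSE)

# Count events per id while holding at most `chunk_size` rows in memory
count_events_chunked <- function(path, chunk_size = 5) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once; write.csv quotes the column names
  header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

  totals <- numeric(0)  # running per-id event counts, named by id
  repeat {
    # read.csv continues from the connection's current position and
    # raises an error once no lines are left, which ends the loop
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size, col.names = header),
      error = function(e) NULL
    )
    if (is.null(chunk) || nrow(chunk) == 0) break

    # Per-id sums within this chunk, as a named numeric vector
    part <- vapply(split(chunk$event, chunk$id), sum, numeric(1))

    # Merge the chunk's sums into the running totals
    for (k in names(part)) {
      totals[k] <- if (is.na(totals[k])) part[[k]] else totals[[k]] + part[[k]]
    }
  }
  totals
}

res <- count_events_chunked(path, chunk_size = 5)
print(res)
```

Because an id's rows can straddle a chunk boundary, the per-chunk sums are merged into running totals rather than reported chunk by chunk. Operations that need all of a unit's rows at once (identifying spells, for example) would require the file to be sorted by id and the chunking to respect id boundaries, which this sketch does not attempt.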

Johan Jackson

2010-Mar-12 02:47 UTC


[R] big panel: filehash, bigmemory or other

Hello Eric,

If you can do a project like this (one that manages huge datasets) in SAS,
I'd recommend just doing it in SAS rather than using R. I've sadly come to
the conclusion that R isn't very good at working with large datasets, and
until the powers that be do something to help users like us (e.g., help us
get around the damn 2^31-1 limit on vector length), R will remain a great
language that is very awkward to use with large datasets. I've used
bigmemory and ff, and I have the greatest respect and appreciation for the
authors of these packages, but they are ultimately awkward to work with
compared to doing things natively in R. For example, there is still a
2^31-1 limit on objects in ff, and bigmemory was buggy when I tried to use
it.
Good luck!

JJ
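
[Editorial sketch] The 2^31-1 figure mentioned above is the largest value of R's integer type, which also capped vector length at the time of this thread (long vectors only arrived later, in R 3.0.0). The base R arithmetic below shows how that cap compares with a panel of the size described. It assumes 8 million units observed in each of 30 waves and 8-byte numeric storage for every variable; the original post does not state either, so the resulting figures are illustrative only.

```r
# Largest vector length available in R at the time of this thread (early 2010)
max_len <- .Machine$integer.max
stopifnot(max_len == 2^31 - 1)

# Assumed reading of the panel: 8 million units in each of 30 waves
n_units <- 8e6
n_waves <- 30
n_vars  <- 15

rows  <- n_units * n_waves   # rows in long format
cells <- rows * n_vars       # cells if everything sat in one matrix

# Approximate size in gigabytes if every cell is an 8-byte double
size_gb <- cells * 8 / 1e9

fits_as_column <- rows  < max_len   # can one variable fit in a single vector?
fits_as_matrix <- cells < max_len   # can the whole panel fit in one matrix?

print(c(rows = rows, cells = cells, size_gb = size_gb))
print(c(fits_as_column = fits_as_column, fits_as_matrix = fits_as_matrix))
```

Under these assumptions a single variable stays below the length cap, while one matrix holding all variables would exceed it, and the raw storage alone would be well beyond the memory of a typical 2010 workstation.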

