Friends,

I am starting on a (section of the) project where I need to build a matrix with on the order of 5 million rows and 200 columns, and I am wondering if I can stay in R.

I need to do rollapply-type operations on the columns, including some that will be functions of (windows of) two columns.

I have been looking at the ff and bigmemory packages but am not sure that they will do.

Before I get too deep, can someone offer some wisdom about what the best direction to go would be?

Switching to C/C++ is definitely an option if it is all too hard.

Cheers,
Worik
I would suggest that, if you want to use R, you start with a 64-bit version and 24GB of memory. If your data is a numeric matrix, you will need 8GB for a single copy (5 million rows x 200 columns x 8 bytes per double).

Do you really need it all in memory at once, or can you partition the problem? Can you use a database to access the portion you need at any time? If you only need one or two columns at a time, then a database storing the columns might work.

You probably need to analyze more carefully how you want to solve your problem, given the limitations of the system.

On Sep 2, 2011, at 1:13, Worik R <worikr at gmail.com> wrote:
> I am starting on a (section of the) project where I need to build a matrix
> with on the order of 5 million rows and 200 columns
>
> I am wondering if I can stay in R.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
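To make the memory estimate and the "one or two columns at a time" suggestion above concrete, here is a minimal base-R sketch. It is not code from this thread: `roll_mean` and `roll2` are hypothetical helper names (not functions from zoo, ff, or bigmemory), the data are simulated, and the window width of 20 is an arbitrary assumption.

```r
# Memory for one dense copy of a numeric (double) matrix.
n_rows <- 5e6
n_cols <- 200
bytes  <- n_rows * n_cols * 8      # 8 bytes per double: 8e9 bytes
gib    <- bytes / 2^30             # about 7.45 GiB

# Hypothetical helper: rolling mean of one column in O(n) via cumulative sums.
# Assumes width <= length(x) and no NA values.
roll_mean <- function(x, width) {
  cs <- cumsum(c(0, x))
  n  <- length(cs)
  (cs[(width + 1):n] - cs[1:(n - width)]) / width
}

# Hypothetical helper: apply f to aligned windows of two columns.
# f must return a single number, e.g. cor or cov.
roll2 <- function(x, y, width, f) {
  starts <- seq_len(length(x) - width + 1)
  vapply(starts, function(i) {
    idx <- i:(i + width - 1)
    f(x[idx], y[idx])
  }, numeric(1))
}

# Small simulated example standing in for two columns pulled from storage.
set.seed(1)
x <- rnorm(1000)
y <- x + rnorm(1000)

m <- roll_mean(x, 20)      # one-column rolling mean
r <- roll2(x, y, 20, cor)  # two-column rolling correlation
```

Only the columns passed in need to be in memory, so a single 5-million-row column costs roughly 40MB rather than 8GB. The `vapply` loop in `roll2` is easy to read but would likely be slow at that size; a cumulative-sum formulation or compiled code may be needed there.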
On 09/01/2011 10:13 PM, Worik R wrote:
> I am starting on a (section of the) project where I need to build a matrix
> with on the order of 5 million rows and 200 columns
>
> I am wondering if I can stay in R.
>
> I need to do rollapply type operations on the columns, including some that
> will be functions of (windows of) two columns.

Perhaps useful to you -- I recently added WINDOW FUNCTION support to PL/R*. Currently this new feature is only available in git master, but within a few days I will push a new release. You can download the source from git here if you want:

https://github.com/jconway/plr

The official docs have not been updated yet, but see the pre-release docs here (specifically chapter 9):

http://www.joeconway.com/plr/doc/plr-git-US.pdf

HTH,

Joe

*PL/R allows you to execute R functions from within a PostgreSQL database.

--
Joe Conway
credativ LLC: http://www.credativ.us