The problem is common: I have 100GB of data but only 8GB of RAM. I was thinking of transforming the 100GB of data, which is currently in a non-CSV, fixed-row format, into something that R could load quickly and easily in chunks - sort of like pages, perhaps.

I might be able to do this with some SQL server, but I'm unsure how well that works out given the constant conversion, and I suspect there is a better approach. I am particularly interested in speed, since I will have to go through several iterations with this data, and speed counts.

I was hoping someone much more experienced than I am might have a good answer, since there is a lot out there. Any advice would be very much appreciated.
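A minimal sketch of the "load it in pages" idea described above, assuming the fixed-row file really is fixed-width text; the file name "records.dat", the field widths, and the column names are invented placeholders, not details from the thread:

  widths <- c(8, 12, 10)                  # assumed byte widths of three fields
  cols   <- c("id", "timestamp", "value") # assumed column names
  starts <- cumsum(c(1, widths[-length(widths)]))
  ends   <- cumsum(widths)

  con <- file("records.dat", open = "r")  # one connection kept open across chunks
  repeat {
    lines <- readLines(con, n = 100000)   # read 100,000 rows per pass
    if (length(lines) == 0) break         # end of file

    ## split each line on the assumed fixed widths
    fields <- lapply(seq_along(widths),
                     function(i) substring(lines, starts[i], ends[i]))
    chunk <- data.frame(fields, stringsAsFactors = FALSE)
    names(chunk) <- cols

    ## ... per-chunk work (filter, aggregate, write out, ...) goes here ...
  }
  close(con)

In practice read.fwf(), or a package aimed at large fixed-width files such as LaF, may do the width-splitting more efficiently, but the pattern is the same: only one chunk is ever held in RAM.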
What do you hope to do with this data while it is in R? E.g., do you want to plot it, fit a model to it, select a few rows or columns from it, sort it, summarize lots of small subsets of it, or something else?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Dec 28, 2015 at 1:39 PM, Mark Finkelstein <finkel.mark at gmail.com> wrote:
> The problem is common, I have 100GB of data, but only 8GB of RAM. [...]
Have you looked at the High Performance Computing Task View on CRAN?

Whatever you do, keep in mind that the algorithms you intend to apply will have a strong impact on which data management approach works best. Start small before diving in with all your data, and try successively larger amounts of data to help extrapolate what will happen when you process the whole data set.

In addition, if you do use SQL, keep in mind that your table schema and index selection can make or break your project (but this is not a SQL support forum).

--
Sent from my phone. Please excuse my brevity.

On December 28, 2015 1:39:00 PM PST, Mark Finkelstein <finkel.mark at gmail.com> wrote:
> The problem is common, I have 100GB of data, but only 8GB of RAM. [...]
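To make the "start small and time it" and "index selection" advice concrete, here is a rough sketch using DBI/RSQLite, which keeps the data on disk and lets R pull it back in pages. The table name "readings", its columns, and the toy data are invented stand-ins for the real data:

  library(DBI)      # dbConnect(), dbWriteTable(), dbSendQuery(), ...
  library(RSQLite)

  db <- dbConnect(RSQLite::SQLite(), "readings.sqlite")

  ## One-time conversion: append each parsed chunk to a table.
  ## (A tiny fake chunk stands in for the real fixed-width data here.)
  chunk <- data.frame(id = sample(1:100, 10000, replace = TRUE),
                      value = rnorm(10000))
  dbWriteTable(db, "readings", chunk, append = TRUE)

  ## Schema and index choice can make or break query speed.
  dbExecute(db, "CREATE INDEX IF NOT EXISTS idx_readings_id ON readings (id)")

  ## Time the same query on successively larger slices to extrapolate
  ## what will happen on the whole data set.
  for (n in c(1000, 10000)) {
    print(system.time(
      dbGetQuery(db, sprintf(
        "SELECT id, AVG(value) FROM readings WHERE rowid <= %d GROUP BY id",
        as.integer(n)))
    ))
  }

  ## Stream results back a page at a time instead of loading everything at once.
  res <- dbSendQuery(db, "SELECT * FROM readings ORDER BY id")
  while (!dbHasCompleted(res)) {
    page <- dbFetch(res, n = 1000)   # one "page" of rows in RAM at a time
    ## ... per-page work goes here ...
  }
  dbClearResult(res)
  dbDisconnect(db)

This is only a sketch of the workflow; whether SQLite, a client/server database, or a flat-file chunking scheme is fastest will depend on the algorithms being applied, which is exactly why timing small subsets first is worthwhile.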