Hi, I need to analyze data that has 3.5 million observations and about 60 variables, and I was planning on using R to do this, but I can't even seem to read in the data. It just freezes and ties up the whole system -- and this is on a Linux box purchased about 6 months ago, a dual-processor PC that was pretty much top of the line. I've tried expanding R's memory limits, but it doesn't help. I'll be hugely disappointed if I can't use R b/c I need to build tailor-made models (multilevel and other complexities). My fall-back is the S-PLUS big data package, but I'd rather avoid it if anyone can provide a solution....

Thanks!!!!

Jennifer Hill
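A quick back-of-envelope calculation, assuming all 60 variables are stored as 8-byte doubles, shows why a dataset this size strains RAM on a 2006-era machine:

```r
# Rough memory footprint of 3.5 million rows x 60 numeric variables,
# at 8 bytes per double, before R makes any working copies.
rows  <- 3.5e6
vars  <- 60
bytes <- rows * vars * 8
gb    <- bytes / 1024^3
round(gb, 2)   # ~1.56 GB for a single in-memory copy
```

Since R routinely duplicates objects while reading and fitting, the effective requirement is a multiple of this figure, which easily exceeds the RAM of a typical workstation of that vintage.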
Jennifer,

It sounds like that's too much data for R to hold in your computer's RAM. You should give serious consideration to whether you need all those data for the models that you're fitting, and if so, whether you need to fit them all at once. If not, think about pre-processing steps, using e.g. SQL commands, to pull out only the data that you need. For example, if the data are spatial, then think about analyzing them by patches.

Good luck,

Andrew

On Sun, Jul 02, 2006 at 10:12:25AM -0400, JENNIFER HILL wrote:
> [...]

--
Andrew Robinson
Department of Mathematics and Statistics    Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
Email: a.robinson at ms.unimelb.edu.au
http://www.ms.unimelb.edu.au
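Andrew's suggestion of pre-processing with SQL can be sketched with R's DBI interface. This is only a sketch: SQLite (via the RSQLite package) stands in for whatever database holds the real data, and the table and column names (`survey`, `y`, `x1`, `region`) are invented:

```r
# Keep the full table in a database and pull only the rows/columns a
# given model needs into R.  An in-memory SQLite toy table stands in
# for the real 3.5M-row data so the sketch is self-contained.
library(RSQLite)
con <- dbConnect(SQLite(), dbname = ":memory:")  # a file path for real use
dbWriteTable(con, "survey",
             data.frame(y = rnorm(100), x1 = rnorm(100),
                        region = rep(1:5, 20)))
# Only the requested "patch" of data ever enters R's memory:
patch <- dbGetQuery(con, "SELECT y, x1 FROM survey WHERE region = 3")
nrow(patch)   # 20 rows, not 100
dbDisconnect(con)
```

With the real data loaded into a database once, each model run pulls just the patch (or variable subset) it needs.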
JENNIFER HILL <jh1030 <at> columbia.edu> writes:
> [...]

Dear Jennifer,

You may want to look at R News; a few years ago it carried an article on using a DBMS with R -- MySQL, Oracle, etc. This is a frequently asked question, and there are also some posts from over the past few years that may be helpful. I have successfully read a large database -- larger than yours -- into MySQL and accessed it from R. I hope that helps.

Anupam Tyagi.
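The DBMS route also lets you retrieve a result set in batches, so no more than one batch is ever in RAM. The post above used MySQL, but the DBI calls are the same across back ends; SQLite is substituted here purely so the sketch runs self-contained, and the table name `obs` is invented:

```r
# Fetch a query result in batches of 300 rows instead of all at once.
library(RSQLite)
con <- dbConnect(SQLite(), dbname = ":memory:")
dbWriteTable(con, "obs", data.frame(y = rnorm(1000), x = rnorm(1000)))

res  <- dbSendQuery(con, "SELECT y, x FROM obs")
seen <- 0
repeat {
  batch <- fetch(res, n = 300)     # at most 300 rows in memory at a time
  if (nrow(batch) == 0) break
  seen <- seen + nrow(batch)       # real per-batch work would go here
}
dbClearResult(res)
dbDisconnect(con)
seen   # all 1000 rows processed without ever holding them all
```

The same `dbSendQuery()`/`fetch()` pattern works against a MySQL connection opened with the RMySQL package.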
Jennifer,

We had a little discussion about this topic last May, when I had a similar problem. It is archived at

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76401.html

You can follow the thread to see the various arguments and solutions. I tried to summarize the suggested approaches at

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76583.html

HTH,

Rogerio Porto.

---------- Original header ----------
From: r-help-bounces at stat.math.ethz.ch
To: r-help at stat.math.ethz.ch
Date: Sun, 2 Jul 2006 10:12:25 -0400 (EDT)
Subject: [R] large dataset!
> [...]
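One of the approaches summarized in those archived threads is reading a flat file in fixed-size chunks, so only one chunk is in memory at a time. A minimal sketch, with a small temporary file standing in for the real 3.5-million-row file and an illustrative chunk size:

```r
# Toy file so the sketch is runnable; in real use this would be the
# big on-disk file and chunk would be, say, 250000.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(y = rnorm(1000), x = rnorm(1000)), tmp,
          row.names = FALSE)

chunk <- 300
con <- file(tmp, open = "r")
invisible(readLines(con, n = 1))   # consume the header line once
total <- 0
repeat {
  block <- tryCatch(read.table(con, nrows = chunk, sep = ","),
                    error = function(e) NULL)  # NULL at end of file
  if (is.null(block)) break
  total <- total + nrow(block)     # per-chunk processing goes here
}
close(con)
total   # 1000: every row seen, never more than `chunk` rows in RAM
```

Supplying `colClasses` to `read.table()` in real use also avoids the considerable overhead of column-type guessing.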