Hi,

I very recently started using R (as in: last week) and I was wondering if anyone could point me to website(s) with sample code for dealing with large datasets (length- and/or breadthwise). I understood that R was never designed to work with datasets larger than, say, a couple of hundred MB. One way, as I also read, is to let R work in conjunction with an SQL database; that's one interesting approach I'd like to know more about. But I was also hoping that there were pure-R solutions for working with very large tables (was 'scan' designed for that?). In any case, a standard approach would be desirable.

Thanks in advance.

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In the face of ambiguity, refuse the temptation to guess.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On 12/16/2009 11:59 AM, Albert-Jan Roskam wrote:
> I very recently started using R (as in: last week) and I was wondering if anyone could point me to website(s) with sample code to deal with large datasets [...]

See for example the "Large memory and out-of-memory data" section of the task view "High-Performance and Parallel Computing with R" (pick a mirror on http://cran.r-project.org/mirrors.html, then "Task Views", then "HighPerformanceComputing").

Stephan
The sqldf package may be of help to you.

Regards
Søren

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Albert-Jan Roskam
Sent: 16 December 2009 11:59
To: r-help at r-project.org
Subject: [R] R & very large files

Hi,

I very recently started using R (as in: last week) and I was wondering if anyone could point me to website(s) with sample code to deal with large datasets [...]
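To make the sqldf suggestion concrete, here is a minimal sketch of filtering a large CSV through SQLite before it ever becomes an R data frame. The file name `demo.csv`, the columns `id` and `amount`, and the threshold are made-up examples for illustration, not anything from the original question.

```r
library(sqldf)  # CRAN package; runs SQL on files/data frames via SQLite

# Write a tiny demonstration file; in practice this would be the large
# CSV that does not fit comfortably in memory.  Column names are made up.
write.table(data.frame(id = 1:5, amount = c(10, 2000, 30, 4000, 50)),
            "demo.csv", sep = ",", quote = FALSE, row.names = FALSE)

# read.csv.sql() imports the file into a temporary SQLite database and
# returns only the rows selected by the query, so R never has to hold
# the whole file as a data frame at once.
big_rows <- read.csv.sql("demo.csv",
                         sql = "select * from file where amount > 1000")
nrow(big_rows)  # the two rows with amount 2000 and 4000
```

The same idea extends to aggregation: summing or counting inside the database and returning only the (small) result to R.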
Albert-Jan Roskam wrote:
> I very recently started using R (as in: last week) and I was wondering if anyone could point me to website(s) with sample code to deal with large datasets [...]

Hi Albert-Jan,

If you are faced with enormous datasets, R is, in my opinion, a great tool. It only takes careful thought about how to tackle an analysis if the data do not fit into your memory all at once. As you mentioned, you could put your data in a database and extract subsets to do the analysis, later on combining the results. You could also sparsely sample your data. But without you specifying exactly what it is you want to do, it is impossible for us to give you any specific advice. Please review the posting guide for some hints as to what kind of information you can provide us with.

cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
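As a complement to the advice above about processing the data in pieces and combining the results, here is a minimal pure-R sketch that reads a delimited file in fixed-size chunks over an open connection, so only one chunk is in memory at a time. The file name `demo.txt`, the column `value`, and the running sum are hypothetical examples chosen to keep the sketch self-contained.

```r
# Write a small demonstration file; in real use this would be a file
# too large to read in one go.  The column name "value" is made up.
write.table(data.frame(value = 1:25), "demo.txt",
            sep = "\t", quote = FALSE, row.names = FALSE)

con <- file("demo.txt", open = "r")
header <- strsplit(readLines(con, n = 1), "\t")[[1]]  # consume header

total <- 0
repeat {
  # Read at most 10 rows per pass; the open connection remembers its
  # position, so each call continues where the previous one stopped.
  # read.table() errors at end of input, which we turn into NULL.
  chunk <- tryCatch(read.table(con, sep = "\t", nrows = 10,
                               col.names = header),
                    error = function(e) NULL)
  if (is.null(chunk)) break
  total <- total + sum(chunk$value)  # per-chunk work, combined as we go
}
close(con)
total  # 325, i.e. sum(1:25)
```

The per-chunk step here is a simple sum, but the same loop structure works for any analysis that can be computed piecewise and merged afterwards, which is exactly the kind of careful thought the reply describes.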