Dear Sir/Madam,

I would like to know the maximum number of observations a single file can have when using R. I am asking because I am trying to do research on banking transactions and I have around 49 million records. Can R handle this? Please advise.

Mark Nasila
Quantitative Analyst
CBS Risk Management
Personal Banking
7th Floor, 2 First Place, Cnr Jeppe and Simmonds Street, Johannesburg, 2000
Tel (011) 371-2406, Fax (011) 352-9812, Cell 083 317 0118
e-mail MNasila@fnb.co.za
www.fnb.co.za  www.howcanwehelpyou.co.za
First National Bank - a division of FirstRand Bank Limited.
An Authorised Financial Services and Credit Provider (NCRCP20).
On 02/17/2011 10:16 AM, Nasila, Mark wrote:
> Dear Sir/Madam,
>
> I would like to know what is the maximum number of observations a
> single file must have when using R. I am asking this because am trying
> to do research on banking transactions and i have around 49million
> records. Can R handle this? Advise with regard to this.

Dear Mark,

I think R can address vectors up to a length of 2^31 - 1 ≈ 2.1e9 elements. 2^31 numeric elements = 16 GB per vector (matrix, array).

For me, the available RAM is the more important limit: I work without problems with (numeric) matrices of size 2e5 x 250 = 5e7 elements (380 MB) that were produced from 5e4 x 2500 = 1.25e8 elements (≈ 1 GB) of raw data. The raw data is the practical limit on my 8 GB (64-bit Linux) machine: during processing it becomes complex-valued, thus ≈ 2 GB, and with that I had to be very careful not to copy the matrix too often. This and a bunch of gc() calls let me process the data without swapping. :-)

Note that 2 GB corresponds quite nicely to the rule of thumb that the end of fun is reached with variable sizes of 1/3 of the RAM.

If you are concerned about your data set, I'd recommend reading a fraction of it and looking at its object.size() and at the RAM use during data analysis of that partial data set. Then extrapolate to the complete data set.

HTH
Claudia
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbeleites at units.it
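The advice above (read a fraction, check object.size(), extrapolate) might look roughly like the following sketch. The file, its three columns, and the 8 GB RAM figure are assumptions made up for illustration; only the 49 million record count comes from the question. Linear extrapolation is only a rough guide, since character or factor columns do not scale exactly with the number of rows.

```r
# Hypothetical stand-in for the real transaction file: a small CSV written
# to a temporary location so that the sketch runs on its own.
set.seed(1)
n_demo <- 20000
demo <- data.frame(
  account = sample.int(1e6, n_demo, replace = TRUE),
  amount  = round(runif(n_demo, 1, 5000), 2),
  branch  = sample.int(500, n_demo, replace = TRUE)
)
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE)

# Read only the first n_part records and measure their size in memory.
n_part <- 10000
part <- read.csv(path, nrows = n_part)
bytes_per_row <- as.numeric(object.size(part)) / nrow(part)

# Extrapolate linearly to the full data set (record count from the question).
n_total <- 49e6
est_gb <- bytes_per_row * n_total / 2^30

# Rule of thumb from the reply: objects should stay below about 1/3 of RAM.
ram_gb <- 8   # assumption: adjust to the machine at hand
fits <- est_gb < ram_gb / 3

cat(sprintf("about %.1f bytes per row, about %.2f GB for %g rows\n",
            bytes_per_row, est_gb, n_total))
cat("within the 1/3-of-RAM rule of thumb:", fits, "\n")
```

This only estimates the size of the data as loaded. Intermediate copies made during the analysis come on top of it, which is why watching the actual RAM use on the partial data set is also suggested.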
On Thu, 17 Feb 2011, Nasila, Mark wrote:
> Dear Sir/Madam,
>
> I would like to know what is the maximum number of observations a
> single file must have when using R. I am asking this because am trying
> to do research on banking transactions and i have around 49million
> records. Can R handle this? Advise with regard to this.

Depends on the platform and how many fields there are in a record. (On a 64-bit platform we have handled databases of 70m records and about 30 fields: we did use a DBMS to store them, though: see the 'R Data Import/Export Manual'.)

OTOH, one could ask what extra useful information there is in 49m records over a 1% sample. (In our case it was rare combinations, and we simply extracted those separately from the DBMS.)
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
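As an illustration of the 1% sample idea in the reply above, here is a minimal base-R sketch that draws such a sample while reading a file in chunks, so the full data never has to sit in memory at once. This is not the DBMS approach described in the reply; it is a swapped-in technique for the case where the records are in a flat file. The file, its columns, and the chunk size are hypothetical.

```r
set.seed(42)

# Hypothetical stand-in for the full transaction file.
n_demo <- 50000
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = seq_len(n_demo),
                     amount = round(runif(n_demo, 1, 5000), 2)),
          path, row.names = FALSE)

sample_frac <- 0.01    # keep roughly 1% of the records
chunk_size  <- 10000   # number of records held in memory at a time

con <- file(path, open = "r")
header <- readLines(con, n = 1)
col_names <- gsub('"', "", strsplit(header, ",")[[1]])

kept <- list()
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break
  # Bernoulli sampling: each record is kept independently
  # with probability sample_frac.
  keep <- lines[runif(length(lines)) < sample_frac]
  if (length(keep) > 0) {
    kept[[length(kept) + 1]] <- read.csv(text = keep, header = FALSE,
                                         col.names = col_names)
  }
}
close(con)

# Combine the sampled rows from all chunks into one data frame.
smp <- do.call(rbind, kept)
cat("sampled", nrow(smp), "of", n_demo, "records\n")
```

A uniform sample like this will under-represent rare combinations, which is the case the reply handles by extracting them separately.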