Hi,

I have been enjoying R for some time now, but was wondering about working with larger data files. When I try to load in big files with more than 20,000 records, the program seems unable to store all the records. Is there some way that I can increase the size of records that I work with? Ideally I would like to work with census data, which can hold a million records.

Greg
On Mon, 4 Oct 2004, Greg Butler wrote:

> Hi,
>
> I have been enjoying R for some time now, but was wondering about working
> with larger data files. When I try to load in big files with more than
> 20,000 records, the program seems unable to store all the records. Is
> there some way that I can increase the size of records that I work with?
> Ideally I would like to work with census data, which can hold a million
> records.

You should be able to handle 20,000 records on a reasonable computer (my laptop, with 256 MB of memory, can, very slowly, do survey analyses on a file with 26,000 records and about 100 variables).

A million records is likely to be infeasible. A 32-bit computer can't even address enough memory to store that much data. You would need to put the data either in a database or in a file format such as netCDF or HDF5 that allows smaller chunks to be read and processed.

-thomas
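To put rough numbers on the sizes discussed above, here is a back-of-the-envelope sketch. It assumes every variable is stored as an 8-byte double and borrows the 100-variable width from the laptop example; the real census file may be wider or narrower, and `estimate_mb` is just an illustrative helper name.

```r
# Rough estimate of the RAM needed to hold a purely numeric data set.
# Assumption: each value is a double (8 bytes); factors, integers and
# character columns will differ.
estimate_mb <- function(n_records, n_variables, bytes_per_value = 8) {
  n_records * n_variables * bytes_per_value / 1024^2
}

small <- estimate_mb(26000, 100)  # the 26,000-record survey file
large <- estimate_mb(1e6, 100)    # a million census records

cat(sprintf("26,000 x 100:    %.1f MB\n", small))
cat(sprintf("1,000,000 x 100: %.1f MB\n", large))
```

This counts only a single copy of the raw data. Reading and analysing the data usually creates temporary copies, so the working requirement is likely to be several times larger.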
You will need to be more specific (and careful). You talk about "large datafiles" and "increase the size of records", so is the problem

1) The size of the files
2) The size of the records
3) Storage for the R objects you are creating, in some unstated way
4) Something else?

Without more details, the most helpful advice we can give is

A) Use a 64-bit Linux/Unix OS (e.g. AMD64/Opteron)
B) Add as much RAM as you can afford/fit.

On Mon, 4 Oct 2004, Greg Butler wrote:

> Hi,
>
> I have been enjoying R for some time now, but was wondering about working
> with larger data files. When I try to load in big files with more than
> 20,000 records, the program seems unable to store all the records. Is
> there some way that I can increase the size of records that I work with?
> Ideally I would like to work with census data, which can hold a million
> records.

> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Please do, and tell us the OS you are using and resources you have (e.g. RAM).

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
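For anyone answering the request above, a minimal sketch of how to collect those details from inside R. It uses only base R; `example_df` is a made-up stand-in for whatever object is actually causing trouble, and the amount of installed RAM still has to be looked up through the operating system.

```r
# Details worth including in a follow-up post.
env_report <- list(
  r_version    = R.version.string,
  platform     = R.version$platform,           # CPU / OS triplet
  pointer_bits = 8 * .Machine$sizeof.pointer   # 32 or 64
)
print(env_report)

# Storage used by an R object already in the workspace.
# example_df is hypothetical; substitute the real data frame.
example_df <- data.frame(id = 1:20000, value = rnorm(20000))
print(object.size(example_df), units = "MB")
```

Comparing the reported object size with the size of the file on disk helps separate question 1) from question 3) in the list above.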
Out of the box, R keeps everything in memory. One million wide records could easily take all your RAM. What do you want to do with all the data at once? Some suggestions (not original by any means):

1) read the data via the "connection" functions, which would allow you to, for example, keep the data gzipped (help(gzfile)) and read chunks at a time, e.g., in order to
2) sample
3) if you really need more or less random access to records, look into the database access packages for postgres or oracle etc., or have a look at the RObjectTables package from Omegahat (I don't have experience with it yet).
4) I wrote some R functions to "stash" objects to disk so they're still "there" just like any R object but don't use RAM. Each access reads the whole object, though, and each write writes the whole object, so it's not at all suited to random access. Let me know if it would help.

Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Greg Butler
Sent: Monday, October 04, 2004 10:49 AM
To: R-help at stat.math.ethz.ch
Subject: [R] Working with large datafiles

Hi,

I have been enjoying R for some time now, but was wondering about working with larger data files. When I try to load in big files with more than 20,000 records, the program seems unable to store all the records. Is there some way that I can increase the size of records that I work with? Ideally I would like to work with census data, which can hold a million records.

Greg

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
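Suggestions 1) and 2) above (read a gzipped file in chunks through a connection, keeping only a sample) could look roughly like the sketch below. It is not Reid's code. The function name `sample_gz_csv`, its parameters, and the comma-separated layout with a header row are all assumptions made for illustration; only `gzfile()`, `readLines()`, `scan()` and `read.csv()` are standard R.

```r
# Read a gzipped CSV chunk by chunk and keep a random subset of rows,
# so the full file never has to sit in memory at once.
# Assumptions: comma-separated, first line is a header, and no quoted
# field contains an embedded newline.
sample_gz_csv <- function(path, chunk_size = 10000, keep_prob = 0.01) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))

  # Parse the header line; scan() strips any surrounding quotes.
  header <- scan(text = readLines(con, n = 1), what = "", sep = ",",
                 quiet = TRUE)
  kept <- list()

  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break                 # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    keep <- runif(nrow(chunk)) < keep_prob        # independent coin flips
    kept[[length(kept) + 1]] <- chunk[keep, , drop = FALSE]
  }
  do.call(rbind, kept)
}

# Small self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".csv.gz")
out <- gzfile(tmp, open = "wt")
write.csv(data.frame(id = 1:50000, x = rnorm(50000)), out, row.names = FALSE)
close(out)

smp <- sample_gz_csv(tmp, chunk_size = 5000, keep_prob = 0.02)
str(smp)
```

Each row is kept independently with probability `keep_prob`, so the sample size varies from run to run. Column types are guessed separately for each chunk, which can cause mismatches on messy data; passing `colClasses` to `read.csv()` avoids that.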