Hello,

I am a new R user and have two datasets that I would like to analyze. The
first is (2409222 x 17) and the other is (21682998 x 17). Is this possible
in R? If not, what is the maximum number of rows and columns, or number of
elements, that R can handle?

Thanks in advance,
Barry
_________________________
Barry Baker, Ph.D.
Global Climate Change Initiative
The Nature Conservancy
2424 Spruce St., Suite 100
Boulder, CO 80302

Tel: (303)-541-0322
Fax: (303)-449-4328

http://nature.org/tncscience/scientists/misc/baker.html
Barry Baker wrote:

> I am a new R user and have two datasets that I would like to analyze.
> The first is (2409222 x 17) and the other is (21682998 x 17). Is this
> possible in R? If not, what is the maximum number of rows and columns,
> or number of elements, that R can handle?

The number of columns and rows is not a problem here, but you will need
21682998 * 17 * 4 bytes to store the latter matrix (assuming 4-byte
floats) in memory, that is 1406.139 Mb. In order to do anything sensible
with the data, you need *at least* twice that amount of RAM, hence at
least 3 Gb.

Uwe Ligges
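[The arithmetic is easy to check at the R prompt; a quick sketch, with
the second line anticipating the 8-byte correction in the follow-up
below:]

      21682998 * 17 * 4 / 1024^2   # 4-byte floats:  about 1406.1 Mb
      21682998 * 17 * 8 / 1024^2   # 8-byte doubles: about 2812.3 Mb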
On Mon, 21 Nov 2005, Uwe Ligges wrote:

> The number of columns and rows is not a problem here, but you will need
> 21682998 * 17 * 4 bytes to store the latter matrix (assuming 4-byte
> floats) in memory, that is 1406.139 Mb.

R does not use floats internally: numeric data are stored as 8-byte
doubles. So unless these are integers/logicals, you are going to need
twice that.

> In order to do anything sensible with the data, you need *at least*
> twice that amount of RAM, hence at least 3 Gb.

Here I think the issue is rather virtual memory and address space. You
will need a 64-bit OS to do anything with this object.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
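[A quick way to see the double/integer storage difference described
above is object.size(); a small check:]

      object.size(double(1e6))    # about 8e6 bytes: 8 bytes per element
      object.size(integer(1e6))   # about 4e6 bytes: 4 bytes per element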
What do you want to do with these large matrices? Both "scan" and
"read.table" allow you to skip a certain number of lines at the beginning
of a file and process however many lines you want from that point.

I recently had large files that were too big for S-Plus 6. I moved to R
and processed them as submatrices without a problem. I typically use
"readLines" to check the format of the first few records and
"count.fields" to determine whether all records have the same number of
fields. In one case recently, I had a file that was almost but not quite
regular. I processed the file in pieces, carefully examining the records
right before and after each change in the number of fields, and recovered
essentially everything without going back to my client (through several
layers of bureaucracy) to ask for their help in parsing that file.

I frequently use a construct like the following:

      File. <- ".....<filename>"
      readLines(File., 9)   # check the format, including the "sep" character
      quantile(nFlds <- count.fields(File., sep = "\t"))  # or sep="," for csv
      # If the file honestly has a fixed number of fields, this will show
      # that. If not, either the "sep" character is wrong or the file has
      # problems. In either case, this helps me plan what to do next.

Hope this helps.
spencer graves

Prof Brian Ripley wrote:

> R does not use floats internally: numeric data are stored as 8-byte
> doubles. So unless these are integers/logicals, you are going to need
> twice that.
>
> Here I think the issue is rather virtual memory and address space. You
> will need a 64-bit OS to do anything with this object.

--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com
Tel:  408-938-4420
Fax:  408-280-7915
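[A minimal sketch of the piecewise approach described above, assuming a
tab-separated file with no header row; the filename and chunk size are
placeholders to adjust:]

      File. <- ".....<filename>"    # placeholder, as above
      chunk <- 100000               # rows per piece; tune to available RAM
      skip  <- 0
      repeat {
          piece <- tryCatch(
              read.table(File., sep = "\t", skip = skip, nrows = chunk),
              error = function(e) NULL)        # NULL once past end of file
          if (is.null(piece) || nrow(piece) == 0) break
          ## ... summarise or accumulate results from 'piece' here ...
          if (nrow(piece) < chunk) break       # last, partial piece
          skip <- skip + chunk
      }

[Note that each call rescans the skipped lines, so for very large files
reading successive pieces from a single open connection with "scan" is
faster; the idea is the same.]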