Recently I ran into memory problems while using data.frames for a reasonably large dataset. I solved those problems by switching to arrays, and that prompted me to run a few benchmarks. I would like to share the results.

Let us start with the data. There are N subjects classified into G groups. These subjects are observed for T periods, and each observation consists of M variables. So, this is a standard panel. Suppose, though, that it is reasonably large: hundreds of variables, tens of thousands of subjects, and more than a decade of observations.

As I see it, there are three common ways to organize such data.

The first way is a single table where each row is an observation (the columns are Group, Subject, Period, plus all M variables). This is the standard layout in econometrics software; let me call it the wide format.

The second way is to have one table for the data, where each row is the value of one variable in one observation, i.e. the columns are Subject, Period, Variable, Value, and a separate table with the classification of subjects into groups. This would be the standard way to organize the data in a relational database (a star schema); let me call it the long format.

Finally, given that I'm talking about dense data, the data can be organized as a multidimensional array (subjects, periods, variables), plus vectors with names for the elements of each dimension.

I did two benchmarks: 1) creating random data in the respective format, and 2) aggregating over groups. As data.table can be faster than data.frame, I've included both. Here is the source code: https://docs.google.com/uc?id=0B-uoYmSQJJvwNTdjNzljZjUtZmVhYS00ZTQ5LTgyMjEtYmJhMjg1OTBhOTU5

The results, in brief, are as follows. The long format (star schema) is dominated by all other options w.r.t. both time and memory usage (no big surprise, R is not MySQL). Concerning the wide format, data.table is faster and more memory efficient than data.frame.
Finally, the wide format with a data.table and the array format have similar execution times, but the array format requires less memory. More importantly, if I need to aggregate over variables, the wide format is no longer that suitable, whereas the array can be used just as before.

So, a data.cube package, anyone?

Andrei.
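
P.S. To make the array layout concrete, here is a minimal sketch in base R. It is not the benchmark code linked above: the sizes are tiny, sum is an assumed aggregation function, and all object names (cube, group, by_group, wide and so on) are made up for illustration. It builds the same random panel as an array and as a wide data.frame, aggregates both over groups, and then aggregates the array over variables.

```r
set.seed(42)

# Hypothetical sizes, far smaller than the panel described above.
n_subj <- 6; n_grp <- 2; n_per <- 3; n_var <- 4

subjects  <- paste0("s", seq_len(n_subj))
periods   <- paste0("t", seq_len(n_per))
variables <- paste0("v", seq_len(n_var))

# Classification of subjects into groups (one entry per subject).
group <- rep(paste0("g", seq_len(n_grp)), length.out = n_subj)

# Array format: subjects x periods x variables, with named dimensions.
cube <- array(rnorm(n_subj * n_per * n_var),
              dim = c(n_subj, n_per, n_var),
              dimnames = list(subject = subjects,
                              period = periods,
                              variable = variables))

# Aggregate over groups: sum the subjects of each group for every
# (period, variable) cell. rowsum() expects a matrix, so the last two
# dimensions are flattened first and restored afterwards.
flat <- cube
dim(flat) <- c(n_subj, n_per * n_var)
by_group <- array(rowsum(flat, group),
                  dim = c(n_grp, n_per, n_var),
                  dimnames = list(group = sort(unique(group)),
                                  period = periods,
                                  variable = variables))

# Aggregate over variables: same array, different margin.
by_subject_period <- apply(cube, c(1, 2), sum)

# Wide format for comparison: one row per (subject, period),
# one column per variable. Subjects vary fastest, as in the array.
wide <- data.frame(Group   = rep(group, times = n_per),
                   Subject = rep(subjects, times = n_per),
                   Period  = rep(periods, each = n_subj),
                   matrix(cube, ncol = n_var,
                          dimnames = list(NULL, variables)))

# The same aggregation over groups on the wide table.
wide_by_group <- aggregate(wide[variables],
                           by = wide[c("Group", "Period")],
                           FUN = sum)
```

In the wide table, aggregating over groups is a grouped sum over rows, but aggregating over variables means working across columns instead. With the array, both are the same kind of operation on a different dimension.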