Dear R-users,

I am faced with the problem of analyzing a huge dataset (over 2 million records and more than 150 variables) which does not fit into memory. I would like to know if there are pre-packaged tools (in the spirit of Insightful I-Miner, for instance) aimed at subsampling, or at splitting the dataset into subdatasets small enough to hold in a data frame, applying functions record-wise, etc.

Thank you very much for your help.

Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com
One possibility is to use a DBMS such as MySQL or PostgreSQL, and RODBC to connect to it from R. Search the list archives for previous postings on the subject, and have a look at the first issue of the R newsletter (R News) and at the R Data Import/Export manual.
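For concreteness, here is a minimal sketch of that approach. The ODBC data source name "bigdata", the table "survey", and the credentials are all hypothetical; LIMIT/OFFSET queries of this form work in both MySQL and PostgreSQL, and keep each piece small enough to fit in a data frame:

library(RODBC)

## connect to an ODBC data source pointing at the database
ch <- odbcConnect("bigdata", uid = "user", pwd = "password")

chunk.size <- 50000L
offset <- 0L
results <- list()

repeat {
    ## pull one manageable piece of the table into a data frame
    query <- sprintf("SELECT * FROM survey LIMIT %d OFFSET %d",
                     chunk.size, offset)
    chunk <- sqlQuery(ch, query)
    if (!is.data.frame(chunk) || nrow(chunk) == 0) break
    ## apply whatever record-wise or summary function is needed to this piece
    results[[length(results) + 1]] <-
        colMeans(chunk[sapply(chunk, is.numeric)], na.rm = TRUE)
    offset <- offset + chunk.size
}

odbcClose(ch)

The per-chunk summaries collected in "results" can then be combined in R; the same loop structure also works for drawing subsamples instead of scanning the whole table.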
TyagiAnupam at aol.com wrote:
> One possibility is to use a DBMS such as MySQL or PostgreSQL, and RODBC to
> connect to it from R. Search the list archives for previous postings on the
> subject, and have a look at the first issue of the R newsletter (R News) and
> at the R Data Import/Export manual.

If you use PostgreSQL, you might want to try PL/R; see:

http://www.joeconway.com/plr/

It allows your R functions to run inside the backend database process, minimizing data I/O. There is a rough sketch of the idea at the end of this message.

HTH,

Joe
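P.S. For illustration only: with PL/R, a function is defined on the PostgreSQL side with CREATE FUNCTION and a body written in R, and the database passes arguments in as R objects (the first argument is available as arg1). The table and column names below ("survey", "income") are hypothetical, so treat this as a sketch rather than a recipe:

-- define an R function that runs inside the PostgreSQL backend
CREATE OR REPLACE FUNCTION r_quantiles(float8[]) RETURNS float8[] AS '
    # arg1 arrives as an R numeric vector
    quantile(arg1, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
' LANGUAGE 'plr';

-- call it from SQL, aggregating the column into an array first:
SELECT r_quantiles(ARRAY(SELECT income FROM survey));

Since the computation happens inside the database process, only the final result (here, three quantiles) travels back to the client rather than the 2 million rows.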