Dear R-users,

I am faced with the problem of analyzing a huge dataset (over 2 million records and more than 150 variables) which does not fit into memory. I would like to know if there are pre-packaged tools (in the spirit of Insightful I-Miner, for instance) aimed at subsampling, or at splitting the dataset into subdatasets small enough to hold in a data frame, applying functions record-wise, etc.

Thank you very much for your help.

Carlos J. Gil Bellosta
Sigma Consultores Estadísticos
http://www.consultoresestadisticos.com
One possibility is to use a DBMS such as MySQL or PostgreSQL, and RODBC to connect to it from R. Search the list archives for previous postings on the subject, and have a look at the first issue of the R newsletter (R News) and at the R Data Import/Export manual.
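For concreteness, here is a minimal sketch of that approach. The ODBC data source name "bigdata", the table "survey", and the credentials are all hypothetical; LIMIT/OFFSET queries of this form work in both MySQL and PostgreSQL, and keep each piece small enough to fit in a data frame:

library(RODBC)

## connect to an ODBC data source pointing at the database
ch <- odbcConnect("bigdata", uid = "user", pwd = "password")

chunk.size <- 50000L
offset <- 0L
results <- list()

repeat {
    ## pull one manageable piece of the table into a data frame
    query <- sprintf("SELECT * FROM survey LIMIT %d OFFSET %d",
                     chunk.size, offset)
    chunk <- sqlQuery(ch, query)
    if (!is.data.frame(chunk) || nrow(chunk) == 0) break
    ## apply whatever record-wise or summary function is needed to this piece
    results[[length(results) + 1]] <-
        colMeans(chunk[sapply(chunk, is.numeric)], na.rm = TRUE)
    offset <- offset + chunk.size
}

odbcClose(ch)

The per-chunk summaries collected in "results" can then be combined in R; the same loop structure also works for drawing subsamples instead of scanning the whole table.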
TyagiAnupam at aol.com wrote:
> One possibility is to use a DBMS such as MySQL or PostgreSQL, and RODBC to
> connect to it from R. Search the list archives for previous postings on the
> subject, and have a look at the first issue of the R newsletter (R News) and
> at the R Data Import/Export manual.

If you use PostgreSQL, you might want to try PL/R; see:

http://www.joeconway.com/plr/

It allows your R functions to run inside the backend database process, minimizing data I/O. There is a rough sketch of the idea at the end of this message.

HTH,

Joe
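P.S. For illustration only: with PL/R, a function is defined on the PostgreSQL side with CREATE FUNCTION and a body written in R, and the database passes arguments in as R objects (the first argument is available as arg1). The table and column names below ("survey", "income") are hypothetical, so treat this as a sketch rather than a recipe:

-- define an R function that runs inside the PostgreSQL backend
CREATE OR REPLACE FUNCTION r_quantiles(float8[]) RETURNS float8[] AS '
    # arg1 arrives as an R numeric vector
    quantile(arg1, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
' LANGUAGE 'plr';

-- call it from SQL, aggregating the column into an array first:
SELECT r_quantiles(ARRAY(SELECT income FROM survey));

Since the computation happens inside the database process, only the final result (here, three quantiles) travels back to the client rather than the 2 million rows.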