In addition to Dirk's advice about the biglm package, you may also want to
look at the RSQLite and SQLiteDF packages, which may make dealing with the
large dataset faster and easier, especially for passing chunks to bigglm().
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of André de Boer
> Sent: Wednesday, September 08, 2010 5:27 AM
> To: r-help at r-project.org
> Subject: [R] big data
>
> Hello,
>
> I searched the internet but I didn't find an answer to the following
> problem:
> I want to fit a glm on a csv file consisting of 25 columns and 4
> million rows.
> Not all the columns are relevant. My problem is how to read the data
> into R, manipulate it, and then fit a glm.
>
> I've tried with:
>
> dd <- scan("myfile.csv", colClasses = classes)
> dat <- as.data.frame(dd)
>
> My question is: what is the right way to do it?
> Can someone give me a hint?
>
> Thanks,
> Arend
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.