Dear all,

I'm trying to process HUGE datasets with R. It's very fast, but I would like to optimize it a bit more by focusing on one column at a time. Say the file is 1 GB and has 100 columns. To prevent "out of memory" problems, I need to load one column at a time. The only problem is that read.table doesn't support this feature.

Is there some trick that will do the magic?

Thank you in advance.
Gabor Grothendieck
2008-Jan-14 23:04 UTC
[R] Loading only particular columns from csv file...
See the colClasses= argument of read.table, where you can use "NULL" for any column you want skipped.

On Jan 14, 2008 6:02 PM, Marko Milicic <milicic.marko at gmail.com> wrote:
> [...] I need to load one column at a time [...] the only problem is
> that read.table doesn't support this feature.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
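A minimal sketch of the colClasses= suggestion, assuming a small comma-separated file with a header row. The temporary file, its column names, and the choice of columns to keep are made up for illustration. Note that "NULL" here is the character string, not the NULL object.

```r
# Build a small 5-column CSV in a temporary file (stand-in for the real data).
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[1:3], c = c(1.5, 2.5, 3.5),
                     d = 4:6, e = 7:9),
          tmp, row.names = FALSE)

# "NULL" tells read.table/read.csv to skip a column entirely;
# NA lets R guess the type of a column that is kept.
keep <- c(1, 3)                 # hypothetical: keep columns 1 and 3 only
classes <- rep("NULL", 5)
classes[keep] <- NA

subset_df <- read.csv(tmp, colClasses = classes)
print(names(subset_df))
```

To walk through a file one column at a time, the same call can be repeated in a loop with a different `keep` on each pass. The file is still read from disk on every pass, so this trades time for memory.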
Charles C. Berry
2008-Jan-15 00:58 UTC
[R] Loading only particular columns from csv file...
On Mon, 14 Jan 2008, Marko Milicic wrote:

> [...] I need to load one column at a time [...] the only problem is
> that read.table doesn't support this feature.
>
> Is there some trick which will do the magic?

There is a Unix utility called 'cut' that enables stuff like

    columns.1.3.5.to.7 <- read.table( pipe( "cut -f1,3,5-7 myfile" ) )

and if you have numeric data only, using scan() directly will save some space.

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
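A hedged sketch of the 'cut' approach adapted to a comma-separated file, since the thread subject mentions csv. It assumes a Unix-like system with 'cut' on the PATH; the temporary file and its contents are invented for illustration. By default 'cut' splits on tabs, so `-d,` is added here, and 'cut' does not understand quoted fields that contain commas.

```r
# Small comma-separated stand-in file: 3 rows, 7 numeric columns, no header.
tmp <- tempfile(fileext = ".csv")
write.table(matrix(1:21, nrow = 3, byrow = TRUE), tmp, sep = ",",
            row.names = FALSE, col.names = FALSE)

# Let cut select fields 1, 3 and 5 to 7 before R ever sees the data.
cmd <- paste("cut -d, -f1,3,5-7", shQuote(tmp))
cols <- read.table(pipe(cmd), sep = ",")

# For purely numeric data, scan() reads a single column into a plain vector,
# avoiding the overhead of building a data frame.
col3 <- scan(pipe(paste("cut -d, -f3", shQuote(tmp))), quiet = TRUE)
print(dim(cols))
print(col3)
```

Here shQuote() guards against spaces in the file path, which the one-liner in the reply does not need for a simple name like myfile.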