Dear,

I'm doing an analysis where I need to work on relatively large (50-60 MB) text files, though I'm really interested only in the parts with binary variables (named indicators1, indicators2, etc.).

Every text file contains other numeric columns, but not always the same ones and not always in the same order. I would therefore need a method that connects to the file and reads only the columns whose names match a pattern (i.e. indicators + number). That should speed things up (right now I have to clean the data by hand) and also leave a smaller memory footprint. Could you point me towards something?
?file - how to use connections
?read.table - 'skip' parameter, colClasses to only read columns you want

That is not a large file. Read the whole thing in and then extract the data you need.

On Tue, Nov 23, 2010 at 6:05 AM, fbielejec <fbielejec at gmail.com> wrote:
> Dear,
>
> I'm doing an analysis where I need to work on relatively large (50-60 MB)
> text files, though I'm really interested only in the parts with binary
> variables (named indicators1, indicators2, etc.).
>
> Every text file contains other numeric columns, but not always the same
> ones and not always in the same order. I would therefore need a method
> that connects to the file and reads only the columns whose names match a
> pattern (i.e. indicators + number). That should speed things up (right now
> I have to clean the data by hand) and also leave a smaller memory
> footprint. Could you point me towards something?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
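The colClasses suggestion above can be sketched in base R as follows. This is only an illustration under assumptions not stated in the thread: the file is tab-separated, it has a header row, and the indicator columns hold 0/1 integers. The sample file and the names path, sample_data, header, col_classes and indicators are made up for the example.

```r
# Hypothetical sample file: other numeric columns mixed with indicator columns
path <- tempfile(fileext = ".txt")
sample_data <- data.frame(
  height      = c(1.2, 3.4, 5.6),
  indicators1 = c(0L, 1L, 1L),
  weight      = c(10, 20, 30),
  indicators2 = c(1L, 0L, 1L)
)
write.table(sample_data, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read only the first line to learn the column names
header <- scan(path, what = "character", nlines = 1, sep = "\t", quiet = TRUE)

# The string "NULL" tells read.table to skip a column entirely;
# columns named indicators<number> are read as integer (assumed 0/1 coding)
col_classes <- ifelse(grepl("^indicators[0-9]+$", header), "integer", "NULL")

# Only the matching columns are read, whatever their position in the file
indicators <- read.table(path, header = TRUE, sep = "\t",
                         colClasses = col_classes)
```

Because the classes are derived from the header of each file, the same lines should work when the other columns differ or appear in a different order.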
This is easy using read.csv.sql:

library(sqldf)

# create test file
write.table(anscombe, "anscombe.csv", sep = ",", quote = FALSE, row.names = FALSE)

# read it back but only indicated columns
read.csv.sql("anscombe.csv", sql = "select x1, x2, y1, y2 from file")

See ?read.csv.sql and also sqldf home page at http://sqldf.googlecode.com

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
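Since the column names are not known in advance, the select list for read.csv.sql could be built from the header line instead of being typed by hand. The following is a rough sketch, assuming a comma-separated file with a header row and unquoted column names. The test file and the names test_file, test_data, header, wanted, query and ind are hypothetical.

```r
library(sqldf)

# Hypothetical test file with indicator columns among other numeric columns
test_file <- tempfile(fileext = ".csv")
test_data <- data.frame(
  a           = c(1.5, 2.5, 3.5),
  indicators1 = c(0L, 1L, 1L),
  b           = c(7, 8, 9),
  indicators2 = c(1L, 0L, 1L)
)
write.table(test_data, test_file, sep = ",", quote = FALSE, row.names = FALSE)

# Read just the header line and keep the names matching indicators<number>
header <- strsplit(readLines(test_file, n = 1), ",", fixed = TRUE)[[1]]
wanted <- grep("^indicators[0-9]+$", header, value = TRUE)

# Build the select statement from the matching names
query <- paste("select", paste(wanted, collapse = ", "), "from file")

# Only the selected columns are returned to R
ind <- read.csv.sql(test_file, sql = query)
```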