Hi,

I have a large file from which I need a subset of rows. Instead of reading it all into memory, I use awk to extract the relevant rows. However, this is inefficient, because I write the subset to disk before reading it into R. Is there a way to read it into an R object without writing to disk? For example, this is what I do currently:

## write test sample file
mat1 <- matrix(sample(1:100,16),8,2)
fname1 <- 'temp1.txt'
fname2 <- 'temp2.txt'
write.table(mat1,fname1,sep='\t',row.names=FALSE,col.names=FALSE)

## Read a subset of rows, write to file, and read from file
system(paste("awk '(NR > 1 && NR < 4) {print $0}' ",fname1," > ",fname2,sep=''))
mat2 <- read.table(fname2,sep='\t')

print(mat2)
#####

Is there a way that I can skip writing to disk?

thanks!

On Mon, Oct 17, 2011 at 9:23 AM, Brian Smith <bsmith030465 at gmail.com> wrote:
> [...]
> Is there a way that I can skip writing to disk?

See: http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2129.html

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

On Mon, 17 Oct 2011, Brian Smith wrote:
> [...]
> Is there a way that I can skip writing to disk?

Use a pipe() connection.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
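
For readers following the thread, the pipe() suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: a Unix-like system with awk on the PATH, a deterministic test matrix (1:16 instead of the random sample in the question) so the result is predictable, and a temporary file from tempfile() in place of 'temp1.txt'. read.table() accepts a connection, opens it, and closes it again when it is done, so no second file is needed.

```r
## Write a small tab-separated test file, as in the question.
## Assumption: fixed values 1:16 are used instead of sample() for repeatability.
mat1 <- matrix(1:16, 8, 2)
fname1 <- tempfile(fileext = ".txt")
write.table(mat1, fname1, sep = "\t", row.names = FALSE, col.names = FALSE)

## Build the awk command; shQuote() protects the file name from the shell.
## Assumption: awk is installed and reachable from R's shell.
cmd <- paste("awk 'NR > 1 && NR < 4'", shQuote(fname1))

## pipe() runs the command and exposes its standard output as a connection.
## read.table() opens the connection, reads rows 2 and 3, and closes it.
mat2 <- read.table(pipe(cmd), sep = "\t")
print(mat2)

## Clean up the test file.
unlink(fname1)
```

The awk filter itself is unchanged from the question; only the redirection to temp2.txt and the second read from disk are replaced by the connection.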