Dear R people,

I have a very big tab-delimited txt file with a header, and I only want to import several columns into R. I checked the options for "read.table" and only found "nrows", which lets you specify the maximum number of rows to read in. Although I could use a text editor (e.g., WordPad) to edit the txt file before running R, I feel that’s not very convenient. The reason I want to do this is that if I import the whole file into R, it will eat up too much of my system’s memory. Even after I remove it later, I still can’t release the memory.

Does anyone have any suggestions?

Thank you very much,

Frank
Thomas Lumley
2004-Aug-09 20:52 UTC
[R] How to import specific column(s) using "read.table"?
On Mon, 9 Aug 2004, F Duan wrote:

> Dear R people,
>
> I have a very big tab-delim txt file with header and I only want to import
> several columns into R. I checked the options for "read.table" and only
> found "nrows" which lets you specify the maximum number of rows to read in.
> Although I can use some text editors (e.g., wordpad) to edit the txt file first
> before running R, I feel it’s not very convenient. The reason for me to do this
> is that if I import the whole file into R, it will eat up too much of my
> system’s memory. Even after I remove it later, I still can’t release the memory.
>

You can't avoid reading the whole file, but you can avoid having it all in memory at once.

I'll assume you know how many data lines are in the file (not counting the header), call it N (this isn't necessary, but it is tidier), and that you are interested in columns 10 and 110, both numeric.

If you do something like

    inputfile <- file("inputfile.txt", open = "r")
    header <- readLines(inputfile, n = 1)   # consume the header line
    result <- data.frame(col10 = numeric(N), col110 = numeric(N))
    chunksize <- 1000
    nchunks <- ceiling(N / chunksize)
    for (i in seq_len(nchunks)) {
      # each call continues where the previous one stopped
      chunk <- read.table(inputfile, nrows = chunksize, sep = "\t")
      # the last chunk may hold fewer than chunksize rows
      rows <- (i - 1) * chunksize + seq_len(nrow(chunk))
      result[rows, ] <- chunk[, c(10, 110)]
    }
    close(inputfile)

you can choose the chunk size so that the memory use is not too bad.

There are also more efficient ways that make you do more of the work (e.g. read in lines of text with readLines and use regular expressions to extract the columns you need).

    -thomas
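The readLines-plus-regular-expressions route mentioned at the end of this reply could look roughly like the sketch below. It is only an illustration under stated assumptions: the names extract_field and read_columns are made up for this example, the file is assumed to be tab-delimited with one header line, every line is assumed to contain at least as many fields as the highest column requested, and the selected columns are assumed to be numeric.

```r
# Hypothetical helper: return the k-th tab-separated field of each line.
extract_field <- function(lines, k) {
  # skip k-1 tab-terminated fields, capture the k-th, discard the rest
  pattern <- sprintf("^(?:[^\t]*\t){%d}([^\t]*).*$", k - 1)
  sub(pattern, "\\1", lines, perl = TRUE)
}

# Hypothetical helper: read only the columns in `cols`, one chunk at a time.
read_columns <- function(path, cols, chunksize = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  result <- vector("list", length(cols))
  repeat {
    lines <- readLines(con, n = chunksize)
    if (length(lines) == 0) break          # end of file
    for (j in seq_along(cols)) {
      # assumption: the wanted columns are numeric
      values <- as.numeric(extract_field(lines, cols[j]))
      result[[j]] <- c(result[[j]], values)
    }
  }
  names(result) <- vapply(cols, function(k) extract_field(header, k),
                          character(1))
  as.data.frame(result)
}
```

A call such as read_columns("inputfile.txt", cols = c(10, 110)) would then hold only one chunk of raw text plus the two result columns in memory. Growing the vectors with c() is simple rather than fast; preallocating as in the loop above is the tidier option when N is known.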
Prof Brian Ripley
2004-Aug-09 20:56 UTC
[R] How to import specific column(s) using "read.table"?
There is no way for read.table to skip columns. It is, however, very easy to do this by preprocessing the table: cut, awk and perl all come to mind, and you could do it in R too, reading a block of rows at a time and writing them back out.

scan() can skip columns, but I would still preprocess the file first, even with scan.

On Mon, 9 Aug 2004, F Duan wrote:

> I have a very big tab-delim txt file with header and I only want to import
> several columns into R. I checked the options for "read.table" and only
> found "nrows" which lets you specify the maximum number of rows to read in.
> Although I can use some text editors (e.g., wordpad) to edit the txt file first
> before running R, I feel it’s not very convenient. The reason for me to do this
> is that if I import the whole file into R, it will eat up too much of my
> system’s memory. Even after I remove it later, I still can’t release the memory.

The peculiar quotes suggest this is Windows -- the Rtools we use to build R there contain a cut.exe.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
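The "do it in R too" option, reading a block of rows at a time and writing them back out, might be sketched as follows. This is a rough illustration, not code from the thread: extract_columns is a hypothetical name, the file is assumed to be tab-delimited, and the header is treated like any other line so that it keeps the same columns as the data.

```r
# Hypothetical helper: copy the columns in `cols` of a tab-delimited file
# to a smaller file, one block of lines at a time.
extract_columns <- function(infile, outfile, cols, blocksize = 1000) {
  inp <- file(infile, open = "r")
  on.exit(close(inp))
  out <- file(outfile, open = "w")
  on.exit(close(out), add = TRUE)
  repeat {
    lines <- readLines(inp, n = blocksize)
    if (length(lines) == 0) break          # end of file
    fields <- strsplit(lines, "\t", fixed = TRUE)
    # keep only the wanted fields and re-join them with tabs
    kept <- vapply(fields, function(f) paste(f[cols], collapse = "\t"),
                   character(1))
    writeLines(kept, out)
  }
  invisible(outfile)
}
```

After a call such as extract_columns("inputfile.txt", "small.txt", cols = c(10, 110)), the reduced file (the name small.txt is only an example) can be read in the usual way with read.table("small.txt", header = TRUE, sep = "\t"). On a system that has cut, a command like cut -f10,110 does the same preprocessing outside R.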
Gabor Grothendieck
2004-Aug-10 05:09 UTC
[R] How to import specific column(s) using "read.table"?
F Duan <f.duan <at> yale.edu> writes:

> I have a very big tab-delim txt file with header and I only want to import
> several columns into R. I checked the options for "read.table" and only

Try using scan with the what=list(...) and flush=TRUE arguments.
For example, if your data looks like this:

1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

then you could read columns 2 and 4 into a list with:

scan(myfile, what = list(0, NULL, 0), flush = TRUE)

or read in and bind the columns into a matrix (wrap the result in as.data.frame if you need a data frame) via:

do.call("cbind", scan(myfile, what = list(0, NULL, 0), flush = TRUE))
Gabor Grothendieck
2004-Aug-10 06:09 UTC
[R] How to import specific column(s) using "read.table"?
Gabor Grothendieck <ggrothendieck <at> myway.com> writes:

:
: F Duan <f.duan <at> yale.edu> writes:
:
: > I have a very big tab-delim txt file with header and I only want to import
: > several columns into R. I checked the options for "read.table" and only
:
: Try using scan with the what=list(...) and flush=TRUE arguments.
: For example, if your data looks like this:
:
: 1 2 3 4
: 5 6 7 8
: 9 10 11 12
: 13 14 15 16
:
: then you could read columns 2 and 4 into a list with:
:

Oops. That should be 1 and 3.

: scan(myfile, what = list(0, NULL, 0), flush = TRUE)
:
: or read in and bind the columns into a matrix (wrap the result in
: as.data.frame if you need a data frame) via:
:
: do.call("cbind", scan(myfile, what = list(0, NULL, 0), flush = TRUE))
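A self-contained version of this scan example, adapted to the file described in the original question, is sketched below. The temporary file, the column names a to d, and the sep = "\t" and skip = 1 arguments are assumptions added here for a tab-delimited file with a header line; they are not part of the posted code.

```r
# Write the example data to a temporary tab-delimited file with a header.
myfile <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc\td",
             "1\t2\t3\t4",
             "5\t6\t7\t8",
             "9\t10\t11\t12",
             "13\t14\t15\t16"), myfile)

# 0 reads a numeric field, NULL skips a field, and flush = TRUE discards
# everything after the last field named in `what` (here, column 4).
cols <- scan(myfile, what = list(a = 0, b = NULL, c = 0),
             sep = "\t", skip = 1, flush = TRUE, quiet = TRUE)

# Drop the skipped (NULL) entry and build a data frame from the rest.
keep <- !vapply(cols, is.null, logical(1))
df <- as.data.frame(cols[keep])
```

Here df holds columns 1 and 3 of the file, in line with the correction above. Naming the entries of what gives the resulting columns their names without reading the header.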
Use as.list.

Andy

> From: F Duan
>
> Thanks a lot.
>
> Your way works perfectly. And one more tiny question related to
> your code:
>
> My data file has many columns to be omitted (suppose the first 20 ones),
> but I found "scan(myfile, what=list(rep(NULL, 20), rep(0, 5)))" doesn't
> work. I had to type "NULL" 20 times and "0" five times in the "list(...)".
>
> But anyway, it works and saves a lot of memory for me. Thank you again.
>
> Frank
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
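One way to read the as.list hint is sketched below. The attempt in the quoted message fails because rep(NULL, 20) is just NULL, so list(rep(NULL, 20), rep(0, 5)) has only two entries instead of 25. The helper name make_what and the 26-column test file are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: a `what` list that skips n_skip fields and then
# reads n_keep numeric fields.
make_what <- function(n_skip, n_keep) {
  c(rep(list(NULL), n_skip),   # n_skip NULL entries (fields to skip)
    as.list(rep(0, n_keep)))   # n_keep numeric entries (fields to read)
}

# Small tab-delimited test file with 26 columns and two rows.
myfile <- tempfile(fileext = ".txt")
writeLines(c(paste(1:26, collapse = "\t"),
             paste(101:126, collapse = "\t")), myfile)

# Skip columns 1-20, read columns 21-25, flush the rest of each line.
cols <- scan(myfile, what = make_what(20, 5), sep = "\t",
             flush = TRUE, quiet = TRUE)

# cbind ignores the NULL entries, leaving a 2 x 5 numeric matrix.
kept <- do.call(cbind, cols)
```

The same idea extends to any pattern of skipped and kept columns: build the list from pieces with c(), using rep(list(NULL), n) for each run of columns to skip.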