I have a file in fixed-width format (fwf). It is rather large, about 40,000 rows and 40 variables (columns). I only need about 10 variables from the data set for the analysis at hand. Unfortunately, these 10 variables are not contiguous in the file: for example, the first is in positions 1-8, the next in positions 25-27, then position 40.

Is there a way to read only the variables I need without reading in the entire data set?

Thanks,

Brett

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Fri, 3 May 2002, Brett A Magill wrote:

> Is there a way to read the selected variables that I need without reading in the entire data set?

No. The first thing read.fwf does is read the whole dataset as a character vector. It's not designed for large files, but it may well read all the variables in your problem if you have enough memory: the file only seems to be of the order of 20 MB.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
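In current versions of R (the 2002 release discussed here may differ), the widths argument of read.fwf accepts negative values, which mark columns to be skipped. Consistent with the point above, this still reads every line of the file; it only drops the unwanted fields before the data frame is built. A minimal sketch using the positions from the question (1-8, 25-27 and 40); the temporary file, its filler contents and the column names are invented for illustration:

```r
# Sketch assuming an R version whose read.fwf() treats negative widths as
# "columns to be skipped". Positions from the question: keep 1-8, 25-27, 40.
widths <- c(8,     # bytes  1-8  : keep
            -16,   # bytes  9-24 : skip
            3,     # bytes 25-27 : keep
            -12,   # bytes 28-39 : skip
            1)     # byte  40    : keep

# Invented sample data: two 50-byte records with filler in the skipped columns.
make_record <- function(id, v2, v3) {
  paste0(id, strrep("x", 16), v2, strrep("y", 12), v3, strrep("z", 10))
}
tmp <- tempfile(fileext = ".txt")
writeLines(c(make_record("00000001", "123", "A"),
             make_record("00000002", "456", "B")), tmp)

# Only the three kept fields become columns; the whole file is still read.
dat <- read.fwf(tmp, widths = widths, col.names = c("id", "v2", "v3"))
print(dat)
```

The negative entries are just the gaps between wanted fields, so they can be computed from the start and end positions (here 24 - 8 = 16 and 39 - 27 = 12).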
If you are in a Unix-like environment, you can use awk to preselect the columns you want. Because the file is fixed-width rather than whitespace-delimited, pick out the character positions with substr() and write the fields comma-separated (here fwf is the name of the data file):

read.table(pipe("awk -v OFS=, '{print substr($0,1,8), substr($0,25,3), substr($0,40,1)}' fwf"), sep = ",")

Cheers,
Pierre

--
-----------------------------------------------------------------
Pierre Kleiber                 Email: pkleiber at honlab.nmfs.hawaii.edu
Fishery Biologist              Tel: 808 983-5399/737-7544
NOAA FISHERIES - Honolulu Laboratory        Fax: 808 983-2902
2570 Dole St., Honolulu, HI 96822-2396
-----------------------------------------------------------------
 "God could have told Moses about galaxies and mitochondria and
  all.  But behold... It was good enough for government work."
-----------------------------------------------------------------
While at least two people have already responded to your question, and you may have received other tips privately, this e-mail is meant to put two of those other tips into the help archive for future reference.

1. cut command and read.fwf

This is a once-and-for-all approach (preprocessing the file before R). On a Linux/Unix-type platform, you can extract the necessary columns from the data file with the 'cut' command and its -b (bytes) option, and save them to a new file by redirection ('> a_new_file_name'), for example:

    cut -b 1-8,25-27,40 your_data_file > a_new_file_name

The selected byte ranges are written back to back, so the new file is itself fixed-width (8 + 3 + 1 bytes per line here) and can be read with read.fwf(..., widths = c(8, 3, 1)). For MS Windows, cut.exe can be found in the tools.zip available at http://www.stats.ox.ac.uk/pub/Rtools/.

2. Customize read.fwf

As Professor Ripley mentioned in his e-mail, read.fwf isn't suitable for large files. However, if you want to stay within R, one approach is to customize read.fwf, which is just a wrapper around scan, read.table, and other functions. Internally, read.fwf calculates the first and last positions of the columns to cut out and save (see its internal function doone); therefore, instead of the widths argument, you can pass these first/last positions to your customized read.fwf as new argument(s).

Cheers,
Akio Sone
Harvard-MIT Data Center

> -----Original Message-----
> From: owner-r-help at stat.math.ethz.ch
> [mailto:owner-r-help at stat.math.ethz.ch] On Behalf Of Brett A Magill
> Sent: Friday, May 03, 2002 12:54 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] skipping columns with read.fwf?
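Akio's second tip (passing first/last positions instead of widths) can be sketched roughly as follows. This is a hypothetical helper, read_fwf_cols(), not part of R and not read.fwf's actual code. Like read.fwf, it reads every line into memory with readLines(), then cuts each wanted field out with substring(). The sample file and column names are invented for the example:

```r
# Hypothetical helper, not part of R: read the fields lying between the
# given first and last byte positions of each line of a fixed-width file.
read_fwf_cols <- function(file, first, last, col.names = NULL) {
  stopifnot(length(first) == length(last), all(first <= last))
  lines <- readLines(file)              # whole file as a character vector
  cols <- lapply(seq_along(first), function(i) {
    field <- trimws(substring(lines, first[i], last[i]))
    type.convert(field, as.is = TRUE)   # numbers -> numeric, text stays character
  })
  if (is.null(col.names)) col.names <- paste0("V", seq_along(first))
  names(cols) <- col.names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Example with invented data: keep bytes 1-8, 25-27 and 40.
tmp <- tempfile(fileext = ".txt")
writeLines(c(paste0("00000001", strrep("x", 16), "123", strrep("y", 12), "A"),
             paste0("00000002", strrep("x", 16), "456", strrep("y", 12), "B")),
           tmp)
dat <- read_fwf_cols(tmp, first = c(1, 25, 40), last = c(8, 27, 40),
                     col.names = c("id", "v2", "v3"))
print(dat)
```

Converting each field with type.convert() mirrors what read.table does with character columns; as.is = TRUE keeps text fields as character rather than factor.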