Dear R'sians,

I apologize if this topic has been beaten to death, and I hope the hawks
don't pounce on me!

Could you please suggest an efficient way to filter rows from 500+ text
files (some with 30,000+ rows and multiple section table headers) residing
in several folders? I guess "scan" is probably the fastest way to read a
file, but I have noticed it sometimes takes a long time on large text
files.

Would really appreciate your suggestions.

Regards,
Santosh
On Sep 14, 2009, at 6:25 PM, Santosh wrote:

> Could you please suggest an efficient way to filter rows from 500+ text
> files (some with 30,000+ rows and multiple section table headers) residing
> in several folders? I guess "scan" is probably the fastest way to read a
> file, but I have noticed it sometimes takes a long time on large text
> files.

scan() would attempt to parse the files, whereas readLines() creates a long
character vector composed of entire lines, which you can then filter (see
the sketch below).

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
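For illustration, a minimal sketch of the readLines() approach; the folder
"data", the "*.txt" file pattern, and the "^SUBJ" row pattern are
hypothetical placeholders for your own layout:

    ## collect every .txt file under a top-level folder, recursively
    files <- list.files("data", pattern = "\\.txt$",
                        recursive = TRUE, full.names = TRUE)

    ## read each file as whole lines (no parsing) and keep matching rows
    filtered <- lapply(files, function(f) {
        lines <- readLines(f)
        lines[grepl("^SUBJ", lines)]
    })
    names(filtered) <- files

Reading whole lines and filtering with a regular expression also makes it
easy to drop the repeated section table headers before any parsing is done.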
Check out ?read.csv.sql in the sqldf package. A single read.csv.sql
statement will:

- read the data directly into sqlite without going through R
- set up the database for you
- filter the data using an sql statement of your choice
- read the filtered data into R, and
- delete the database it created

so that the only I/O that involves R is reading the much smaller filtered
data into R. Performance will depend on the specifics of your data and what
you want to do, but it's easy enough to try (a minimal sketch follows the
quoted message below). There are further examples on the home page:
http://sqldf.googlecode.com

On Mon, Sep 14, 2009 at 6:25 PM, Santosh <santosh2005 at gmail.com> wrote:

> Could you please suggest an efficient way to filter rows from 500+ text
> files (some with 30,000+ rows and multiple section table headers) residing
> in several folders? I guess "scan" is probably the fastest way to read a
> file, but I have noticed it sometimes takes a long time on large text
> files.
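For illustration, a minimal sketch of the sqldf approach, assuming
comma-separated files with a single header row; the file name
"data/study1.txt" and the column "dose" are hypothetical placeholders:

    library(sqldf)

    ## the filtering runs inside SQLite; within the sql statement the
    ## data are referred to by the table name "file"
    small <- read.csv.sql("data/study1.txt",
                          sql = "select * from file where dose > 0")

Applied over the 500+ files (e.g. inside lapply over a list.files()
vector), only the filtered rows ever reach R.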