Peng Yu
2009-Dec-04 02:34 UTC
[R] How to exclude lines that match certain regex when using read.table?
I'm thinking of using the external program 'grep' together with pipe() to do this, but I'm wondering whether there is a more efficient way to do it purely in R.
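A minimal sketch of the grep-plus-pipe() idea from the question. It assumes a Unix-like `grep` is on the PATH; the file contents and the pattern "DROP" are made up for illustration. `-v` keeps only the lines that do not match, and `-E` enables extended regular expressions.

```r
# Sketch: let an external 'grep -v' drop the unwanted lines before
# read.table() ever sees them. Assumes 'grep' is available on the PATH.
dataFile <- tempfile(fileext = ".txt")
writeLines(c("x y",
             "1 a",
             "2 DROP",
             "3 b"), dataFile)

pattern <- "DROP"  # hypothetical regex for the lines to exclude

# shQuote() protects the pattern and the path from the shell
cmd <- paste("grep -v -E", shQuote(pattern), shQuote(dataFile))

# read.table() accepts a connection; it opens and closes the pipe itself
dataSet <- read.table(pipe(cmd), header = TRUE)
print(dataSet)
```

Because the filtering happens on raw text, lines that read.table() could not parse at all (comments, ragged rows) are removed before parsing.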
Sharpie
2009-Dec-04 03:09 UTC
pengyu.ut wrote:
>
> I'm thinking of using external program 'grep' and pipe() to do so. But
> I'm wondering if there is a more efficient way to do so purely in R
>

I would just suck the whole table in using read.table(), locate the lines that I don't want using apply() and grepl(), and then reduce the data set:

    # Read the full table first
    dataSet <- read.table( "someData.txt" )

    # For each row (MARGIN = 1), flag it if any field matches the regex
    dataToDrop <- apply( dataSet, 1, function( row ){
      return( any( grepl( "regex", row ) ) )
    })

    # Keep only the rows that were not flagged
    dataSet <- subset( dataSet, !dataToDrop )

Since this solution executes entirely in R without resorting to system() calls, it should be portable between platforms.

-Charlie

Sent from the R help mailing list archive at Nabble.com.
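A self-contained, runnable version of the read-everything-then-filter approach above. The file contents and the pattern "DROP" are made up for illustration. Unlike the grep/pipe() route, rows are filtered after parsing, so every line in the file must still be parseable by read.table().

```r
# Build a small example file (hypothetical data)
dataFile <- tempfile(fileext = ".txt")
writeLines(c("x y", "1 a", "2 DROP", "3 b"), dataFile)

dataSet <- read.table(dataFile, header = TRUE)

# apply() over rows coerces the data frame to a character matrix,
# so grepl() sees every field of each row as text
dataToDrop <- apply(dataSet, 1, function(row) any(grepl("DROP", row)))

# Keep the rows where no field matched
dataSet <- subset(dataSet, !dataToDrop)
print(dataSet)
```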
Gabor Grothendieck
2009-Dec-04 03:22 UTC
Using grep in a pipeline is pretty fast, so if that is workable it is probably the way to go. However, one other possibility is to use read.csv.sql from the sqldf package. read.csv.sql lets you specify an SQL statement that it uses to filter the data. It reads the data into a temporary SQLite database (which it sets up for you automatically; the reading is done by SQLite, not R), applies the SQL statement, reads the presumably much smaller result into R, and finally destroys the temporary database automatically. Whether that is faster or slower than the alternatives could easily be tested, since read.csv.sql takes only one line of code. See the examples at http://sqldf.googlecode.com

On Thu, Dec 3, 2009 at 9:34 PM, Peng Yu <pengyu.ut@gmail.com> wrote:
> I'm thinking of using external program 'grep' and pipe() to do so. But
> I'm wondering if there is a more efficient way to do so purely in R
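A minimal sketch of the read.csv.sql() route, assuming the sqldf package is installed. Inside the SQL statement the input is referred to as the table `file`. Note that this swaps the regex for SQL's `LIKE` pattern (`%` matches any run of characters), since plain SQLite has no regular-expression operator by default; the data and the "DROP" pattern are made up.

```r
# Requires the sqldf package (install.packages("sqldf"))
library(sqldf)

# Small comma-separated example file (hypothetical data)
dataFile <- tempfile(fileext = ".csv")
writeLines(c("x,y",
             "1,a",
             "2,DROP",
             "3,b"), dataFile)

# SQLite loads the file into a temporary database, runs the query,
# and only the rows that survive the WHERE clause come back into R
dataSet <- read.csv.sql(dataFile,
                        sql = "select * from file where y not like '%DROP%'")
print(dataSet)
```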