Greetings,

The following question has come up in an off-list discussion. Is it possible to construct a regular expression 'rex' out of two given regular expressions 'rex1' and 'rex2', such that a character string X matches 'rex' if and only if X matches 'rex1' AND X does not match 'rex2'?

The desired end result can be achieved by logically combining the results of a grep using 'rex1' with the results of a grep using 'rex2', as illustrated by the following example:

## Given character vector X (below), and two regular expressions
## rex1="abc", rex2="ijk", to return the elements of X which match
## rex1 AND do not match rex2:
X <- c(
  "abcdefg",       # Yes
  "abchijk",       # No
  "mnopqrs",       # No
  "ijkpqrs",       # No
  "abcpqrs" )      # Yes
rex1 <- "abc"
rex2 <- "ijk"
ix1 <- grep(rex1, X)
ix2 <- grep(rex2, X)
X[ix1[!(ix1 %in% ix2)]]
## [1] "abcdefg" "abcpqrs"

Question: is there a way to construct 'rex' from 'rex1' and 'rex2' such that

  X[grep(rex, X)]

would give the same result?

I've not managed to find anything helpful in descriptions of regular expression syntax, though one feels it should be possible if this is capable of supporting a logically complete language!

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jun-10                                       Time: 10:38:45
------------------------------ XFMail ------------------------------
I think you have missed grepl(), e.g.

  X[grepl(rex1, X) & !grepl(rex2, X)]

grepl() is a fairly recent addition (R 2.9.0) that is used extensively in R's own text-processing operations (e.g. help files, utilities such as 'R CMD check').

On Sat, 12 Jun 2010, Ted.Harding at manchester.ac.uk wrote:

> Greetings,
> The following question has come up in an off-list discussion.
> Is it possible to construct a regular expression 'rex' out of
> two given regular expressions 'rex1' and 'rex2', such that a
> character string X matches 'rex' if and only if X matches 'rex1'
> AND X does not match 'rex2'?

Not in general.

> The desired end result can be achieved by logically combining
> the results of a grep using 'rex1' with the results of a grep
> on 'rex2', illustrated by the following example:
>
> ## Given character vector X (below), and two regular expressions
> ## rex1="abc", rex2="ijk", to return the elements of X which match
> ## rex1 AND do not match rex2:
> X <- c(
>   "abcdefg",       # Yes
>   "abchijk",       # No
>   "mnopqrs",       # No
>   "ijkpqrs",       # No
>   "abcpqrs" )      # Yes
> rex1 <- "abc"
> rex2 <- "ijk"
> ix1 <- grep(rex1, X)
> ix2 <- grep(rex2, X)
> X[ix1[!(ix1 %in% ix2)]]
> ## [1] "abcdefg" "abcpqrs"
>
> Question: is there a way to construct 'rex' from 'rex1' and 'rex2'
> such that
>
>   X[grep(rex, X)]
>
> would give the same result?
>
> I've not managed to find anything helpful in descriptions of
> regular expression syntax, though one feels it should be possible
> if this is capable of supporting a logically complete language!
>
> With thanks,
> Ted.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
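The grepl() combination suggested above can be wrapped in a small helper. This is only a minimal sketch: the function name match_and_not is hypothetical (it is not part of R), and the extra arguments are simply passed through to both grepl() calls.

```r
# Example data from the original question.
X <- c("abcdefg", "abchijk", "mnopqrs", "ijkpqrs", "abcpqrs")

# Hypothetical helper: keep the elements of x that match rex1
# and do not match rex2. Arguments in ... (e.g. perl = TRUE,
# fixed = TRUE) are forwarded to both grepl() calls.
match_and_not <- function(x, rex1, rex2, ...) {
  keep <- grepl(rex1, x, ...) & !grepl(rex2, x, ...)
  x[keep]
}

result <- match_and_not(X, "abc", "ijk")
print(result)
```

Because grepl() returns a logical vector of the same length as its input, the two conditions can be combined with & and ! directly, which avoids the index bookkeeping needed with grep().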
On Sat, Jun 12, 2010 at 5:38 AM, Ted Harding <Ted.Harding at manchester.ac.uk> wrote:

> Greetings,
> The following question has come up in an off-list discussion.
> Is it possible to construct a regular expression 'rex' out of
> two given regular expressions 'rex1' and 'rex2', such that a
> character string X matches 'rex' if and only if X matches 'rex1'
> AND X does not match 'rex2'?
>
> The desired end result can be achieved by logically combining
> the results of a grep using 'rex1' with the results of a grep
> on 'rex2', illustrated by the following example:
>
> ## Given character vector X (below), and two regular expressions
> ## rex1="abc", rex2="ijk", to return the elements of X which match
> ## rex1 AND do not match rex2:
> X <- c(
>   "abcdefg",       # Yes
>   "abchijk",       # No
>   "mnopqrs",       # No
>   "ijkpqrs",       # No
>   "abcpqrs" )      # Yes
> rex1 <- "abc"
> rex2 <- "ijk"
> ix1 <- grep(rex1, X)
> ix2 <- grep(rex2, X)
> X[ix1[!(ix1 %in% ix2)]]
> ## [1] "abcdefg" "abcpqrs"
>
> Question: is there a way to construct 'rex' from 'rex1' and 'rex2'
> such that
>
>   X[grep(rex, X)]
>
> would give the same result?

Try this, which uses a Perl-style negative lookahead to reject any string containing "ijk" before looking for "abc":

  rex <- "^(?!(.*ijk)).*abc"
  grep(rex, X, perl = TRUE)

Also note that X[grep(rex, X, perl = TRUE)] can be written:

  grep(rex, X, perl = TRUE, value = TRUE)

See ?regex for more info. Further regular expression links can be found in the External Links box on the gsubfn home page at http://gsubfn.googlecode.com
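The lookahead idea above can be generalised to build 'rex' from arbitrary 'rex1' and 'rex2'. The following is a sketch under stated assumptions rather than a definitive construction: the function name make_rex is hypothetical, both inputs are assumed to be valid Perl-compatible patterns without numbered backreferences (wrapping them changes group numbering), and the (?s) flag is an addition of this sketch so that "." also matches newlines. It only works with perl = TRUE, since POSIX regular expressions have no lookahead.

```r
# Example data from the original question.
X <- c("abcdefg", "abchijk", "mnopqrs", "ijkpqrs", "abcpqrs")

# Hypothetical helper: build a single PCRE pattern that matches a
# string if and only if rex1 matches somewhere and rex2 matches nowhere.
#   (?s)              let "." match newlines as well
#   ^(?!.*(?:rex2))   from the start, rex2 must not occur anywhere
#   (?=.*(?:rex1))    from the start, rex1 must occur somewhere
make_rex <- function(rex1, rex2) {
  paste0("(?s)^(?!.*(?:", rex2, "))(?=.*(?:", rex1, "))")
}

rex <- make_rex("abc", "ijk")
selected <- grep(rex, X, perl = TRUE, value = TRUE)
print(selected)
```

Both conditions are written as lookaheads anchored at the start of the string, so the pattern itself consumes no characters. That is fine for grep() and grepl(), but it is not suitable for extracting or replacing the matched text with sub() or regmatches().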