Dear all,

let's assume I have a vector of character strings:

x <- c("abcdef", "defabc", "qwerty")

What I would like to find is the following: all elements where the word 'abc' does not appear (i.e. 3 in this case of 'x').

Since I am not really experienced with regular expressions, I started slowly and thought I would first find all elements where 'abc' actually does appear:

> grep(pattern="abc", x=x)
[1] 1 2

So far, so good. Now I read that ^ is the negation operator. But it can also denote the beginning of a string as in:

> grep(pattern="^abc", x=x)
[1] 1

Of course, we need to put it inside square brackets to negate the expression [1]:

> grep(pattern="[^abc]", x=x)
[1] 1 2 3

But this is not what I want either.

I'd appreciate any help. I assume this is rather easy and straightforward.

Thanks,
Roland

[1] http://www.zytrax.com/tech/web/regex.htm: The ^ (circumflex or caret) inside square brackets negates the expression....
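A note on the last attempt: inside square brackets, ^ negates a character class, not the whole pattern, so "[^abc]" matches any single character other than a, b or c. The sketch below illustrates this; the strings in only_abc are made up for illustration and are not from the original post.

```r
x <- c("abcdef", "defabc", "qwerty")

# "[^abc]" matches as soon as a string contains ONE character that is
# not 'a', 'b' or 'c', so every element of x matches.
class_hits <- grep("[^abc]", x)

# Only strings made up entirely of a/b/c (or empty strings) fail to match.
# These test strings are illustrative assumptions, not from the thread.
only_abc <- c("aabbcc", "")
class_miss <- grepl("[^abc]", only_abc)
```

This is why the negated character class cannot express "the word abc does not appear": it tests individual characters, not the substring as a whole.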
Just remove those elements that match:

> x <- c("abcdef", "defabc", "qwerty")
> x[-grep('abc',x)]
[1] "qwerty"

On Sun, Jan 18, 2009 at 1:35 PM, Rau, Roland <Rau at demogr.mpg.de> wrote:
> What I would like to find is the following: all elements where the word
> 'abc' does not appear (i.e. 3 in this case of 'x').

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Rau, Roland wrote:
> What I would like to find is the following: all elements where the word
> 'abc' does not appear (i.e. 3 in this case of 'x').

a quick shot is:

x[-grep("abc", x)]

which unfortunately fails if none of the strings in x matches the pattern, i.e., grep returns integer(0): x[-integer(0)] is then the same as x[integer(0)], which selects nothing. arguably, x[integer(0)] should rather return all elements of x:

"An empty index selects all values" (from ?'[')

but apparently integer(0) does not count as an empty index (and neither does NULL). so you may want something like:

strings = c("abcdef", "defabc", "qwerty")
pattern = "abc"
if (length(matching <- grep(pattern, strings))) x[-matching] else x

vQ
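The empty-match pitfall described above can be reproduced with a short sketch. The pattern "zzz" is an arbitrary choice that matches nothing in x; the variable names are illustrative.

```r
x <- c("abcdef", "defabc", "qwerty")

# No element contains "zzz", so grep() returns an empty integer vector.
no_match <- grep("zzz", x)

# Negating an empty vector gives an empty vector again ...
neg <- -no_match

# ... and indexing with it selects nothing instead of everything.
dropped <- x[neg]
```

Here dropped is character(0), although all three strings should have been kept.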
Try this:

# indexes
setdiff(seq_along(x), grep("abc", x))

# values
setdiff(x, grep("abc", x, value = TRUE))

Another possibility is:

z <- "abc"
x0 <- c(x, z) # to handle no match case
x0[- grep(z, x0)] # values

On Sun, Jan 18, 2009 at 1:35 PM, Rau, Roland <Rau at demogr.mpg.de> wrote:
> What I would like to find is the following: all elements where the word
> 'abc' does not appear (i.e. 3 in this case of 'x').
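One thing to keep in mind with the value-based setdiff() is that it treats its arguments as sets. A small sketch, using an illustrative input with a repeated value (not from the thread), shows the difference from the index-based variant:

```r
# Illustrative input with a repeated non-matching value.
x <- c("qwerty", "abcdef", "qwerty")

# Value-based setdiff() returns each remaining value only once.
by_value <- setdiff(x, grep("abc", x, value = TRUE))

# Index-based setdiff() keeps every position, so duplicates survive.
by_index <- x[setdiff(seq_along(x), grep("abc", x))]
```

If repeated strings in the input matter, the index-based form is probably the safer of the two.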
Roland,

I think you were almost there with your first example. How about using:

> x <- c("abcdef", "defabc", "qwerty")
> y <- grep(pattern="abc", x=x)
> z.char <- x[-y]
> z.index <- (1:length(x))[-y]
>
> z.char
[1] "qwerty"
> z.index
[1] 3

Cheers,
eric

Rau, Roland wrote:
> What I would like to find is the following: all elements where the word
> 'abc' does not appear (i.e. 3 in this case of 'x').

--
Eric Archer, Ph.D.
Southwest Fisheries Science Center
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121 (work)
858-546-7003 (FAX)

ETP Cetacean Assessment Program: http://swfsc.noaa.gov/prd-etp.aspx
Population ID Program: http://swfsc.noaa.gov/prd-popid.aspx

"Innocence about Science is the worst crime today."
- Sir Charles Percy Snow

"Lighthouses are more helpful than churches."
- Benjamin Franklin

"...but I'll take a GPS over either one."
- John C. "Craig" George
Jorge Ivan Velez wrote:
> Hi Wacek,
> I think you wanted to say "strings" instead "x" in your last line : )
>
> Best,
> Jorge
>
> On Sun, Jan 18, 2009 at 2:22 PM, Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>> if (length(matching <- grep(pattern, strings))) x[-matching] else x

of course, thanks. the correct version is:

if (length(matching <- grep(pattern, strings))) strings[-matching] else strings

btw., and in relation to a recent post complaining about how the mailing list is maintained, i must say that although the idea that posts could be edited after they've been sent may not sound good in general, i think it would be useful to be able to just fix such minor typos in place instead of posting a correction. after all, the list is intended to serve as help to those who care not only to ask, but also to browse the archives. but this is a side comment, i take no sides and make no recommendations.

vQ
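As an alternative to the explicit length check, logical indexing sidesteps the empty-index problem altogether. The sketch below assumes an R version that provides grepl() and the invert argument of grep(); both exist in current releases but may postdate this thread. The variable names are illustrative.

```r
x <- c("abcdef", "defabc", "qwerty")

# grepl() returns one TRUE/FALSE per element; negating it is safe
# even when nothing matches.
without_abc <- x[!grepl("abc", x)]
without_zzz <- x[!grepl("zzz", x)]   # no match: all elements are kept

# grep() can also return the non-matching side directly.
inv_index <- grep("abc", x, invert = TRUE)                 # indexes
inv_value <- grep("abc", x, invert = TRUE, value = TRUE)   # values
```

Because a logical vector always has the same length as x, there is no special case for "no match".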
Gabor Grothendieck wrote:
> Try this:
>
> # values
> setdiff(x, grep("abc", x, value = TRUE))
>
> Another possibility is:
>
> z <- "abc"
> x0 <- c(x, z) # to handle no match case
> x0[- grep(z, x0)] # values

on quick testing, these two and the if-based version have comparable runtime, with a minor win for the last one, and if the input is moderate this makes no real difference.

however, the second solution above is likely to fail if the pattern is more complex, e.g., contains a character class or a wildcard:

strings = c("xyz")
pattern = "a[a-z]"
strings[-grep(pattern, c(strings, pattern))]
# character(0)

vQ
In that case just add fixed = TRUE

On Sun, Jan 18, 2009 at 2:58 PM, Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> however, the second solution above is likely to fail if the pattern is
> more complex, e.g., contains a character class or a wildcard:
>
> strings = c("xyz")
> pattern = "a[a-z]"
> strings[-grep(pattern, c(strings, pattern))]
> # character(0)
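It may be worth spelling out what fixed = TRUE changes. With it, the appended pattern always matches itself, so the negative index is never empty, but the pattern is then treated as a literal string and not as a regular expression. A minimal sketch, with "abc" added to the input for illustration:

```r
strings <- c("abc", "xyz")
pattern <- "a[a-z]"
x0 <- c(strings, pattern)   # sentinel appended to handle the no-match case

# With fixed = TRUE the sentinel always matches itself ...
hits <- grep(pattern, x0, fixed = TRUE)

# ... but "a[a-z]" is now a literal string, so "abc" is no longer
# recognised as a match and stays in the result.
kept <- x0[-hits]
```

So the sentinel trick with fixed = TRUE looks suitable for literal substrings such as "abc", but not for patterns that rely on regular-expression syntax.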
Wacek Kusnierczyk wrote:
> # r code
> ungrep = function(pattern, x, ...)
>    grep(paste(pattern, "(*COMMIT)(*FAIL)|(*ACCEPT)", sep=""), x, perl=TRUE, ...)
>
> strings = c("abc", "xyz")
> pattern = "a[a-z]"
> (filtered = strings[ungrep(pattern, strings)])
> # "xyz"

this was a toy example, but if you need this sort of ungrep with patterns involving alternations, you need a fix:

ungrep("a|x", strings, value=TRUE)
# "abc"
# NOT character(0)

# fix: wrap the pattern in a non-capturing group so that the appended
# verbs apply to every alternative, not just the last one
ungrep = function(pattern, x, ...)
   grep(paste("(?:", pattern, ")(*COMMIT)(*FAIL)|(*ACCEPT)", sep=""), x, perl=TRUE, ...)

ungrep("a|x", strings, value=TRUE)
# character(0)

vQ
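For comparison, a different and perhaps more widely known way to express "does not match" inside the regular expression itself is a negative lookahead anchored at the start of the string. This is a swapped-in technique, not the one used above, and the helper name ungrep_lookahead is hypothetical:

```r
# Hypothetical helper (name and approach are not from the thread):
# "^(?!.*(?:pattern))" succeeds only if the pattern matches nowhere
# in the string. The non-capturing group keeps alternations intact.
ungrep_lookahead <- function(pattern, x, ...)
  grep(paste("^(?!.*(?:", pattern, "))", sep = ""), x, perl = TRUE, ...)

strings <- c("abc", "xyz")

# "abc" contains "a" followed by a lowercase letter, "xyz" does not.
res_class <- ungrep_lookahead("a[a-z]", strings, value = TRUE)

# Both strings contain "a" or "x", so nothing is left.
res_alt <- ungrep_lookahead("a|x", strings, value = TRUE)
```

One caveat of this sketch: by default "." does not match a newline, so strings containing line breaks would need the (?s) modifier or a different construction.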