Dear all,
let's assume I have a vector of character strings:
x <- c("abcdef", "defabc", "qwerty")
What I would like to find is the following: all elements where the word
'abc' does not appear (i.e. 3 in this case of 'x').
Since I am not really experienced with regular expressions, I started
slowly and thought I would first find all words where 'abc' actually does appear:
> grep(pattern="abc", x=x)
[1] 1 2
So far, so good. Now I read that ^ is the negation operator. But it can
also denote the beginning of a string as in:
> grep(pattern="^abc", x=x)
[1] 1
Of course, we need to put it inside square brackets to negate the
expression [1]
> grep(pattern="[^abc]", x=x)
[1] 1 2 3
But this is not what I want either.
I'd appreciate any help. I assume this is rather easy and
straightforward.
Thanks,
Roland
[1] http://www.zytrax.com/tech/web/regex.htm: The ^ (circumflex or
caret) inside square brackets negates the expression....
----------
This mail has been sent through the MPI for Demographic Research. Should you
receive a mail that is apparently from a MPI user without this text displayed,
then the address has most likely been faked. If you are uncertain about the
validity of this message, please check the mail header or ask your system
administrator for assistance.
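
A short sketch of why the bracketed form matches every element: inside square brackets, ^ negates the character class, not the whole pattern, so "[^abc]" asks for any single character other than a, b or c. The vector only_abc below is made up for illustration and is not from the original post.

```r
x <- c("abcdef", "defabc", "qwerty")

# "[^abc]" matches as soon as a string contains ONE character
# that is not 'a', 'b' or 'c'; it does not mean "does not contain abc".
hits_x <- grep(pattern = "[^abc]", x = x)

# Illustrative strings (an assumption for this sketch) built only from a, b, c:
only_abc <- c("abcabc", "cab")
hits_only_abc <- grep(pattern = "[^abc]", x = only_abc)

print(hits_x)         # 1 2 3
print(hits_only_abc)  # integer(0)
```

Only strings that consist entirely of the letters a, b and c fail to match, which is why all three elements of x are returned.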
Just remove those elements that match:

> x <- c("abcdef", "defabc", "qwerty")
> x[-grep('abc',x)]
[1] "qwerty"
>

On Sun, Jan 18, 2009 at 1:35 PM, Rau, Roland <Rau at demogr.mpg.de> wrote:
> let's assume I have a vector of character strings:
>
> x <- c("abcdef", "defabc", "qwerty")
>
> What I would like to find is the following: all elements where the word
> 'abc' does not appear (i.e. 3 in this case of 'x').

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Rau, Roland wrote:
> Dear all,
>
> let's assume I have a vector of character strings:
>
> x <- c("abcdef", "defabc", "qwerty")
>
> What I would like to find is the following: all elements where the word
> 'abc' does not appear (i.e. 3 in this case of 'x').
>

a quick shot is:

    x[-grep("abc", x)]

which unfortunately fails if none of the strings in x matches the
pattern, i.e., grep returns integer(0); arguably, x[integer(0)] should
rather return all elements of x:

    "An empty index selects all values" (from ?'[')

but apparently integer(0) does not count as an empty index (and neither
does NULL).  so you may want something like:

    strings = c("abcdef", "defabc", "qwerty")
    pattern = "abc"
    if (length(matching <- grep(pattern, strings))) x[-matching] else x

vQ
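
The no-match pitfall described above can be reproduced in a few lines. The sketch also shows two ways around it that need no if: a logical index built with grepl(), and the invert argument of grep(). It assumes a version of R that provides both (current releases do); the pattern "zzz" is chosen only because it matches nothing.

```r
strings <- c("abcdef", "defabc", "qwerty")

# Pitfall: with no match, grep() returns integer(0), and -integer(0)
# is still integer(0), so nothing is selected instead of everything.
no_hit <- grep("zzz", strings)
dropped_wrong <- strings[-no_hit]        # character(0)

# A logical index has one TRUE/FALSE per element, so negating it is safe.
kept_logical <- strings[!grepl("zzz", strings)]

# invert = TRUE asks grep() directly for the non-matching elements.
idx_invert <- grep("abc", strings, invert = TRUE)
val_invert <- grep("abc", strings, invert = TRUE, value = TRUE)

print(dropped_wrong)  # character(0)
print(kept_logical)   # all three strings
print(idx_invert)     # 3
print(val_invert)     # "qwerty"
```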
Try this:
# indexes
setdiff(seq_along(x), grep("abc", x))
# values
setdiff(x, grep("abc", x, value = TRUE))
Another possibility is:
z <- "abc"
x0 <- c(x, z) # to handle no match case
x0[- grep(z, x0)] # values
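
A small sketch of how the two setdiff() variants behave when the input contains repeated strings or when nothing matches. The vector with a duplicated "qwerty" is an assumption made for illustration. setdiff() returns unique values, so the value-based form collapses duplicates, while the index-based form keeps them.

```r
x_dup <- c("abcdef", "qwerty", "qwerty")   # illustrative input with a duplicate

# Index-based: positions of the elements that do not match.
idx <- setdiff(seq_along(x_dup), grep("abc", x_dup))
by_index <- x_dup[idx]

# Value-based: set semantics, so "qwerty" appears only once.
by_value <- setdiff(x_dup, grep("abc", x_dup, value = TRUE))

# No match at all: grep() returns integer(0) and every index survives.
idx_none <- setdiff(seq_along(x_dup), grep("zzz", x_dup))

print(by_index)   # "qwerty" "qwerty"
print(by_value)   # "qwerty"
print(idx_none)   # 1 2 3
```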
Roland,
I think you were almost there with your first example. How about using:
> x <- c("abcdef", "defabc", "qwerty")
> y <- grep(pattern="abc", x=x)
> z.char <- x[-y]
> z.index <- (1:length(x))[-y]
>
> z.char
[1] "qwerty"
> z.index
[1] 3
Cheers,
eric
--
Eric Archer, Ph.D.
Southwest Fisheries Science Center
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121 (work)
858-546-7003 (FAX)
ETP Cetacean Assessment Program: http://swfsc.noaa.gov/prd-etp.aspx
Population ID Program: http://swfsc.noaa.gov/prd-popid.aspx
"Innocence about Science is the worst crime today."
- Sir Charles Percy Snow
"Lighthouses are more helpful than churches."
- Benjamin Franklin
"...but I'll take a GPS over either one."
- John C. "Craig" George
Jorge Ivan Velez wrote:
> Hi Wacek,
> I think you wanted to say "strings" instead of "x" in your last line : )
>
> Best,
>
> Jorge
>
> On Sun, Jan 18, 2009 at 2:22 PM, Wacek Kusnierczyk <
> Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>
>> so you may want something like:
>>
>> strings = c("abcdef", "defabc", "qwerty")
>> pattern = "abc"
>> if (length(matching <- grep(pattern, strings))) x[-matching] else x
>>
>> vQ

of course, thanks.  the correct version is:

    if (length(matching <- grep(pattern, strings))) strings[-matching] else strings

btw., and in relation to a recent post complaining about how the mailing
list is maintained, i must say that although the idea that posts could
be edited after they've been sent may not sound good in general, i think
it would be useful to be able to just fix such minor typos in place
instead of posting a correction.  after all, the list is intended to
serve as help to those who care not only to ask, but also to browse the
archives.  but this is a side comment, i take no sides and make no
recommendations.

vQ
Gabor Grothendieck wrote:
> Try this:
>
> # values
> setdiff(x, grep("abc", x, value = TRUE))
>
> Another possibility is:
>
> z <- "abc"
> x0 <- c(x, z) # to handle no match case
> x0[- grep(z, x0)] # values
>

on quick testing, these two and the if-based version have comparable
runtime, with a minor win for the last one, and if the input is moderate
this makes no real difference.

however, the second solution above is likely to fail if the pattern is
more complex, e.g., contains a character class or a wildcard:

    strings = c("xyz")
    pattern = "a[a-z]"
    strings[-grep(pattern, c(strings, pattern))]
    # character(0)

vQ
In that case just add fixed = TRUE

On Sun, Jan 18, 2009 at 2:58 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> however, the second solution above is likely to fail if the pattern is
> more complex, e.g., contains a character class or a wildcard:
>
> strings = c("xyz")
> pattern = "a[a-z]"
> strings[-grep(pattern, c(strings, pattern))]
> # character(0)
>
> vQ
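
A sketch of what fixed = TRUE changes in the sentinel approach. The input adds "abc" to the example above so that both effects are visible (an assumption for illustration). With fixed = TRUE the appended pattern always matches itself, but the pattern is then compared literally, so it no longer acts as a regular expression.

```r
strings <- c("abc", "xyz")
pattern <- "a[a-z]"
x0 <- c(strings, pattern)   # pattern appended as a sentinel

# Regex matching: "abc" matches, but the literal text "a[a-z]" does not
# match its own regex, so the sentinel leaks into the result.
regex_result <- x0[-grep(pattern, x0)]

# fixed = TRUE: the sentinel is found and removed, but "abc" is kept
# because it does not contain the literal text "a[a-z]".
fixed_result <- x0[-grep(pattern, x0, fixed = TRUE)]

# For comparison, inverting a regex match directly on the input:
invert_result <- grep(pattern, strings, invert = TRUE, value = TRUE)

print(regex_result)   # "xyz" "a[a-z]"
print(fixed_result)   # "abc" "xyz"
print(invert_result)  # "xyz"
```

So fixed = TRUE makes the sentinel safe only when the pattern is meant as a literal substring.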
Wacek Kusnierczyk wrote:
>
> # r code
> ungrep = function(pattern, x, ...)
>    grep(paste(pattern, "(*COMMIT)(*FAIL)|(*ACCEPT)", sep=""), x,
>         perl=TRUE, ...)
>
> strings = c("abc", "xyz")
> pattern = "a[a-z]"
> (filtered = strings[ungrep(pattern, strings)])
> # "xyz"
>

this was a toy example, but if you need this sort of ungrep with
patterns involving alternations, you need a fix:

    ungrep("a|x", strings, value=TRUE)
    # "abc"
    # NOT character(0)

    # fix
    ungrep = function(pattern, x, ...)
       grep(paste("(?:", pattern, ")(*COMMIT)(*FAIL)|(*ACCEPT)", sep=""), x,
            perl=TRUE, ...)

    ungrep("a|x", strings, value=TRUE)
    # character(0)

vQ
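
For comparison, a different technique from the backtracking verbs used above: a negative lookahead anchored at the start of the string. This is only a sketch; the name ungrep_lookahead and the third test string are hypothetical, and it assumes the strings contain no newlines (by default "." does not match a newline with perl = TRUE). As in the fix above, the pattern is wrapped in a non-capturing group so that alternations stay together.

```r
# Hypothetical helper: matches strings in which `pattern` occurs nowhere.
ungrep_lookahead <- function(pattern, x, ...) {
  # "^(?!.*(?:PATTERN))" succeeds at the start of the string only if
  # PATTERN cannot be found at any later position.
  grep(paste0("^(?!.*(?:", pattern, "))"), x, perl = TRUE, ...)
}

strings <- c("abc", "xyz", "qwerty")   # "qwerty" added for illustration

res_class <- ungrep_lookahead("a[a-z]", strings, value = TRUE)
res_alt   <- ungrep_lookahead("a|x", strings, value = TRUE)
res_index <- ungrep_lookahead("abc", strings)

print(res_class)  # "xyz" "qwerty"
print(res_alt)    # "qwerty"
print(res_index)  # 2 3
```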