Hi,

In the example below, one of the searched patterns "SE" is matched in the word "second". I would like to ignore all matches in which the character following the match is one of [:alpha:]. How do I do this without removing the "ignore.case = T" argument of the strapply function? Thank you very much!

# load library
require(gsubfn)
# read in data
data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")
# define the object to be searched
text <- c("the first is Santa Fe Gold Corp", "the second is Starpharma Holdings")
# match
strapply(text, data, ignore.case = T)

The preferred outcome would be:

[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "Starpharma Holdings"

instead of:

[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "se" "Starpharma Holdings"

--
View this message in context: http://r.789695.n4.nabble.com/strapply-and-characters-adjacent-to-the-matched-pattern-tp4637673.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

I tried matching data against text using strapply, without success. But you can get the result from the data alone, if that helps. This splits data into single characters (dropping the "|" separators) and pastes the first 18 and the next 19 characters back together:

dat2 <- strapply(data, "[^\\|]", c)
list1 <- list(paste(dat2[[1]][1:18], collapse = ""), paste(dat2[[1]][19:37], collapse = ""))
list1
[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "Starpharma Holdings"

A.K.

----- Original Message -----
From: mdvaan <mathijsdevaan at gmail.com>
To: r-help at r-project.org
Sent: Tuesday, July 24, 2012 5:06 PM
Subject: [R] strapply and characters adjacent to the matched pattern
Gabor Grothendieck
2012-Jul-25 11:37 UTC
[R] strapply and characters adjacent to the matched pattern
On Tue, Jul 24, 2012 at 5:06 PM, mdvaan <mathijsdevaan at gmail.com> wrote:
> In the example below, one of the searched patterns "SE" is matched in the
> word "second". I would like to ignore all matches in which the character
> following the match is one of [:alpha:]. How do I do this without removing
> the "ignore.case = T" argument of the strapply function?

Try this:

> strapply(c("abc", "ab", "ab def"), "(ab|d)($|[^[:alpha:]])")
[[1]]
NULL

[[2]]
[1] "ab"

[[3]]
[1] "ab"

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
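For readers who want to try the same idea on the data from the original question, here is a minimal sketch. It is an assumption-laden alternative, not the strapply answer given above: it uses base R's regmatches() and gregexpr() with perl = TRUE, so that a negative lookahead can reject any match that is followed by a letter. The names pattern and hits are made up for illustration.

```r
# Base-R sketch (not gsubfn::strapply): reject matches followed by a letter.
# The alternatives are the ones from the original question; the lookahead
# (?![[:alpha:]]) consumes nothing, so only the name itself is returned.
pattern <- "(Santa Fe Gold Corp|Starpharma Holdings|SE)(?![[:alpha:]])"

text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings")

# perl = TRUE enables lookahead; ignore.case = TRUE keeps the
# case-insensitive matching the question asked to retain.
hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE, ignore.case = TRUE))
print(hits)
```

Because the lookahead is zero-width, there is no second capture group to discard, which is the main difference from the two-group pattern in the reply above. The "se" in "second" is followed by "c" and is therefore skipped.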
Thanks Gabor. That worked really well. I have been reading about the use of POSIX and regular expressions, and I tried to use your example to see if I could ignore all matches in which the character preceding (rather than following) the match is one of [:alpha:]. So far, I have been unsuccessful. Could anyone help me out here or direct me to a source that explains the combined use of POSIX and regular expressions? Thanks!

require(gsubfn)
# this only checks the character following the match and therefore
# also matches the third element
# however, I want it to match only the 2nd, 5th and 6th elements
strapply(c("abc", "ab", "abdef", "defc", "def", " def "), "(def|ab)($|[^[:alpha:]])")

The outcome should look like this:

[[1]]
NULL

[[2]]
[1] "ab"

[[3]]
NULL

[[4]]
NULL

[[5]]
[1] "def"

[[6]]
[1] "def"
Gabor Grothendieck
2012-Jul-26 11:08 UTC
[R] strapply and characters adjacent to the matched pattern
On Wed, Jul 25, 2012 at 4:34 PM, mdvaan <mathijsdevaan at gmail.com> wrote:
> I tried to use your example to see if I could ignore all matches in which
> the character preceding (rather than following) the match is one of
> [:alpha:]? So far, I have been unsuccessful.

We match the start of the string or a non-alpha character, followed by the desired string (here it's xx). Because we want the second back reference (the default is to return the first back reference if the function is omitted), we must explicitly ask for it by using a function which returns its second argument, e.g. function(x, y) y or function(...) ..2 or, using the equivalent formula notation, just ~ ..2 :

strapply(c("cxx", "xxc", "xx", " xx"), "(^|[^[:alpha:]])(xx)", ~ ..2)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
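The two checks (no letter before, no letter after) can also be combined. The following is only a sketch under stated assumptions: it swaps strapply for base R's regmatches() and gregexpr() with perl = TRUE, using a negative lookbehind and a negative lookahead, and runs on the six-element vector from the follow-up question. The names x, pattern and hits are illustrative.

```r
# Base-R sketch (not gsubfn::strapply): require that the match is neither
# preceded nor followed by a letter, using zero-width lookarounds.
x <- c("abc", "ab", "abdef", "defc", "def", " def ")

# (?<![[:alpha:]]) : no letter immediately before the match
# (?![[:alpha:]])  : no letter immediately after the match
pattern <- "(?<![[:alpha:]])(def|ab)(?![[:alpha:]])"

hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
print(hits)
```

Since lookarounds do not consume characters, the whole match is just the target string, so there is no need to pick out a second back reference. One difference from strapply: elements without a match come back as character(0) here rather than NULL.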
Thanks Gabor for your invaluable help! I learned a lot.