Hi,

In the example below, one of the searched patterns, "SE", is matched in
the word "second". I would like to ignore all matches in which the
character following the match is one of [:alpha:]. How do I do this
without removing the "ignore.case = T" argument of the strapply function?
Thank you very much!

# load library
require(gsubfn)
# read in data
data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")
# define the object to be searched
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings")
# match
strapply(text, data, ignore.case = T)

The preferred outcome would be:
[[1]]
[1] "Santa Fe Gold Corp"
[[2]]
[1] "Starpharma Holdings"
instead of:
[[1]]
[1] "Santa Fe Gold Corp"
[[2]]
[1] "se" "Starpharma Holdings"
--
View this message in context:
http://r.789695.n4.nabble.com/strapply-and-characters-adjacent-to-the-matched-pattern-tp4637673.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

I tried matching data against text with strapply, without success. But
you can get the result from data alone, if that helps.

dat2 <- strapply(data, "[^\\|]", c)
list1 <- list(paste(dat2[[1]][1:18], collapse = ""),
              paste(dat2[[1]][19:37], collapse = ""))
list1
[[1]]
[1] "Santa Fe Gold Corp"
[[2]]
[1] "Starpharma Holdings"
A.K.
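
The character positions used above (1:18 and 19:37) depend on the exact lengths of the names. A sketch of a more general way to recover the individual names from data is base R's strsplit with fixed = TRUE, so that "|" is treated as a literal separator. The variable name companies is only illustrative.

```r
data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")

# fixed = TRUE treats "|" as a literal separator, not as regex alternation.
companies <- strsplit(data, "|", fixed = TRUE)[[1]]
print(companies)
```

This returns all three search terms, including "SE", so the first two elements would still need to be selected if only the company names are wanted.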
----- Original Message -----
From: mdvaan <mathijsdevaan at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Tuesday, July 24, 2012 5:06 PM
Subject: [R] strapply and characters adjacent to the matched pattern
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Gabor Grothendieck
2012-Jul-25 11:37 UTC
[R] strapply and characters adjacent to the matched pattern
On Tue, Jul 24, 2012 at 5:06 PM, mdvaan <mathijsdevaan at gmail.com> wrote:
> In the example below, one of the searched patterns "SE" is matched in the
> word "second". I would like to ignore all matches in which the character
> following the match is one of [:alpha:]. How do I do this without removing
> the "ignore.case = T" argument of the strapply function?

Try this:

> strapply(c("abc", "ab", "ab def"), "(ab|d)($|[^[:alpha:]])")
[[1]]
NULL

[[2]]
[1] "ab"

[[3]]
[1] "ab"

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

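
Applied to the original example, the same idea might look like the sketch below. It assumes the company names contain no regex metacharacters, since they are pasted straight into the pattern. The variable names companies, pattern and result are only illustrative. The second group consumes the character after the name (or matches the end of the string), and because strapply returns the first back reference when no function is given, only the name itself comes back.

```r
# Sketch: wrap the alternatives in a capture group and require that the
# next character is either the end of the string or a non-letter.
library(gsubfn)

companies <- "Santa Fe Gold Corp|Starpharma Holdings|SE"
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings")

# Assumption: the names contain no regex metacharacters.
pattern <- paste0("(", companies, ")($|[^[:alpha:]])")

# With no function supplied, strapply returns the first back reference,
# i.e. the company name without the trailing character.
result <- strapply(text, pattern, ignore.case = TRUE)
print(result)
```

With this pattern the "se" in "second" is followed by a letter and is therefore no longer reported, while ignore.case stays switched on.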
Thanks Gabor. That worked really well. I have been reading about the use of
POSIX and regular expressions and I tried to use your example to see if I
could ignore all matches in which the character preceding (rather than
following) the match is one of [:alpha:]. So far, I have been unsuccessful.
Could anyone help me out here or direct me to a source that explains the
combined use of POSIX and regular expressions? Thanks!

require(gsubfn)
# this only checks the character following the match and therefore
# also matches the third element;
# however, I want it to match only the 2nd, 5th and 6th elements
strapply(c("abc", "ab", "abdef", "defc", "def", " def "),
         "(def|ab)($|[^[:alpha:]])")

The outcome should look like this:
[[1]]
NULL
[[2]]
[1] "ab"
[[3]]
NULL
[[4]]
NULL
[[5]]
[1] "def"
[[6]]
[1] "def"
Gabor Grothendieck
2012-Jul-26 11:08 UTC
[R] strapply and characters adjacent to the matched pattern
On Wed, Jul 25, 2012 at 4:34 PM, mdvaan <mathijsdevaan at gmail.com> wrote:
> I tried to use your example to see if I could ignore all matches in
> which the character preceding (rather than following) the match is one
> of [:alpha:]. So far, I have been unsuccessful.

We match the start of the string or a non-alpha character, followed by
the desired string (here it's xx). Because we want the second back
reference (the default is to return the first back reference if the
function is omitted), we must say so explicitly by using a function that
returns its second argument, e.g. function(x, y) y or function(...) ..2,
or the equivalent formula notation, just ~ ..2 :

strapply(c("cxx", "xxc", "xx", " xx"), "(^|[^[:alpha:]])(xx)", ~ ..2)

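
Both conditions can be combined by adding a third group for the character after the match. The sketch below uses the six-element vector from the earlier follow-up question. The argument names before, name and after are made up for illustration. One caveat: the neighbouring characters are consumed by the match, so two names separated by a single character (for example "ab def") may yield only the first one.

```r
library(gsubfn)

x <- c("abc", "ab", "abdef", "defc", "def", " def ")

# Group 1: start of string or a non-letter before the name.
# Group 2: the name itself.
# Group 3: end of string or a non-letter after the name.
pattern <- "(^|[^[:alpha:]])(def|ab)($|[^[:alpha:]])"

# The function receives one argument per back reference; keep the second.
result <- strapply(x, pattern, function(before, name, after) name)
print(result)
```

Only the 2nd, 5th and 6th elements should produce a match here, which is the outcome asked for in the follow-up question.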
Thanks Gabor for your invaluable help! I learned a lot.
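
For readers who would rather not consume the neighbouring characters, a base R alternative (not what was used in this thread) is to express both conditions as Perl-style lookarounds with perl = TRUE. This is only a sketch. The input vector adds "ab def" to the six elements discussed earlier to show that adjacent names are both found. Note that regmatches returns character(0) rather than NULL for elements without a match.

```r
x <- c("abc", "ab", "abdef", "defc", "def", " def ", "ab def")

# (?<![[:alpha:]]) : not preceded by a letter
# (?![[:alpha:]])  : not followed by a letter
pattern <- "(?<![[:alpha:]])(def|ab)(?![[:alpha:]])"

# gregexpr finds all matches; regmatches extracts them as a list.
result <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
print(result)
```

Passing ignore.case = TRUE to gregexpr keeps the case-insensitive behaviour of the original call.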