thr3ads.net - R help - [R] strapply and characters adjacent to the matched pattern [Jul 2012]

mdvaan

2012-Jul-24 21:06 UTC

[R] strapply and characters adjacent to the matched pattern

Hi,

In the example below, one of the searched patterns "SE" is matched in the
word "second". I would like to ignore all matches in which the character
following the match is one of [:alpha:]. How do I do this without removing
the "ignore.case = T" argument of the strapply function? Thank you very
much!

# load library
require(gsubfn)
# read in data 
data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")
# define the object to be searched 
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings")
# match 
strapply(text, data, ignore.case = T)

The preferred outcome would be:

[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "Starpharma Holdings"

instead of:

[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "se"                  "Starpharma Holdings"






--
View this message in context:
http://r.789695.n4.nabble.com/strapply-and-characters-adjacent-to-the-matched-pattern-tp4637673.html
Sent from the R help mailing list archive at Nabble.com.

arun

2012-Jul-25 05:08 UTC


HI,

I tried matching data against text using strapply, without success. But you
can get the result from the data alone, if that helps you.

dat2 <- strapply(data, "[^\\|]", c)
list1 <- list(paste(dat2[[1]][1:18], collapse = ""),
              paste(dat2[[1]][19:37], collapse = ""))
list1
[[1]]
[1] "Santa Fe Gold Corp"

[[2]]
[1] "Starpharma Holdings"
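
The hard-coded positions 1:18 and 19:37 only fit this particular string. A
minimal sketch of the same split with base R's strsplit, assuming the names
are always separated by a literal "|" (the variable name parts is only for
illustration):

```r
# the pattern string from the original post
data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")

# fixed = TRUE treats "|" as a literal character, not as regex alternation
parts <- strsplit(data, "|", fixed = TRUE)[[1]]
parts
```

This returns all three alternatives, including "SE", so the last one would
still have to be dropped by hand to get the two company names.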

A.K.




Gabor Grothendieck

2012-Jul-25 11:37 UTC


Try this:

> strapply(c("abc", "ab", "ab def"), "(ab|d)($|[^[:alpha:]])")
[[1]]
NULL

[[2]]
[1] "ab"

[[3]]
[1] "ab"
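
Applied to the data from the original post, the same idea can be sketched in
base R. Note that this is a variant, not the strapply call above: it swaps in
regmatches() and gregexpr() with perl = TRUE, and uses a negative lookahead
(?![[:alpha:]]) in place of the second group, so the character after the
match is checked but not consumed. The names pattern and matches are only
for illustration.

```r
data <- c("Santa Fe Gold Corp|Starpharma Holdings|SE")
text <- c("the first is Santa Fe Gold Corp",
          "the second is Starpharma Holdings")

# wrap the alternatives in a non-capturing group and reject any match
# that is immediately followed by a letter
pattern <- paste0("(?:", data, ")(?![[:alpha:]])")

matches <- regmatches(text,
                      gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE))
matches
```

The "se" in "second" is followed by "c", so it is dropped while ignore.case
stays switched on. A standalone "SE" in the text would still match.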


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

mdvaan

2012-Jul-25 20:34 UTC


Thanks Gabor. That worked really well. I have been reading about the use of
POSIX character classes in regular expressions, and I tried to use your
example to see if I could ignore all matches in which the character preceding
(rather than following) the match is one of [:alpha:]. So far, I have been
unsuccessful. Could anyone help me out here or direct me to a source that
explains the combined use of POSIX classes and regular expressions? Thanks!

require(gsubfn)
# this only checks the character following the match and therefore
# also matches the third element;
# however, I want it to match only the 2nd, 5th and 6th elements
strapply(c("abc", "ab", "abdef", "defc", "def", " def "),
         "(def|ab)($|[^[:alpha:]])")

The outcome should look like this:
[[1]]
NULL

[[2]]
[1] "ab"

[[3]]
NULL

[[4]]
NULL

[[5]]
[1] "def"

[[6]]
[1] "def"




Gabor Grothendieck

2012-Jul-26 11:08 UTC


We match the start of the string or a non-alpha character, followed by the
desired string (here it is xx). Because we want the second back reference
(the default is to return the first back reference if the function is
omitted), we must specifically tell it that by using a function which
returns its second argument, e.g. function(x, y) y or function(...) ..2 or,
using the equivalent formula notation, just ~ ..2 :

strapply(c("cxx", "xxc", "xx", " xx"), "(^|[^[:alpha:]])(xx)", ~ ..2)
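
To check both sides at once, as asked for in the six-element example earlier
in the thread, one option is Perl-style lookarounds in base R. This is only a
sketch, using regmatches() and gregexpr() with perl = TRUE rather than
strapply; the names x, pattern and res are for illustration. Unlike strapply,
regmatches() gives character(0), not NULL, for elements without a match.

```r
x <- c("abc", "ab", "abdef", "defc", "def", " def ")

# (?<![[:alpha:]]) : no letter immediately before the match
# (?![[:alpha:]])  : no letter immediately after the match
pattern <- "(?<![[:alpha:]])(def|ab)(?![[:alpha:]])"

res <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
res
```

Only the 2nd, 5th and 6th elements return a match ("ab", "def" and "def").
Because lookarounds do not consume characters, no back reference has to be
selected.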


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

mdvaan

2012-Jul-26 14:46 UTC


Thanks Gabor for your invaluable help! I learned a lot.


