thr3ads.net - R help - [R] Regular expressions: offsets of groups [Sep 2010]

Titus von der Malsburg

2010-Sep-27 15:48 UTC

[R] Regular expressions: offsets of groups

Dear list!
> gregexpr("a+(b+)", "abcdaabbc")[[1]]
[1] 1 5
attr(,"match.length")
[1] 2 4

What I want is the offsets of the matches for the group (b+), i.e. 2
and 7, not the offsets of the complete matches.  Is there a way in R
to get that?

I know about gsubfn and strapply, but they only give me the strings
matched by groups, not their offsets.

I could write something myself that first takes the above matches
("ab" and "aabb") and then searches again using only the
group (b+).
For this to work, I'd have to parse the regular expression and search
several times (> 2, for nested groups) instead of just once.  But I'm
sure there is a better way to do this.

Thanks for any suggestion!

   Titus
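
A minimal sketch of the two-pass idea described above, for this specific pattern only: find the complete matches first, then locate the group inside each matched substring and shift the result back to an offset in the original string. The variable names are illustrative, and the inner pattern "b+" is written by hand rather than parsed out of the full expression, so this does not address nested groups.

```r
text <- "abcdaabbc"

# Pass 1: complete matches of the full pattern.
# Assumption: the pattern matches at least once (no handling of -1 here).
full <- gregexpr("a+(b+)", text)[[1]]
starts <- as.integer(full)
lens <- attr(full, "match.length")
pieces <- substring(text, starts, starts + lens - 1L)  # "ab", "aabb"

# Pass 2: position of the group within each matched piece.
inner <- regexpr("b+", pieces)

# Shift the within-piece positions back to offsets in the original string.
group_start <- starts + as.integer(inner) - 1L
group_length <- as.integer(attr(inner, "match.length"))

group_start   # 2 7
group_length  # 1 2
```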

jim holtman

2010-Sep-27 16:43 UTC

try this:
> x <-  gregexpr("a+(b+)", "abcdaabbcaaacaaab")
> justA <-  gregexpr("a+", "abcdaabbcaaacaaab")
> # find matches in 'x' for 'justA'
> indx <- which(justA[[1]] %in% x[[1]])
> # now determine where 'b' starts
> justA[[1]][indx] + attr(justA[[1]], 'match.length')[indx]
[1]  2  7 17

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Henrique Dallazuanna

2010-Sep-27 17:16 UTC

You've tried:

gregexpr("b+", "abcdaabbc")

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Titus von der Malsburg

2010-Sep-27 17:25 UTC

On Mon, Sep 27, 2010 at 7:16 PM, Henrique Dallazuanna
<wwwhsd at gmail.com> wrote:
> You've tried:
>
> gregexpr("b+", "abcdaabbc")

But this would also match the third occurrence of b+ in "abcdaabbcbb",
whereas in this example I'm only interested in b+ if it's preceded by a+.

  Titus

Gabor Grothendieck

2010-Sep-27 17:29 UTC

Try this zero width negative look behind expression:
> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE)[[1]]
[1] 2 7
attr(,"match.length")
[1] 1 2

See ?regexp for more info.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Titus von der Malsburg

2010-Sep-27 17:34 UTC

On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Try this zero width negative look behind expression:
>
>> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE)
> [[1]]
> [1] 2 7
> attr(,"match.length")
> [1] 1 2

Thanks Gabor, but this gives me the same result as

  gregexpr("b+", "abcdaabbc", perl = TRUE)

which is wrong if the string is "abcdaabbcbbb".

  Titus

Gabor Grothendieck

2010-Sep-27 18:10 UTC

On Mon, Sep 27, 2010 at 1:34 PM, Titus von der Malsburg
<malsburg at gmail.com> wrote:
> On Mon, Sep 27, 2010 at 7:29 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> Try this zero width negative look behind expression:
>>
>>> gregexpr("(?!a+)(b+)", "abcdaabbc", perl = TRUE)
>> [[1]]
>> [1] 2 7
>> attr(,"match.length")
>> [1] 1 2
>
> Thanks Gabor, but this gives me the same result as
>
>   gregexpr("b+", "abcdaabbc", perl = TRUE)
>
> which is wrong if the string is "abcdaabbcbbb".
>
Sorry, try this:
>  gregexpr("(?<=a)b+", "abcdaabbcbbb", perl = TRUE)[[1]]
[1] 2 7
attr(,"match.length")
[1] 1 2

Note that it does not give the same answer as:
>  gregexpr("b+", "abcdaabbcbbb", perl = TRUE)[[1]]
[1]  2  7 10
attr(,"match.length")
[1] 1 2 3
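
A note on why the lookbehind is written as (?<=a) rather than (?<=a+): as far as I know, PCRE traditionally requires a lookbehind to have a fixed length, and a single "a" is enough here because any run of a's that ends right before the b's leaves an "a" as the immediately preceding character. The earlier attempt, (?!a+), is a lookahead, which is why it behaved like a plain b+. The sketch below wraps the lookbehind call in a small helper; the function name b_after_a is hypothetical and not part of any package.

```r
# Hypothetical helper (not from the thread): offsets of b+ runs that
# directly follow an "a", using a fixed-length lookbehind.
b_after_a <- function(text) {
  m <- gregexpr("(?<=a)b+", text, perl = TRUE)[[1]]
  if (m[1] == -1L) {
    # gregexpr signals "no match" with -1
    return(data.frame(start = integer(0), length = integer(0)))
  }
  data.frame(start = as.integer(m),
             length = as.integer(attr(m, "match.length")))
}

# The three test strings used in this thread all give starts 2 and 7.
b_after_a("abcdaabbc")
b_after_a("abcdaabbcbb")
b_after_a("abcdaabbcbbb")
```

Note that this only reports where the b's start; it does not recover the start of the preceding a+ run, so it is not a general substitute for group offsets.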


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Michael Bedward

2010-Sep-28 07:46 UTC

What Titus wants to do is akin to retrieving capturing groups from a
Matcher object in Java. I also thought there must be an existing,
elegant solution to this some time ago and searched for it, including
looking at the sources (albeit with not much expertise) but came up
blank.

I also looked at the stringr package (which is nice) but it doesn't
quite do it either.

Michael
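
For readers on a more recent R: as far as I know, since R 2.14.0 regexpr() and gregexpr() called with perl = TRUE attach "capture.start" and "capture.length" attributes to their result, which give the offsets of each parenthesised group directly. A minimal sketch, assuming such a version of R (it would not have worked at the time of this thread):

```r
text <- "abcdaabbcbbb"
m <- gregexpr("a+(b+)", text, perl = TRUE)[[1]]

# One row per complete match, one column per capture group.
cap_start <- attr(m, "capture.start")
cap_length <- attr(m, "capture.length")

# Offsets and lengths of the first (and only) group, (b+).
group_start <- as.integer(cap_start[, 1])
group_length <- as.integer(cap_length[, 1])

group_start   # 2 7
group_length  # 1 2
```

Because the whole pattern is matched in a single pass, the trailing "bbb" that is not preceded by an "a" is ignored without any lookbehind.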
