Folks:

Consider:

> y <- "xx wt"
> grep(" +(?=t)", y, perl = TRUE)
integer(0)
## Unexpected. Lookahead construct does not find "t" after space
## But
> grep(" +(?=.+t)", y, perl = TRUE)
[1] 1
## Expected. Given pattern for **exact** match, lookahead finds it

My concern is: ?regexp says this:

"Patterns (?=...) and (?!...) are zero-width positive and negative lookahead *assertions*: they match if an attempt to match the ... forward from the current position would succeed (or not), but use up no characters in the string being processed."

But this appears to be imprecise (it confused me, anyway). The usual sense of "matching" in regexes is "match the pattern somewhere in the string going forward." But in the Perl lookahead construct it apparently must **exactly** match *everything* in the string that follows.

Questions:
Am I correct about this? If not, what do I misunderstand?
If I am correct, should the regex help be slightly modified to something like:

"Patterns (?=...) and (?!...) are zero-width positive and negative lookahead *assertions*: they match if an attempt to **exactly** match all of ... forward from the current position would succeed (or not), but use up no characters in the string being processed."

Thanks.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
I think that the current documentation is correct, but that does not mean that it cannot be improved. The key phrase for me is "from the current position", which says to me that the match needs to happen right there, not just somewhere in the rest of the string. If you used the expression " +t", you would expect it to match only if the "t" came immediately after the last space, not somewhere in the string after the last space; the same is true of the lookahead.

On Mon, Aug 10, 2020 at 10:37 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Questions:
> Am I correct about this? If not, what do I misunderstand?
--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
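To make Greg's point concrete, here is a small sketch using the `y` from the original post. It compares each lookahead with the equivalent ordinary (consuming) pattern: both are tested at the same position, immediately after the spaces. The only difference is that the lookahead leaves its characters out of the reported match, which `regmatches()` makes visible. Only base R functions (`grepl()`, `regexpr()`, `regmatches()`) are used; the extra patterns with "w" are illustrative additions, not from the thread.

```r
y <- "xx wt"

# Ordinary (consuming) patterns: "t" must come right after the spaces.
grepl(" +t", y, perl = TRUE)        # FALSE: the character after the space is "w"
grepl(" +w", y, perl = TRUE)        # TRUE

# Lookaheads are tested at that same position, but consume nothing.
grepl(" +(?=t)", y, perl = TRUE)    # FALSE, for the same reason as " +t"
grepl(" +(?=w)", y, perl = TRUE)    # TRUE
grepl(" +(?=.+t)", y, perl = TRUE)  # TRUE: ".+" lets the lookahead reach the "t"

# The reported match shows that the lookahead part is not included.
regmatches(y, regexpr(" +w", y, perl = TRUE))      # " w"
regmatches(y, regexpr(" +(?=w)", y, perl = TRUE))  # " "
```

In other words, " +(?=t)" fails for exactly the reason " +t" fails; the lookahead is not "searching" forward.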
> On 10 Aug 2020, at 18:36, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> But this appears to be imprecise (it confused me, anyway). The usual sense
> of "matching" in regex's is "match the pattern somewhere in the string
> going forward." But in the perl lookahead construct it apparently must
> **exactly** match *everything* in the string that follows.
>
> Questions:
> Am I correct about this? If not, what do I misunderstand?

I think you're confused about the terminology. To _match_ a regular expression is to find a substring described by the regexp at a given starting point; what you have in mind is to _search_ a string for matches of a regular expression.

Python uses this terminology in its regexp functions (re.match() versus re.search()), and, from what you cited in the documentation, so do Perl and PCRE in their docs.

Best,
Stefan
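R's `grepl()` and `regexpr()` always *search*. The sketch below emulates the *match* operation Stefan describes with a hypothetical helper, `matches_at()` (not part of base R), which anchors the pattern with "^" at a chosen character position. It also illustrates that a match does not have to extend to the end of the string, so a lookahead need not match "everything that follows". One caveat of this assumption-laden helper: because it works on `substring()`, patterns that look behind the start position cannot see the earlier characters.

```r
# Hypothetical helper (not in base R): does `pattern` match at exactly
# character position `pos` of `x`? Anchoring with "^" turns the
# search-style grepl() into a match at a fixed starting point.
matches_at <- function(pattern, x, pos) {
  grepl(paste0("^(?:", pattern, ")"), substring(x, pos), perl = TRUE)
}

y <- "xx wt"

# Search: the engine tries every starting position until one works.
as.integer(regexpr("t", y, perl = TRUE))  # 5

# Match: only position 4 (the "w" after the space) is tried. This is
# what a lookahead placed right after " +" does.
matches_at("t", y, 4)    # FALSE, like " +(?=t)"
matches_at(".+t", y, 4)  # TRUE, like " +(?=.+t)"

# A match need not consume the rest of the string: the trailing "z"
# is simply left over.
z <- "xx wtz"
matches_at(".+t", z, 4)             # TRUE
grepl(" +(?=.+t)", z, perl = TRUE)  # TRUE
```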
Thank you. That indeed dispels my brain fog!

Best,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Aug 12, 2020 at 6:35 AM Stefan Evert <stefanML at collocations.de> wrote:
> I think you're confused about the terminology. To _match_ a regular
> expression is to find a substring described by the regexp at a given
> starting point; what you have in mind is to _search_ a string for matches
> of a regular expression.