The docs for ?regexp say this: "By default repetition is greedy, so the maximal possible number of repeats is used. This can be changed to ‘minimal’ by appending ? to the quantifier. (There are further quantifiers that allow approximate matching: see the TRE documentation.)"

I want the minimal match, but I don't seem to be getting it. For example,

x <- "abaca"
grep("a.*?a", x, value = TRUE)
#> [1] "abaca"

Shouldn't I have gotten "aba", which is the first match to "a.*a"? If not, what would be the regexp that would give me the first match to "a.*a", without greedy expansion of the .*?

Duncan Murdoch
grep(value = TRUE) just returns the strings that match the pattern. You have to use regexpr() or gregexpr() if you want to know where the matches are:

```
x <- "abaca"
# extract only the first match with regexpr()
m <- regexpr("a.*?a", x)
regmatches(x, m)
# or
# extract every match with gregexpr()
m <- gregexpr("a.*?a", x)
regmatches(x, m)
```

You could also use sub() to remove the rest of the string: `sub("^.*(a.*?a).*$", "\\1", x)` keeping only the match within the parentheses.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On 25/01/2023 7:19 p.m., Duncan Murdoch wrote:
> Shouldn't I have gotten "aba", which is the first match to "a.*a"? If
> not, what would be the regexp that would give me the first match to
> "a.*a", without greedy expansion of the .*?

Sorry, that was a dumb question. Of course grep returned the whole thing. I should be using regexpr() or some related function to extract the match.

Duncan Murdoch
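A small sketch of the distinction described above, added for illustration and not taken from any of the messages: grep() only reports which strings contain a match, so the quantifier makes no visible difference there, while regexpr() with regmatches() extracts the matched substring. The variable names are hypothetical.

```r
x <- "abaca"

# grep(value = TRUE) returns the whole element whenever the pattern matches
# anywhere in it, so the greedy and lazy patterns give the same result.
whole_greedy <- grep("a.*a", x, value = TRUE)
whole_lazy <- grep("a.*?a", x, value = TRUE)

# regexpr() locates the first match and regmatches() extracts it,
# which is where greedy and lazy repetition differ.
greedy <- regmatches(x, regexpr("a.*a", x))
lazy <- regmatches(x, regexpr("a.*?a", x))

print(c(greedy = greedy, lazy = lazy))
```

Here `greedy` should be the full string and `lazy` the shorter "aba", which is the minimal match the original question was after.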
Perhaps sub( "^.*(a.*?a).*$", "\\1", x )
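One thing that may be worth checking with the sub() approach: the leading `^.*` is itself greedy, so it can swallow the start of the string and leave the capture group with the last possible match rather than the first. The sketch below assumes `perl = TRUE` (a choice made here so that PCRE's backtracking rules apply; it is not in the suggestion above, and the default TRE engine is not covered), and the variable names are hypothetical.

```r
x <- "abaca"

# Greedy prefix: "^.*" takes as much as it can, then backs off only until
# the group can match, so the group ends up on the last candidate.
last_match <- sub("^.*(a.*?a).*$", "\\1", x, perl = TRUE)

# Lazy prefix: "^.*?" takes as little as it can, so the group starts at
# the first "a" and stops at the next one.
first_match <- sub("^.*?(a.*?a).*$", "\\1", x, perl = TRUE)

print(c(greedy_prefix = last_match, lazy_prefix = first_match))
```

Under these assumptions the greedy prefix yields "aca" and the lazy prefix yields "aba", so making the prefix lazy as well looks like the safer choice when the first match is wanted.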