Thanks for pointing out my mistake. I oversimplified the real problem.
I'll try to post a version of it that comes closer: Suppose I have a
string like this:
x <- "\n```html\nblah blah \n```\n\n```r\nblah blah\n```\n"
If I cat() it, I see that it is really markdown source:
```html
blah blah
```
```r
blah blah
```
I want to find the part that includes the html block, but not the r
block. So I want to match "```html", followed by a minimal number of
characters, then "```". This pattern works:
pattern <- "\n```html\n.*?\n```\n"
and we get the right answer:
cat(regmatches(x, regexpr(pattern, x)))
```html
blah blah
```
Okay, but this flavour of markdown says there can be more backticks, not
just 3. So the block might look like
````html
blah blah
````
The opening and closing markers must have the same number of backticks,
so I capture the opening run and require it again with the backreference
\1. That more complicated pattern doesn't work:
pattern2 <- "\n([`]{3,})html\n.*?\n\\1\n"
This matches all of x:
> pattern2 <- "\n([`]{3,})html\n.*?\n\\1\n"
> cat(regmatches(x, regexpr(pattern2, x)))
```html
blah blah
```
```r
blah blah
```
Is that a bug, or am I making a silly mistake again?
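
For comparison, here is a minimal sketch of the same search using the PCRE engine (perl = TRUE) instead of the default TRE engine. It assumes PCRE handles a lazy quantifier combined with a backreference as expected. The (?s) flag is there because, under PCRE, "." does not match a newline by default. The names pattern3 and html_block are only for illustration and do not appear in the code above.

```r
x <- "\n```html\nblah blah \n```\n\n```r\nblah blah\n```\n"

# (?s) lets "." match newlines under PCRE;
# \\1 must repeat exactly the run of backticks captured at the opening marker
pattern3 <- "(?s)\n(`{3,})html\n.*?\n\\1\n"

# perl = TRUE switches regexpr() from the default TRE engine to PCRE
html_block <- regmatches(x, regexpr(pattern3, x, perl = TRUE))
cat(html_block)
```

If this returns only the html block, that would suggest the behaviour is specific to how the default engine treats this pattern rather than to the pattern itself.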
Duncan Murdoch
On 25/01/2023 7:34 p.m., Andrew Simmons wrote:
> grep(value = TRUE) just returns the strings which match the pattern. You
> have to use regexpr() or gregexpr() if you want to know where the
> matches are:
>
> ```
> x <- "abaca"
>
> # extract only the first match with regexpr()
> m <- regexpr("a.*?a", x)
> regmatches(x, m)
>
> # or
>
> # extract every match with gregexpr()
> m <- gregexpr("a.*?a", x)
> regmatches(x, m)
> ```
>
> You could also use sub() to remove the rest of the string:
> `sub("^.*(a.*?a).*$", "\\1", x)`
> keeping only the match within the parentheses.
>
>
> On Wed, Jan 25, 2023, 19:19 Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> The docs for ?regexp say this: "By default repetition is greedy, so the
> maximal possible number of repeats is used. This can be changed to
> 'minimal' by appending ? to the quantifier. (There are further
> quantifiers that allow approximate matching: see the TRE
> documentation.)"
>
> I want the minimal match, but I don't seem to be getting it. For
> example,
>
> x <- "abaca"
> grep("a.*?a", x, value = TRUE)
> #> [1] "abaca"
>
> Shouldn't I have gotten "aba", which is the first match to "a.*a"? If
> not, what would be the regexp that would give me the first match to
> "a.*a", without greedy expansion of the .*?
>
> Duncan Murdoch