Duncan -
Indeed, this has now been well documented; I have called these constructions
"Schrödinger Facts", since they arise from a superposition of truths
in the training data that collapse into an untruth when observed.
   https://sentientsyllabus.substack.com/p/chatgpts-achilles-heel
Now, the curious question is: why can it program? Why is its production of
R syntax less vulnerable to this effect than a literature quote or reference?
Maybe it has to do with the fact that in producing correct syntax there is a
ground truth: code with errors simply will not run. But I'm not sure.
Regardless: it signals that we need a reinvigorated culture of validation.
Actually ChatGPT will happily make test data for you. Which, in a way, it just
did  ;-)
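
As a small illustration of that kind of validation, here is a minimal sketch
that checks the alternation question empirically instead of trusting a quoted
reference. It uses Python's re module, only because Duncan notes that
first-match-wins is documented there. The helper name winning_branch and the
test patterns are made up for illustration, and the result says nothing about
how R's own regex engines behave.

```python
import re


def winning_branch(pattern, text):
    """Return (index of the capture group that matched, matched text).

    Assumes each alternative is wrapped in its own capture group,
    as in "(a)|(b)". Returns None if nothing matches.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    # lastindex is the 1-based index of the last group that participated
    return m.lastindex, m.group(0)


# Both alternatives can match at position 0 of "abc".
print(winning_branch(r"(ab)|(abc)", "abc"))  # (1, 'ab')
print(winning_branch(r"(abc)|(ab)", "abc"))  # (1, 'abc')
```

In both calls the first alternative wins, even when the second one would give
a longer match. The same experiment would have to be repeated in R, with and
without perl = TRUE, before drawing any conclusion there.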
Boris
> On 2023-02-13, at 08:37, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> 
> I was wondering which anchor would match in the regular expression
> "(a)|(b)" if both "a" and "b" matches the target
> string.  What appears to happen is that the first match wins, and that's how
> it is documented in Python, but I was looking for docs on this in R.
> 
> As far as I can see, the ?regex help page doesn't resolve this. So just
> for fun, I decided to ask ChatGPT, and got an interesting response.  I won't
> show the whole conversation here, but just some extracts:
> 
> ---------------------------------------
> Me:  In a regular expression in R, if both "a" and "b"
> match, which does "(a)|(b)" match?  Please include authoritative
> references in your answer.
> 
> ChatGPT: [long answer about grep]
> 
> Me:  In grep(), it doesn't matter which of a or b matches.  It's
> only in sub() or gsub() that it makes a difference.  Do you know an
> authoritative reference that discusses those functions?
> 
> ChatGPT:  [long irrelevant answer about sub and gsub, with a reference to
> "R Cookbook" by P. Teetor.]
> 
> Me:  Teetor's book is not authoritative, and the other reference
> doesn't discuss ambiguous alternation.
> 
> ... finally we get to the interesting part ...
> 
> ChatGPT: I apologize for the mistake in my previous answer.
> 
> A more authoritative reference for the behavior of regular expressions in R
> can be found in the official R documentation.
> 
> Regarding the behavior of ambiguous alternation, the regex section of the
> manual "An Introduction to R" explains it as follows:
> 
> "When several alternatives are possible, the first alternative is
> chosen, so it is important to specify alternatives in the order in which they
> should be tried."
> 
> Here is the reference for the official R documentation:
> 
>    "An Introduction to R" manual:
>    https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Regular-expression-facilities
> 
> I hope this reference provides the information you were looking for.
> ---------------------------------------
> 
> 
> All right, an authoritative answer!  The only problem is that *it
> doesn't exist*.  There's no such anchor in that document.  R-intro
> isn't a very rapidly changing document, so I could look at every change
> recorded in svn, and that section has *never* existed.
> 
> I guess ChatGPT is a pattern matching machine, and if such a question had
> an answer, it would probably be in a location like that.  (I would have guessed
> a help page like ?regex instead of R-intro, but I had just told it that answer
> was wrong.)
> 
> Duncan Murdoch
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Boris Steipe MD, PhD
Professor em.
Department of Biochemistry 
Temerty Faculty of Medicine
University of Toronto