Duncan -

Indeed, this has now been well documented; I have called these constructions "Schrödinger Facts", since they arise from a superposition of truths in the training data that collapse into an untruth when observed.

https://sentientsyllabus.substack.com/p/chatgpts-achilles-heel

Now, the curious question is: why can it program? Why is its production of R syntax less vulnerable to this effect than a literature quote or reference? Maybe it has to do with the fact that in producing correct syntax there is a ground truth: errors just will not run. But I'm not sure.

Regardless: it signals that we need a reinvigorated culture of validation. Actually, ChatGPT will happily make test data for you. Which, in a way, it just did ;-)

Boris

> On 2023-02-13, at 08:37, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> I was wondering which anchor would match in the regular expression "(a)|(b)" if both "a" and "b" match the target string. What appears to happen is that the first match wins, and that's how it is documented in Python, but I was looking for docs on this in R.
>
> As far as I can see, the ?regex help page doesn't resolve this. So just for fun, I decided to ask ChatGPT, and got an interesting response. I won't show the whole conversation here, but just some extracts:
>
> ---------------------------------------
> Me: In a regular expression in R, if both "a" and "b" match, which does "(a)|(b)" match? Please include authoritative references in your answer.
>
> ChatGPT: [long answer about grep]
>
> Me: In grep(), it doesn't matter which of a or b matches. It's only in sub() or gsub() that it makes a difference. Do you know an authoritative reference that discusses those functions?
>
> ChatGPT: [long irrelevant answer about sub and gsub, with a reference to "R Cookbook" by P. Teetor.]
>
> Me: Teetor's book is not authoritative, and the other reference doesn't discuss ambiguous alternation.
>
> ... finally we get to the interesting part ...
>
> ChatGPT: I apologize for the mistake in my previous answer.
>
> A more authoritative reference for the behavior of regular expressions in R can be found in the official R documentation.
>
> Regarding the behavior of ambiguous alternation, the regex section of the manual "An Introduction to R" explains it as follows:
>
> "When several alternatives are possible, the first alternative is chosen, so it is important to specify alternatives in the order in which they should be tried."
>
> Here is the reference for the official R documentation:
>
> "An Introduction to R" manual: https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Regular-expression-facilities
>
> I hope this reference provides the information you were looking for.
> ---------------------------------------
>
>
> All right, an authoritative answer! The only problem is that *it doesn't exist*. There's no such anchor in that document. R-intro isn't a very rapidly changing document, so I could look at every change recorded in svn, and that section has *never* existed.
>
> I guess ChatGPT is a pattern matching machine, and if such a question had an answer, it would probably be in a location like that. (I would have guessed a help page like ?regex instead of R-intro, but I had just told it that answer was wrong.)
>
> Duncan Murdoch
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Boris Steipe MD, PhD

Professor em.
Department of Biochemistry
Temerty Faculty of Medicine
University of Toronto
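
For readers who want to see the "first match wins" behavior that Duncan says is documented for Python, here is a minimal sketch. It uses Python's standard re module only; it says nothing about what R's regex engines do, which is exactly the question the thread leaves open. The alternatives "a" and "ab" and the target string are illustrative choices, not taken from the original exchange.

```python
import re

# Illustrative target: both alternatives "a" and "ab" can match at position 0.
target = "ab"

# Python tries alternatives left to right and accepts the first one that
# matches, even if a later alternative would give a longer match.
first = re.match(r"(a)|(ab)", target)
print(first.group(0))   # a
print(first.groups())   # ('a', None)

# Swapping the order of the alternatives changes which one wins.
swapped = re.match(r"(ab)|(a)", target)
print(swapped.group(0))  # ab

# The difference is visible in substitution, the Python analogue of the
# sub()/gsub() case raised in the thread.
replaced_first = re.sub(r"(a)|(ab)", "X", target)
replaced_swapped = re.sub(r"(ab)|(a)", "X", target)
print(replaced_first)    # Xb
print(replaced_swapped)  # X
```

Running the same kind of experiment in R, with and without perl = TRUE, would be the empirical counterpart to the documentation search described above.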
Chat bots are like politicians, or talking dogs. The fact that they exist is interesting. But no sane person would believe anything they say.

On Mon, Feb 13, 2023, 10:58 AM Boris Steipe <boris.steipe at utoronto.ca> wrote:

> [...]
ChatGPT gave me R code containing an error when I asked a question about predicting a new data point from a linear mixed-effects model. So, it is not immune to coding mistakes!

Ravi

________________________________
From: R-devel <r-devel-bounces at r-project.org> on behalf of Boris Steipe <boris.steipe at utoronto.ca>
Sent: Monday, February 13, 2023 10:54 AM
To: Duncan Murdoch <murdoch.duncan at gmail.com>
Cc: r-devel <r-devel at r-project.org>
Subject: Re: [Rd] An interesting chat with ChatGPT

[...]
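
The kind of validation discussed in this thread can be partly automated. Duncan's check that the cited anchor does not exist in R-intro is one example: given the HTML of a page, one can test whether a URL fragment names a real anchor. The following is only a sketch under stated assumptions: the class AnchorCollector, the function has_anchor, and the sample page are all hypothetical, and the sample HTML is a made-up stand-in, not the real R-intro.html.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect every id attribute and every <a name="..."> value on a page."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            # A URL fragment can target an element id or a legacy <a name>.
            if value and (key == "id" or (tag == "a" and key == "name")):
                self.anchors.add(value)


def has_anchor(html_text, fragment):
    """Return True if `fragment` names an anchor present in `html_text`."""
    collector = AnchorCollector()
    collector.feed(html_text)
    return fragment in collector.anchors


# Made-up stand-in page (assumption): NOT the contents of R-intro.html.
sample_page = """
<html><body>
<h2 id="Introduction">Introduction</h2>
<a name="Simple-manipulations"></a>
</body></html>
"""

print(has_anchor(sample_page, "Introduction"))                   # True
print(has_anchor(sample_page, "Regular-expression-facilities"))  # False
```

To check a real reference, one would download the page and pass its text to has_anchor. Note that this confirms only that the anchor exists, not that the quoted sentence appears there.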