thr3ads.net - R help - [R] [External] Amazing AI [Dec 2022]


Barry Rowlingson

2022-Dec-21 13:50 UTC

[R] [External] Amazing AI

Next year one of my R programming assignments will read like this:

"Here is some R code written by a multi-million dollar AI system to
compute [something]. It doesn't work. Fix the bugs, then ask the AI to
write a letter to its creators apologising for how rubbish it is at
coding. Collect one million dollars."


On Sun, Dec 18, 2022 at 7:01 PM Boris Steipe <boris.steipe at utoronto.ca> wrote:
>
> Technically not a help question. But crucial to be aware of, especially for
> those of us in academia, or otherwise teaching R. I am not aware of a suitable
> alternate forum. If this does not interest you, please simply ignore - I
> already know that this may be somewhat OT.
>
> Thanks.
> ------------------------------------------------------
>
> You have very likely heard of ChatGPT, the conversational interface on top of
> the GPT-3 large language model, and that it can generate code. I thought it
> didn't do R - I was wrong. Here is a little experiment:
>
> MY PROMPT:
> =========
> Please write me some R code that will read a tsv file, remove all rows
> that have duplicates in the column "names", and store the median values
> of the remaining rows in a variable called "Medians". Of course, that
> calculation should only be done with columns that are numeric. Print the
> last variable in order from largest to smallest.
>
>
> AI RESPONSE:
> ===========
>
> Here is some R code that should accomplish what you've described:
>
> (Note: the code is formatted in a little terminal window, complete with a
> button to "copy code" at the top.)
>
> ----
> # Read the tsv file
> data <- read.table("file.tsv", header = TRUE, sep = "\t")
>
> # Remove rows with duplicates in the "names" column
> data <- data[!duplicated(data$names), ]
>
> # Select only the numeric columns
> num_columns <- sapply(data, is.numeric)
> numeric_data <- data[, num_columns]
>
> # Calculate the medians for each column
> Medians <- apply(numeric_data, 1, median)
>
> # Sort Medians in descending order
> Medians <- sort(Medians, decreasing = TRUE)
>
> # Print Medians
> print(Medians)
> ----
>
> This code assumes that the tsv file is called "file.tsv" and is stored in
> the current working directory. It also assumes that the first row of the
> file contains the column names.
>
>
>
> MY PROMPT:
> =========
> Can you provide me with a bit of sample data that I can use to test this
> code?
>
> AI RESPONSE:
> ===========
> Certainly! Here is some sample data that you can use to test the code I
> provided:
>
> ----
> name    age     height  weight
> Alice   23      5.6     120
> Bob     32      6.2     180
> Charlie 29      5.9     160
> Alice   23      5.6     120
> David   25      5.8     140
> Eve     27      5.5     110
> ----
>
> You can save this data as a tsv file and use it to test the code.
>
>
> ------------------------------------------------------
>
> Notes:
> 1) The code, as written, does not work with the test data. The reason is
> that the test data specifies a column "name", but the code assumes
> "names" (which I specified in the instructions). Once this is fixed in
> the test data, the code executes.
>
> 2) The instructions are not accurately translated: I had asked for row
> medians. But changing the margin in the apply() statement from 2 to 1 does
> the correct thing.
>
> 3) Note that the test data contains both numeric and non-numeric columns.
> Also, the name "Alice" is duplicated in the test data, which is subtle, and
> the right thing to do.
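
Notes 1 and 2 can be reproduced without a file. The following is a minimal sketch, assuming the sample table is read from an inline string via read.table(text = ...) rather than from "file.tsv". The object names `dat`, `missing_col`, `n_kept`, `col_medians`, and `row_medians` are introduced here for illustration and do not appear in the original exchange.

```r
# Sample data from the AI's reply, read from an inline string
# (an assumption made here so that no file on disk is needed).
dat <- read.table(header = TRUE, text = "name age height weight
Alice   23 5.6 120
Bob     32 6.2 180
Charlie 29 5.9 160
Alice   23 5.6 120
David   25 5.8 140
Eve     27 5.5 110")

# Note 1: there is no column "names", so dat$names is NULL and the
# duplicate filter silently selects zero rows instead of failing.
missing_col <- is.null(dat$names)
n_kept <- nrow(dat[!duplicated(dat$names), ])

# Renaming the column (equivalent to fixing the test data) lets the
# filter do its job.
names(dat)[names(dat) == "name"] <- "names"
dat <- dat[!duplicated(dat$names), ]
numeric_data <- dat[, sapply(dat, is.numeric)]

# Note 2: the margin decides what is summarised.
col_medians <- apply(numeric_data, 2, median)  # one value per column
row_medians <- apply(numeric_data, 1, median)  # one value per row
```

With this sample, the column medians are 27 (age), 5.8 (height), and 140 (weight), whereas the row medians simply reproduce each person's age, because age is the middle of the three values in every row.
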
>
>
> ------------------------------------------------------
>
> A similar experiment, run on the beta site of OpenAI's Codex
> code-completion tool, gives the following code (complete with the comments):
>
> ----
> # Read in the data
> data <- read.table("mydata.tsv", sep = "\t", header = TRUE)
>
> # Remove all rows that have duplicates in the column "names"
> unique_names <- unique(data$names)
> data_unique <- data[data$names %in% unique_names,]
>
> # Calculate medians
> Medians <- sapply(data_unique[, sapply(data_unique, is.numeric)],
>                   median, na.rm = TRUE)
>
> # Print last variable in order form largest to smallest
> rev(sort(Medians[length(Medians)]))
> ----
>
> Note that the strategy is quite different (e.g. using %in%, not
> duplicated()), and the interpretation of "last variable" is technically
> correct but not what I had in mind (ChatGPT got that right, though).
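
A short sketch of why the two de-duplication strategies are not interchangeable, using a made-up vector `nm` (an assumption for illustration that mirrors the names in the sample data). It also shows one possible reading of "remove all rows that have duplicates", namely dropping every copy of a repeated name; whether that reading was intended is not stated in the post.

```r
nm <- c("Alice", "Bob", "Charlie", "Alice", "David", "Eve")

# %in% unique(): every name is trivially a member of its own set of
# unique values, so this mask is TRUE everywhere and removes nothing.
keep_in <- nm %in% unique(nm)

# !duplicated(): keep the first occurrence, drop later repeats.
keep_first <- !duplicated(nm)

# Alternative reading (assumption): drop all copies of any repeated name.
keep_unrepeated <- !(duplicated(nm) | duplicated(nm, fromLast = TRUE))

# Number of rows each mask would keep.
c(in_unique = sum(keep_in),
  first_only = sum(keep_first),
  unrepeated = sum(keep_unrepeated))
```

The three masks keep 6, 5, and 4 of the six names respectively, so only the last two actually remove anything.
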
>
>
> Changing my prompts slightly resulted in it going for a dplyr solution
> instead, complete with %>% idioms etc ... again, syntactically correct but
> not giving me the fully correct results.
>
> ------------------------------------------------------
>
> Bottom line: The AI's ability to translate natural language instructions
> into code is astounding. Errors the AI makes are subtle and probably not
> easy to fix if you don't already know what you are doing. But the way that
> the output can be "confidently incorrect" and plausible makes those errors
> nearly impossible to detect unless you actually run the code (you may have
> noticed that when you read the code).
>
> Will our students use it? Absolutely.
>
> Will they successfully cheat with it? That depends on the assignment. We
> probably need to _encourage_ them to use it rather than sanction - but
> require them to attribute the AI, document prompts, and identify their own,
> additional contributions.
>
> Will it help them learn? When you are aware of the issues, it may be quite
> useful. It may be especially useful to teach them to specify their code
> carefully and completely, and to ask questions in the right way. Test cases
> are crucial.
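
As one way to make "test cases are crucial" concrete, here is a minimal sketch that wraps the task in a function and checks it against values worked out by hand. The function name `unique_row_medians()` and its argument `key` are hypothetical, introduced only for this illustration; the function follows the row-wise reading from note 2.

```r
unique_row_medians <- function(df, key = "names") {
  # Fail loudly if the key column is absent instead of returning nothing.
  if (!key %in% names(df)) stop("column '", key, "' not found")
  df <- df[!duplicated(df[[key]]), , drop = FALSE]
  num <- df[, sapply(df, is.numeric), drop = FALSE]
  sort(apply(num, 1, median), decreasing = TRUE)
}

# Tiny hand-checkable input: "a" appears twice, so two rows remain.
test_df <- data.frame(
  names = c("a", "b", "a"),
  x = c(1, 10, 1),
  y = c(2, 20, 2),
  z = c(3, 30, 3)
)

# Expected by hand: row "b" has median 20, row "a" has median 2.
result <- unique_row_medians(test_df)
stopifnot(isTRUE(all.equal(unname(result), c(20, 2))))

# A misspelled key column should raise an error, not pass silently.
err <- tryCatch(unique_row_medians(test_df, key = "name"),
                error = function(e) conditionMessage(e))
stopifnot(is.character(err), grepl("not found", err))
```

The second check targets the failure from note 1: a column-name mismatch becomes an explicit error rather than an empty result.
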
>
> How will it affect what we do as instructors? I don't know. Really.
>
> And the future? I am not pleased to extrapolate to a job market in which
> they compete with knowledge workers who work 24/7 without benefits, vacation
> pay, or even a salary. They'll need to rethink the value of their investment
> in an academic education. We'll need to rethink what we do to provide value
> above and beyond what AIs can do. (Nb. all of the arguments I hear about
> why humans will always be better etc. are easily debunked, but that's even
> more OT :-)
>
> --------------------------------------------------------
>
> If you have thoughts to share on how your institution is thinking about
> academic integrity in this situation, or creative ideas on how to integrate
> this into teaching, I'd love to hear from you.
>
>
> All the best!
> Boris
>
>
> --
> Boris Steipe MD, PhD
> University of Toronto
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Spencer Graves

2022-Dec-21 14:20 UTC


[R] [External] Amazing AI

On 12/21/22 7:50 AM, Barry Rowlingson wrote:
> Next year one of my R programming assignments will read like this:
> 
> "Here is some R code written by a multi-million dollar AI system to
> compute [something]. It doesn't work. Fix the bugs, then ask the AI to
> write a letter to its creators apologising for how rubbish it is at
> coding. Collect one million dollars."

You might want to be careful about such a promise.  Kahneman, Sibony, 
and Sunstein (2021) Noise:  A flaw in human judgment (Little, Brown and 
Company) claim that people acquire genuine expertise by learning from 
frequent, rapid, high-quality feedback on the quality of their 
decisions.  Few people have access to such feedback.  The authors call 
leaders in fields without such feedback "respect-experts", and note that 
respect-experts have only the illusion of competence.


	  1.  They further say that most respect-experts can be beaten by 
simple heuristics developed by intelligent lay people.


	  2.  Moreover, with a modest amount of data, ordinary least squares 
can beat most such heuristics.


	  3.  And if lots of data are available, AI can beat the simple 
heuristics.


	  They provide substantial quantities of research to support those 
claims.


	  Regarding your million dollars, it should not be hard to write an R 
interface to existing AI code cited by Kahneman et al.


	  Do you really want one of your students initiating a legal procedure 
to try to collect your million dollars?


	  A quarter century ago, my friend Soren Bisgaard told me about a 
colleague who had raved about AI.  Soren thought, "You prefer artificial 
intelligence to real intelligence?"


	  I perceive a role for AI in identifying subtle phenomena missed by 
more understandable modeling techniques.  Let's use the best 
understandable model, and apply AI to the residuals from that.  Then 
identify the variables that make the largest contributions to a useful 
AI model, and see if they can be added to the other model.
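
The residual idea in the paragraph above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated, and loess() stands in for the "AI" model as a flexible smoother available in base R. None of the variable names come from the original message.

```r
set.seed(1)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
# Simulated, hypothetical data with a nonlinear effect of x2.
y  <- 2 * x1 + sin(3 * x2) + rnorm(n, sd = 0.1)

# Step 1: the understandable model.
fit_lm <- lm(y ~ x1 + x2)
res_lm <- residuals(fit_lm)

# Step 2: a flexible model fitted to the residuals
# (loess is a stand-in here, not a neural network or similar).
fit_res   <- loess(res_lm ~ x2)
res_after <- residuals(fit_res)

# Step 3: compare the leftover variation before and after.
c(before = sd(res_lm), after = sd(res_after))
```

If the second model removes a noticeable share of the residual variation, the variable it relies on (here x2) is a candidate for a richer term in the first model, for example a polynomial or spline term.
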


	  Spencer Graves
