thr3ads.net - R help - [R] Amazing AI [Dec 2022]

Boris Steipe

2022-Dec-18 19:01 UTC

[R] Amazing AI

Technically not a help question. But crucial to be aware of, especially for
those of us in academia, or otherwise teaching R. I am not aware of a suitable
alternate forum. If this does not interest you, please simply ignore - I already
know that this may be somewhat OT.

Thanks.
------------------------------------------------------

You very likely have heard of ChatGPT, the conversation interface on top of the
GPT-3 large language model and that it can generate code. I thought it
doesn't do R - I was wrong. Here is a little experiment:

MY PROMPT:
=========
Please write me some R code that will read a tsv file, remove all rows
that have duplicates in the column "names", and store the median
values of the remaining rows in a variable called "Medians". Of
course, that calculation should only be done with columns that are numeric.
Print the last variable in order from largest to smallest.


AI RESPONSE:
===========
Here is some R code that should accomplish what you've described:

(Note: the code is formatted in a little terminal window, complete with a button
to "copy code" at the top.)

----
# Read the tsv file
data <- read.table("file.tsv", header = TRUE, sep = "\t")

# Remove rows with duplicates in the "names" column
data <- data[!duplicated(data$names), ]

# Select only the numeric columns
num_columns <- sapply(data, is.numeric)
numeric_data <- data[, num_columns]

# Calculate the medians for each column
Medians <- apply(numeric_data, 1, median)

# Sort Medians in descending order
Medians <- sort(Medians, decreasing = TRUE)

# Print Medians
print(Medians)
----

This code assumes that the tsv file is called "file.tsv" and is stored
in the current working directory. It also assumes that the first row of the file
contains the column names.



MY PROMPT:
=========
Can you provide me with a bit of sample data that I can use to test
this code?

AI RESPONSE:
===========
Certainly! Here is some sample data that you can use to test the code
I provided:

----
name	age	height	weight
Alice	23	5.6	120
Bob	32	6.2	180
Charlie	29	5.9	160
Alice	23	5.6	120
David	25	5.8	140
Eve	27	5.5	110
----

You can save this data as a tsv file and use it to test the code.


------------------------------------------------------
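
To follow the AI's suggestion and actually save its sample data as a tsv file, something like the sketch below should work. It writes the rows to a temporary file (tempfile() stands in for "file.tsv", an assumption for illustration only) and reads them back with the same read.table() call the AI used. Inspecting names(data) shows the header "name", not "names".

```r
# Sample data from the AI response, one string per line, tab-separated.
sample_lines <- c(
  "name\tage\theight\tweight",
  "Alice\t23\t5.6\t120",
  "Bob\t32\t6.2\t180",
  "Charlie\t29\t5.9\t160",
  "Alice\t23\t5.6\t120",
  "David\t25\t5.8\t140",
  "Eve\t27\t5.5\t110"
)

# tempfile() is used in place of "file.tsv" (assumption for illustration).
tsv_path <- tempfile(fileext = ".tsv")
writeLines(sample_lines, tsv_path)

# Same call as in the AI's code.
data <- read.table(tsv_path, header = TRUE, sep = "\t")
str(data)
names(data)   # "name" "age" "height" "weight"
```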

Notes: 
1) The code, as written, does not work with the test data. The reason is that the
test data specifies a column "name", but the code assumes
"names" (which I specified in the instructions). Once this is fixed in
the test data, the code executes.

2) The instructions are not accurately translated: I had asked for row medians.
But changing the margin in the apply() statement from 2 to 1 does the correct
thing.

3) Note that the test data contains both numeric and non-numeric columns. Also,
the name "Alice" is duplicated in the test data, which is subtle, and
the right thing to do.
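
A minimal sketch of notes 1 and 2, with the AI's sample data rebuilt inline so nothing has to be read from disk. As far as I understand R's `$` partial matching, it only completes a prefix of a column name, so data$names is NULL for a column called "name", and the filter then silently selects zero rows. With the column name aligned, apply() with MARGIN = 1 gives one median per row. Incidentally, with these columns the row median always equals the age, because age lies between height and weight in every row.

```r
# The AI's sample data, rebuilt inline (column "name", not "names").
data <- data.frame(
  name   = c("Alice", "Bob", "Charlie", "Alice", "David", "Eve"),
  age    = c(23, 32, 29, 23, 25, 27),
  height = c(5.6, 6.2, 5.9, 5.6, 5.8, 5.5),
  weight = c(120, 180, 160, 120, 140, 110)
)

# Note 1: "names" is not a prefix of "name", so `$` finds no column ...
is.null(data$names)                      # TRUE
# ... duplicated(NULL) is logical(0), which selects zero rows.
nrow(data[!duplicated(data$names), ])    # 0

# With the column name aligned, the first "Alice" row is kept.
dedup <- data[!duplicated(data$name), ]
numeric_data <- dedup[, vapply(dedup, is.numeric, logical(1)), drop = FALSE]

# Note 2: MARGIN = 1 applies median() to each row, not each column.
Medians <- apply(numeric_data, 1, median)
names(Medians) <- dedup$name
sort(Medians, decreasing = TRUE)         # Bob 32, Charlie 29, Eve 27, David 25, Alice 23
```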


------------------------------------------------------

A similar experiment, run on the beta site of OpenAI's Codex code-completion
tool, gives the following code (complete with the comments):

----
# Read in the data
data <- read.table("mydata.tsv", sep = "\t", header = TRUE)

# Remove all rows that have duplicates in the column "names"
unique_names <- unique(data$names)
data_unique <- data[data$names %in% unique_names,]

# Calculate medians 
Medians <- sapply(data_unique[, sapply(data_unique, is.numeric)], median,
                  na.rm = TRUE)

# Print last variable in order form largest to smallest
rev(sort(Medians[length(Medians)]))
----

Note that the strategy is quite different (e.g. using %in%, not duplicated()),
and the interpretation of "last variable" is technically correct but not
what I had in mind (ChatGPT got that right, though).
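
A small check suggests why the difference in strategy matters: every element of a vector is, by definition, in its own unique(), so the %in% filter keeps every row, whereas duplicated() drops the repeated "Alice". The sketch below uses toy data mirroring the AI's sample (with the column called "names"); it also shows a third reading of "remove all rows that have duplicates" (drop every copy) and what Codex's Medians[length(Medians)] returns.

```r
# Toy data mirroring the AI's sample, but with the column called "names".
data <- data.frame(
  names  = c("Alice", "Bob", "Charlie", "Alice", "David", "Eve"),
  age    = c(23, 32, 29, 23, 25, 27),
  height = c(5.6, 6.2, 5.9, 5.6, 5.8, 5.5),
  weight = c(120, 180, 160, 120, 140, 110)
)
nm <- data$names

# Codex: every value is in unique(nm), so this keeps all 6 rows.
codex_keep <- nm %in% unique(nm)
sum(codex_keep)                          # 6

# ChatGPT: duplicated() drops the second "Alice" and keeps the first.
first_keep <- !duplicated(nm)
sum(first_keep)                          # 5

# Another reading of the prompt: drop every row whose name repeats.
none_keep <- !(duplicated(nm) | duplicated(nm, fromLast = TRUE))
nm[none_keep]                            # "Bob" "Charlie" "David" "Eve"

# Codex's column medians (on all 6 rows, since nothing was removed), then its
# "last variable": a single element, so sort() and rev() change nothing.
Medians <- sapply(data[, sapply(data, is.numeric)], median, na.rm = TRUE)
last_sorted <- rev(sort(Medians[length(Medians)]))
last_sorted                              # weight 130
```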


Changing my prompts slightly resulted in it going for a dplyr solution instead,
complete with %>% idioms etc. Again, syntactically correct, but not giving
me the fully correct results.

------------------------------------------------------

Bottom line: The AI's ability to translate natural language instructions
into code is astounding. The errors it makes are subtle and probably not easy to
fix if you don't already know what you are doing. And because the code can be
"confidently incorrect" yet plausible, those errors are nearly impossible to
detect unless you actually run the code (you may have noticed that when you read
the code).

Will our students use it? Absolutely.

Will they successfully cheat with it? That depends on the assignment. We
probably need to _encourage_ them to use it rather than sanction - but require
them to attribute the AI, document prompts, and identify their own, additional
contributions.

Will it help them learn? When you are aware of the issues, it may be quite
useful. It may be especially useful to teach them to specify their code
carefully and completely, and to ask questions in the right way. Test cases are
crucial.
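
As a sketch of what such a test case might look like in base R: the requested steps are wrapped in a function (row_medians_unique() and its id_col argument are hypothetical names, not from this thread), checked with stopifnot() against answers worked out by hand, and made to fail loudly when the ID column is missing instead of quietly returning an empty result.

```r
# Hypothetical helper (not from the thread) wrapping the requested steps.
row_medians_unique <- function(df, id_col = "names") {
  # Fail loudly rather than silently returning zero rows (see note 1).
  if (!id_col %in% names(df)) {
    stop("column '", id_col, "' not found")
  }
  df <- df[!duplicated(df[[id_col]]), , drop = FALSE]
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  meds <- apply(num, 1, median)           # one median per row
  names(meds) <- df[[id_col]]
  sort(meds, decreasing = TRUE)
}

# A tiny case whose answer can be worked out by hand:
# rows a = (1, 3, 2), b = (10, 20, 30), c = (5, 4, 6); the second "a" is dropped.
test_df <- data.frame(
  names = c("a", "b", "a", "c"),
  x     = c(1, 10, 99, 5),
  y     = c(3, 20, 99, 4),
  z     = c(2, 30, 99, 6)
)
result <- row_medians_unique(test_df)
stopifnot(
  identical(names(result), c("b", "c", "a")),
  isTRUE(all.equal(unname(result), c(20, 5, 2)))
)

# A misnamed ID column now produces an error instead of an empty result.
err <- tryCatch(row_medians_unique(test_df, id_col = "name"),
                error = function(e) conditionMessage(e))
err                                       # "column 'name' not found"
```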

How will it affect what we do as instructors? I don't know. Really. 

And the future? I am not pleased to extrapolate to a job market in which they
compete with knowledge workers who work 24/7 without benefits, vacation pay, or
even a salary. They'll need to rethink the value of their investment in an
academic education. We'll need to rethink what we do to provide value above
and beyond what AIs can do. (Nb. all of the arguments I hear about why
humans will always be better etc. are easily debunked, but that's even more
OT :-)

--------------------------------------------------------

If you have thoughts to share on how your institution is thinking about academic
integrity in this situation, or creative ideas on how to integrate this into
teaching, I'd love to hear from you.


All the best!
Boris


--
Boris Steipe MD, PhD
University of Toronto

Ebert,Timothy Aaron

2022-Dec-18 22:47 UTC

[R] Amazing AI

It would help students formulate a plan for coding. Successful students will be
able to give good directions that the AI can turn into good code. This skill is
essential no matter who writes the program.
In more advanced classes I might collect some data sets designed to cause the AI
problems. Another option is to make tests where students have to write code on
paper or multiple guess where students must choose between similar code snips.
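
A minimal sketch of such problem data sets (hypothetical examples, not from the thread): the ChatGPT pipeline from the original post is fed a data frame with only one numeric column, and one with a missing value. In both cases the code is syntactically fine, but the first stops with an error and the second silently loses a row.

```r
# The ChatGPT pipeline from the original post, unchanged except that it
# takes a data frame instead of reading "file.tsv".
ai_pipeline <- function(data) {
  data <- data[!duplicated(data$names), ]
  num_columns <- sapply(data, is.numeric)
  numeric_data <- data[, num_columns]
  Medians <- apply(numeric_data, 1, median)
  sort(Medians, decreasing = TRUE)
}

# Trap 1: only one numeric column. data[, num_columns] drops to a plain
# vector, and apply() then stops with an error.
one_col <- data.frame(names = c("a", "b"), score = c(1, 2))
msg <- tryCatch(ai_pipeline(one_col), error = function(e) conditionMessage(e))
msg                     # "dim(X) must have a positive length"

# Trap 2: a missing value. median() returns NA for the "b" row, and sort()
# drops NAs by default, so that row vanishes from the output without a warning.
with_na <- data.frame(names = c("a", "b"), x = c(1, NA), y = c(2, 5), z = c(3, 6))
res_na <- ai_pipeline(with_na)
res_na                  # only the first row (median 2) remains
```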


______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Milan Glacier

2022-Dec-19 08:58 UTC

[R] Amazing AI

On 12/18/22 19:01, Boris Steipe wrote:
>If you have thoughts to share how your institution is thinking about
academic integrity in this situation, or creative ideas how to integrate this
into teaching, I'd love to hear from you.

*NEVER* let the AI mislead the students! ChatGPT gives you seemingly
sound but actually *wrong* code!

ChatGPT never understands the formal abstraction behind the code; it
only picks up the shallow text patterns (and the syntax rules) in the
code. It often gives you code that looks correct but actually produces
the wrong output. Used for code completion it is okay (just like GitHub
Copilot), since the coder needs to modify the code after getting the
completion. But if students use ChatGPT to look up information or write
code, it is error-prone!

Christopher W. Ryan

2022-Dec-19 21:04 UTC

[R] Amazing AI

In clinical medicine, the question the patient asks rarely represents
their main concern. Most of what I've done in my career, and most of
what I've taught, is about how to have the back-and-forth dynamic dialogue
with the patient, to help them formulate what's really on their mind,
and make sure I understand it before proceeding.

Seems to me that statistical consulting, or working IT in an
organization where one is serving "internal" customers, is similar.
Students need to learn the skills of clarifying, verifying, and
paraphrasing a client's needs.

Sure, ChatGPT may be able to generate decent code to do what a client
*says* they want to do, but is that *really* what they are looking for?

So don't retire yet :)

--Chris Ryan

