Sébastien Lahaie
2020-Jun-19 12:49 UTC
[R] Strange behavior when sampling rows of a data frame
I ran into some strange behavior in R when trying to assign a treatment to rows in a data frame. I'm wondering whether any R experts can explain what's going on.

First, let's assign a treatment to 3 out of 10 rows as follows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> s <- sample(nrow(df), 3)
> df[s,]$treated <- TRUE
>
> df
   unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE

This is as expected. Now we'll just skip the intermediate step of saving the sampled indices, and apply the treatment directly as follows.

> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> df[sample(nrow(df), 3),]$treated <- TRUE
>
> df
   unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE

Now the data frame still has 10 rows with 3 assigned to the treatment. But the units are garbled. Units 1 and 4 have disappeared, for instance, and there are duplicates for 6 and 9, one assigned to treatment and the other to control. Why would this happen?

Thanks,
Sebastien
Rui Barradas
2020-Jun-19 15:45 UTC
[R] Strange behavior when sampling rows of a data frame
Hello,

I don't have an answer on the reason why this happens but it seems like a bug. Where? In which of `[<-.data.frame` or `[<-.default`?

A solution is to subset and assign the vector:

set.seed(2020)
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE

df2$treated[sample(nrow(df2), 3)] <- TRUE
df2
#   unit treated
#1     1   FALSE
#2     2   FALSE
#3     3   FALSE
#4     4   FALSE
#5     5   FALSE
#6     6    TRUE
#7     7    TRUE
#8     8    TRUE
#9     9   FALSE
#10   10   FALSE

Or

set.seed(2020)
df3 <- data.frame(unit = 1:10)
df3$treated <- FALSE

df3[sample(nrow(df3), 3), "treated"] <- TRUE
df3
# result as expected

Hope this helps,

Rui Barradas

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
William Dunlap
2020-Jun-19 16:20 UTC
[R] Strange behavior when sampling rows of a data frame
The first subscript argument is getting evaluated twice.

> trace(sample)
> set.seed(2020); df[i<-sample(10,3), ]$Treated <- TRUE
trace: sample(10, 3)
trace: sample(10, 3)
> i
[1]  1 10  4
> set.seed(2020); sample(10,3)
trace: sample(10, 3)
[1] 7 6 8
> sample(10,3)
trace: sample(10, 3)
[1]  1 10  4

Bill Dunlap
TIBCO Software
wdunlap tibco.com
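Bill's trace can be reproduced without `trace()`. The sketch below is an illustration, not code from the thread: `draw()` is a hypothetical wrapper around `sample()` that counts its own calls, and the second half spells out by hand roughly what the nested replacement does, with `i1` and `i2` standing in for the two separate evaluations of the subscript.

```r
# Hypothetical helper (not from the thread): wraps sample() and counts calls.
calls <- 0
draw <- function(n, k) {
  calls <<- calls + 1
  sample(n, k)
}

df <- data.frame(unit = 1:10)
df$treated <- FALSE

# The subscript expression is written once here ...
df[draw(nrow(df), 3), ]$treated <- TRUE
calls  # ... but, as in Bill's trace, it is evaluated more than once

# Roughly what happens, written out by hand (an approximation of the
# expansion, using ordinary variables instead of R's internal `*tmp*`):
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE
i1 <- sample(nrow(df2), 3)  # first evaluation: rows that are read
i2 <- sample(nrow(df2), 3)  # second evaluation: rows that are written
rows <- df2[i1, ]           # extract three rows
rows$treated <- TRUE        # mark them as treated
df2[i2, ] <- rows           # write them back, possibly to different rows
df2
```

Whenever `i1` and `i2` differ, whole rows (including `unit`) are copied from positions `i1` to positions `i2`, which matches the duplicated and missing units in the original report. Saving the indices in a variable first, as in the first example of the thread, avoids the second draw.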
Daniel Nordlund
2020-Jun-19 23:04 UTC
[R] Strange behavior when sampling rows of a data frame
Sébastien,

You have received good explanations of what is going on with your code. I think you can get what you want by making a simple modification of your treatment assignment statement. At least it works for me.

df[sample(nrow(df),3), 'treated'] <- TRUE

Hope this is helpful,

Dan

--
Daniel Nordlund
Port Townsend, WA USA
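A quick way to check the one-step form is to confirm that the `unit` column is untouched afterwards. This is only a sketch: it assumes that a single replacement call with a column name evaluates the row subscript once, which is consistent with Rui reporting the same form as working, and the seed value is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df <- data.frame(unit = 1:10)
df$treated <- FALSE

# Single replacement call: rows and column are selected together.
df[sample(nrow(df), 3), "treated"] <- TRUE

identical(df$unit, 1:10)  # TRUE if the units kept their original order
sum(df$treated)           # number of treated rows
```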
Sébastien Lahaie
2020-Jun-19 23:45 UTC
[R] Strange behavior when sampling rows of a data frame
Thank you all for the responses, these are the insights I was hoping for. There are many ways to get this right, and I happened to run into one that has a glitch. I see from Luke's explanation how the strange output came about. Glad to hear that this bug/behavior is already known.