William Dunlap
2020-Jun-19 16:20 UTC
[R] Strange behavior when sampling rows of a data frame
The first subscript argument is getting evaluated twice.

> trace(sample)
> set.seed(2020); df[i<-sample(10,3), ]$Treated <- TRUE
trace: sample(10, 3)
trace: sample(10, 3)
> i
[1]  1 10  4
> set.seed(2020); sample(10,3)
trace: sample(10, 3)
[1] 7 6 8
> sample(10,3)
trace: sample(10, 3)
[1]  1 10  4

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jun 19, 2020 at 8:46 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> I don't have an answer as to why this happens, but it seems like a bug.
> Where? In `[<-.data.frame` or in `[<-.default`?
>
> A solution is to subset and assign the vector:
>
> set.seed(2020)
> df2 <- data.frame(unit = 1:10)
> df2$treated <- FALSE
>
> df2$treated[sample(nrow(df2), 3)] <- TRUE
> df2
> #    unit treated
> #1      1   FALSE
> #2      2   FALSE
> #3      3   FALSE
> #4      4   FALSE
> #5      5   FALSE
> #6      6    TRUE
> #7      7    TRUE
> #8      8    TRUE
> #9      9   FALSE
> #10    10   FALSE
>
> Or
>
> set.seed(2020)
> df3 <- data.frame(unit = 1:10)
> df3$treated <- FALSE
>
> df3[sample(nrow(df3), 3), "treated"] <- TRUE
> df3
> # result as expected
>
> Hope this helps,
>
> Rui Barradas
>
> At 13:49 on 19/06/2020, Sébastien Lahaie wrote:
> > I ran into some strange behavior in R when trying to assign a treatment to
> > rows in a data frame. I'm wondering whether any R experts can explain
> > what's going on.
> >
> > First, let's assign a treatment to 3 out of 10 rows as follows.
> >
> >> df <- data.frame(unit = 1:10)
> >> df$treated <- FALSE
> >> s <- sample(nrow(df), 3)
> >> df[s,]$treated <- TRUE
> >> df
> >    unit treated
> > 1     1   FALSE
> > 2     2    TRUE
> > 3     3   FALSE
> > 4     4   FALSE
> > 5     5    TRUE
> > 6     6   FALSE
> > 7     7    TRUE
> > 8     8   FALSE
> > 9     9   FALSE
> > 10   10   FALSE
> >
> > This is as expected. Now we'll just skip the intermediate step of saving
> > the sampled indices, and apply the treatment directly as follows.
> >
> >> df <- data.frame(unit = 1:10)
> >> df$treated <- FALSE
> >> df[sample(nrow(df), 3),]$treated <- TRUE
> >> df
> >    unit treated
> > 1     6    TRUE
> > 2     2   FALSE
> > 3     3   FALSE
> > 4     9    TRUE
> > 5     5   FALSE
> > 6     6   FALSE
> > 7     7   FALSE
> > 8     5    TRUE
> > 9     9   FALSE
> > 10   10   FALSE
> >
> > Now the data frame still has 10 rows, with 3 assigned to the treatment, but
> > the units are garbled. Units 1 and 4 have disappeared, for instance, and
> > there are duplicates for 6 and 9, one assigned to treatment and the other
> > to control. Why would this happen?
> >
> > Thanks,
> > Sebastien
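Dunlap's trace points to a concrete mechanism for the garbled units: the rows are extracted with one draw of the index and written back with a second draw. Under set.seed(2020) those draws were 7, 6, 8 and then 1, 10, 4. The sketch below replays that by hand with both index vectors hard-coded so it is deterministic; rows_get and rows_set are illustrative names, not anything R uses internally.

```r
# Replay of the suspected mechanism with fixed (non-random) row indices.
df <- data.frame(unit = 1:10, treated = FALSE)

rows_get <- c(7, 6, 8)   # first evaluation of the index: rows that are extracted
rows_set <- c(1, 10, 4)  # second evaluation of the index: rows that are overwritten

extracted <- df[rows_get, ]   # 3-row data frame holding units 7, 6, 8
extracted$treated <- TRUE     # the inner `$<-` step on the extracted rows
df[rows_set, ] <- extracted   # the outer `[<-` step writes them into rows 1, 10, 4

df$unit             # units now: 7 2 3 8 5 6 7 8 9 6
which(df$treated)   # treated rows: 1 4 10
```

Units 1, 4 and 10 vanish and units 6, 7 and 8 each appear twice: the same kind of damage reported in the original post, just with a different draw.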
Rui Barradas
2020-Jun-19 17:37 UTC
Hello,

Thanks, I hadn't thought of that.

But why? Is it evaluated once before the assignment and a second time when
the assignment occurs?

Tracing both sample and `[<-` shows two calls to sample:

trace(sample)
trace(`[<-`)
df[sample(nrow(df), 3),]$treated <- TRUE
trace: sample(nrow(df), 3)
trace: `[<-`(`*tmp*`, sample(nrow(df), 3), , value = list(unit = c(7L, 6L, 8L), treated = c(TRUE, TRUE, TRUE)))
trace: sample(nrow(df), 3)

Regards,

Rui Barradas
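One way to read Rui's trace: R evaluates a nested replacement such as df[i, ]$treated <- TRUE through a temporary variable named `*tmp*` (the R Language Definition describes this under subset assignment), and both the extracting call `*tmp*`[i, ] and the replacing call `[<-`(`*tmp*`, i, , value = ...) are built from the same unevaluated index expression. That is why `[<-` appears in the trace with sample(nrow(df), 3) still unevaluated. The sketch writes the expansion out by hand; it approximates what the evaluator does rather than reproducing R's internal code, and pick_rows() is a hypothetical helper that returns fixed rows and counts its own evaluations.

```r
# Hypothetical helper: returns fixed rows and counts how often it is evaluated.
n_calls <- 0
pick_rows <- function() {
  n_calls <<- n_calls + 1
  c(2L, 5L, 7L)
}

df <- data.frame(unit = 1:10, treated = FALSE)

# Hand-written approximation of:  df[pick_rows(), ]$treated <- TRUE
`*tmp*` <- df
extracted <- `*tmp*`[pick_rows(), ]                     # getter: index evaluated (1st time)
extracted$treated <- TRUE                               # inner replacement on the extracted rows
df <- `[<-`(`*tmp*`, pick_rows(), , value = extracted)  # setter: index evaluated (2nd time)
rm(`*tmp*`)

n_calls   # 2: once for the getter, once for the setter
```

With a deterministic index both evaluations agree, so nothing goes wrong; with sample() they generally differ. Saving the index in a variable first, as in the original post's first version, gives both calls the same value.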
William Dunlap
2020-Jun-19 17:42 UTC
It is a bug that has been present in R since at least R-2.14.0 (the oldest
that I have installed on my laptop).

Bill Dunlap
TIBCO Software
wdunlap tibco.com
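The workarounds in this thread all make sure sample() runs only once: in Rui's two forms the index appears only in the innermost (or the only) replacement call, and in the original post's first version it is saved to a variable beforehand. The following sketch checks, over 100 seeds, that each form leaves the units untouched and treats exactly 3 rows; make_df() and check_assignment() are hypothetical helpers written for this check.

```r
# Hypothetical helpers for this check.
make_df <- function() data.frame(unit = 1:10, treated = FALSE)

check_assignment <- function(d, k) {
  # Units must be untouched and exactly k rows treated.
  stopifnot(identical(d$unit, 1:10), sum(d$treated) == k)
}

for (seed in 1:100) {
  set.seed(seed)
  d1 <- make_df()
  d1$treated[sample(nrow(d1), 3)] <- TRUE      # Rui's first workaround
  check_assignment(d1, 3)

  set.seed(seed)
  d2 <- make_df()
  d2[sample(nrow(d2), 3), "treated"] <- TRUE   # Rui's second workaround
  check_assignment(d2, 3)

  set.seed(seed)
  d3 <- make_df()
  s <- sample(nrow(d3), 3)                     # index saved once, as in the original post
  d3[s, ]$treated <- TRUE
  check_assignment(d3, 3)

  # Same seed and one draw each, so all three treat the same rows.
  stopifnot(identical(d1$treated, d2$treated), identical(d2$treated, d3$treated))
}
```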