thr3ads.net - R help - [R] Strange behavior when sampling rows of a data frame [Jun 2020]


Sébastien Lahaie

2020-Jun-19 12:49 UTC

[R] Strange behavior when sampling rows of a data frame

I ran into some strange behavior in R when trying to assign a treatment to
rows in a data frame. I'm wondering whether any R experts can explain
what's going on.

First, let's assign a treatment to 3 out of 10 rows as follows.
> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> s <- sample(nrow(df), 3)
> df[s,]$treated <- TRUE
>
> df
   unit treated
1     1   FALSE
2     2    TRUE
3     3   FALSE
4     4   FALSE
5     5    TRUE
6     6   FALSE
7     7    TRUE
8     8   FALSE
9     9   FALSE
10   10   FALSE

This is as expected. Now we'll just skip the intermediate step of saving
the sampled indices, and apply the treatment directly as follows.
> df <- data.frame(unit = 1:10)
> df$treated <- FALSE
>
> df[sample(nrow(df), 3),]$treated <- TRUE
>
> df
   unit treated
1     6    TRUE
2     2   FALSE
3     3   FALSE
4     9    TRUE
5     5   FALSE
6     6   FALSE
7     7   FALSE
8     5    TRUE
9     9   FALSE
10   10   FALSE

Now the data frame still has 10 rows with 3 assigned to the treatment. But
the units are garbled. Units 1 and 4 have disappeared, for instance, and
there are duplicates for 6 and 9, one assigned to treatment and the other
to control. Why would this happen?

Thanks,
Sebastien


Rui Barradas

2020-Jun-19 15:45 UTC


Hello,

I don't have an answer as to why this happens, but it seems like a bug.
Where? In `[<-.data.frame` or in `[<-.default`?

A solution is to subset and assign the vector:


set.seed(2020)
df2 <- data.frame(unit = 1:10)
df2$treated <- FALSE

df2$treated[sample(nrow(df2), 3)] <- TRUE
df2
#   unit treated
#1     1   FALSE
#2     2   FALSE
#3     3   FALSE
#4     4   FALSE
#5     5   FALSE
#6     6    TRUE
#7     7    TRUE
#8     8    TRUE
#9     9   FALSE
#10   10   FALSE


Or


set.seed(2020)
df3 <- data.frame(unit = 1:10)
df3$treated <- FALSE

df3[sample(nrow(df3), 3), "treated"] <- TRUE
df3
# result as expected


Hope this helps,

Rui Barradas
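
Both workarounds above can be checked side by side. In a nested replacement such as f(g(x), i) <- v, R expands the call so that the arguments of the outer function f appear once, while the arguments of the inner function g appear twice. In df$treated[idx] the random index belongs to the outer `[`, and in df[idx, "treated"] there is no nesting at all, so sample() runs only once in either form. The following is a minimal sketch of that check; the names a and b are made up for illustration.

```r
# Workaround 1: index the column vector, so sample() sits in the outer `[`.
set.seed(2020)
a <- data.frame(unit = 1:10, treated = FALSE)
a$treated[sample(nrow(a), 3)] <- TRUE

# Workaround 2: a single-level replacement with row and column subscripts.
set.seed(2020)
b <- data.frame(unit = 1:10, treated = FALSE)
b[sample(nrow(b), 3), "treated"] <- TRUE

# The unit column is untouched and exactly three rows are treated.
print(identical(a$unit, 1:10))
print(sum(a$treated))

# With the same seed, both forms mark the same rows.
print(identical(a$treated, b$treated))
```

Saving the index in a variable first, as in the original post's first example, is equally safe because the variable is just looked up twice rather than recomputed.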




William Dunlap

2020-Jun-19 16:20 UTC


The first subscript argument is getting evaluated twice.

> trace(sample)
> set.seed(2020); df[i<-sample(10,3), ]$Treated <- TRUE
trace: sample(10, 3)
trace: sample(10, 3)
> i
[1]  1 10  4
> set.seed(2020); sample(10,3)
trace: sample(10, 3)
[1] 7 6 8
> sample(10,3)
trace: sample(10, 3)
[1]  1 10  4

Bill Dunlap
TIBCO Software
wdunlap tibco.com
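
The same double evaluation can be observed without trace(), by wrapping sample() in a function that counts its own calls. This is only a sketch: counted_sample and calls are hypothetical names introduced here, and the count of two is what the trace output above leads one to expect rather than something guaranteed for every R version.

```r
# Hypothetical helper: behaves like sample(n, size) but counts invocations.
calls <- 0L
counted_sample <- function(n, size) {
  calls <<- calls + 1L
  sample(n, size)
}

df <- data.frame(unit = 1:10, treated = FALSE)

# Nested replacement: the subscript belongs to the inner `[`, so R uses it
# once to extract the rows and once more to write them back.
df[counted_sample(nrow(df), 3), ]$treated <- TRUE

# Number of times the subscript expression was evaluated
# (expected to be 2, per the trace output in this thread).
print(calls)
```

Roughly speaking, R treats the statement as `*tmp*` <- df followed by df <- `[<-`(`*tmp*`, idx, , value = `$<-`(`*tmp*`[idx, ], "treated", TRUE)), where idx stands for the subscript expression and therefore appears twice.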



Daniel Nordlund

2020-Jun-19 23:04 UTC


Sébastien,

You have received good explanations of what is going on with your code.
I think you can get what you want by making a simple modification of
your treatment assignment statement. At least it works for me.

df[sample(nrow(df),3), 'treated'] <- TRUE

Hope this is helpful,

Dan

-- 
Daniel Nordlund
Port Townsend, WA  USA

Sébastien Lahaie

2020-Jun-19 23:45 UTC


Thank you all for the responses, these are the insights I was hoping for.
There are many ways to get this right, and I happened to run into one that
has a glitch. I see from Luke's explanation how the strange output came
about. Glad to hear that this bug/behavior is already known.
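
The garbled output from the original post can be reproduced by hand, which may make the mechanism easier to see. The sketch below assumes the two evaluations of sample() returned different rows; the vectors rows_read and rows_written are hypothetical values chosen so that the result matches the data frame printed in the first message.

```r
df <- data.frame(unit = 1:10, treated = FALSE)

# Assumed result of the first evaluation: rows that get extracted.
rows_read <- c(6, 9, 5)
# Assumed result of the second evaluation: rows that get overwritten.
rows_written <- c(1, 4, 8)

# Step 1: extract whole rows and mark them as treated.
tmp <- df[rows_read, ]
tmp$treated <- TRUE

# Step 2: write those whole rows, unit column included, into other positions.
df[rows_written, ] <- tmp

print(df)
```

Because entire rows are copied, units 6, 9 and 5 land in rows 1, 4 and 8 as treated, while their original rows stay in place as controls, and units 1, 4 and 8 are lost.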

