thr3ads.net - R help - [R] simple randomization question: How to perform "sample" in chunks [Aug 2009]

Tal Galili

2009-Aug-20 15:22 UTC

[R] simple randomization question: How to perform "sample" in chunks

Hello dear R-help group.

My task looks simple, but I can't seem to find a "smart" (i.e., non-loop)
solution to it.

Task: I wish to randomize a data.frame by one column, while keeping the
inner-order in the second column as is.

So for example, let's say I have the following data.frame:

xx <-data.frame(a=  c(1,2,2,3,3,3,4,4,4,4) ,
                        b =  c(1,1,2,1,2,3,1,2,3,4) )

I would like to shuffle it by column "a", while keeping the order in
column "b".

Here is my "not-smart" way of doing it:

# R example
xx <-data.frame(a=  c(1,2,2,3,3,3,4,4,4,4) ,
                        b =  c(1,1,2,1,2,3,1,2,3,4) )

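randomize.by.column.a <- function(xx)
{
  # Draw a random order for the distinct values of column a
  new.a.order <- sample(unique(xx$a))
  new.xx <- NULL
  for(i in new.a.order)
  {
    # Append all rows of the current group, in their original order
    xx.subset <- xx[ xx$a %in% i ,]
    new.xx <- rbind(new.xx ,  xx.subset)
  }

  return(new.xx)
}
randomize.by.column.a(xx)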
# END of - R example



I would love a better, faster way of doing it.

Thanks,
Tal

-- 
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il

Charles C. Berry

2009-Aug-20 16:57 UTC

On Thu, 20 Aug 2009, Tal Galili wrote:
> Hello dear R-help group.
>
> My task looks simple, but I can't seem to find a "smart" (i.e., non-loop)
> solution to it.
>
> Task: I wish to randomize a data.frame by one column, while keeping the
> inner-order in the second column as is.

 	xx[ order( sample( unique( xx$a ) )[ xx$a ] ), ]
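
Note that this one-liner uses the values of `a` directly as positions in the shuffled vector, so it relies on `a` holding the integers 1..k. A sketch of the same idea for arbitrary group labels, assuming `match()` is used to turn labels into positions (the helper name `shuffle_groups` and the lettered example data are made up for illustration):

```r
shuffle_groups <- function(d, col) {
  u <- unique(d[[col]])            # distinct group labels, in order of appearance
  new_rank <- sample(length(u))    # a random rank for each group
  # match() maps each row's label to its group position; order() leaves ties
  # in their original order, so rows keep their order within each group
  d[order(new_rank[match(d[[col]], u)]), , drop = FALSE]
}

xx_chr <- data.frame(a = c("p", "q", "q", "r", "r", "r"),
                     b = c(1, 1, 2, 1, 2, 3))
res_groups <- shuffle_groups(xx_chr, "a")
res_groups
```

Because only an index vector is built and the data frame is subset once, no intermediate data frames are created.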


HTH,

Chuck

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

Don MacQueen

2009-Aug-20 16:58 UTC

I believe this will do what you want:

   tmp1 <- split(xx, xx$a)
   do.call(rbind, tmp1[ sample(length(unique(xx$a))) ])

The idea is to split the dataframe, and then reassemble in a random order.

Whether or not it will be faster for a large dataframe, I don't know.
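
One way to find out would be to time both versions on a larger data frame. The following is only a rough sketch: the test data `big` (300 groups of 1 to 10 rows) and the helper names `shuffle_loop` and `shuffle_split` are invented here, and timings will differ between machines and R versions.

```r
set.seed(1)
# Hypothetical test data: 300 groups of 1 to 10 rows each, grouped by a
sizes <- sample(1:10, 300, replace = TRUE)
big <- data.frame(a = rep(seq_along(sizes), sizes),
                  b = sequence(sizes))

# Loop version, as in the original post
shuffle_loop <- function(d) {
  out <- NULL
  for (i in sample(unique(d$a))) out <- rbind(out, d[d$a %in% i, ])
  out
}

# split / reassemble version
shuffle_split <- function(d) {
  pieces <- split(d, d$a)
  do.call(rbind, pieces[sample(length(pieces))])
}

system.time(res_loop  <- shuffle_loop(big))
system.time(res_split <- shuffle_split(big))
```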

There's probably also an indexing solution, perhaps using rle(), but 
I thought of this first...
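
An rle()-based indexing version might look like the sketch below. It assumes the rows for each value of `a` are already contiguous, as in the example data; the variable names are illustrative only.

```r
xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

runs   <- rle(xx$a)                 # one run per chunk of equal a-values
ends   <- cumsum(runs$lengths)      # last row of each chunk
starts <- ends - runs$lengths + 1   # first row of each chunk
perm   <- sample(length(ends))      # random order of the chunks

# Build the row index chunk by chunk, then subset the data frame once
idx <- unlist(Map(seq, starts[perm], ends[perm]))
xx[idx, ]
```

If the same value of `a` appeared in two separate runs, rle() would treat them as two chunks, so unsorted data would need to be ordered by `a` first.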

-Don

-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062

David Winsemius

2009-Aug-20 16:58 UTC

It was a bit confusing to read that you wanted to "keep the order in
column "b"", but your code implies that you wanted to carry the b-values
along with the sorted a-values. I think this achieves the same goal:

xx[sample(1:nrow(xx)), ]
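
Note that a row-wise shuffle moves every row independently, so rows sharing a value of `a` will generally not stay next to each other, which differs from what the loop in the original post produces. A small sketch for checking this, using a hypothetical helper `chunks_intact()` that is not part of the thread:

```r
xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

# TRUE if every value of a occupies a single contiguous run of rows
chunks_intact <- function(d) {
  !anyDuplicated(rle(d$a)$values)
}

row_shuffle   <- xx[sample(1:nrow(xx)), ]               # shuffles individual rows
chunk_shuffle <- do.call(rbind, sample(split(xx, xx$a)))  # shuffles whole chunks

chunks_intact(row_shuffle)     # usually FALSE for this data
chunks_intact(chunk_shuffle)   # TRUE by construction
```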

-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Greg Snow

2009-Aug-20 17:04 UTC

Here is a one liner:

(yy <- do.call( rbind, sample( split(xx, xx$a) ) ))

Basically reading from inside out, it splits the data frame by a (keeping the
structure of b intact within each data frame) and returns it as a list, then
that list is randomized, then put back together into a single data frame again.
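
One side effect worth knowing: `rbind` on the named list returned by `split` typically builds row names from the group names (such as `4.7`). A sketch of the same one-liner wrapped in a function that resets them; the function name `shuffle_chunks` is hypothetical:

```r
shuffle_chunks <- function(d, by) {
  pieces <- split(d, d[[by]])             # one data frame per value of `by`
  out <- do.call(rbind, sample(pieces))   # shuffle the list, then stack it
  rownames(out) <- NULL                   # replace group-based row names with 1..n
  out
}

xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))
res_chunks <- shuffle_chunks(xx, "a")
res_chunks
```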

Does this do what you want?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
