Tal Galili
2009-Aug-20 15:22 UTC
[R] simple randomization question: How to perform "sample" in chunks
Hello dear R-help group.

My task looks simple, but I can't seem to find a "smart" (i.e., non-loop) solution to it.

Task: I wish to randomize a data.frame by one column, while keeping the inner order in the second column as is.

So for example, let's say I have the following data.frame:

xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

I would like to shuffle it by column "a", while keeping the order in column "b".

Here is my "not-smart" way of doing it:

# R example
xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

randomize.by.column.a <- function(xx)
{
  # random order of the distinct values in column a
  new.a.order <- sample(unique(xx$a))
  new.xx <- NULL
  for(i in new.a.order)
  {
    # append all rows of the current a-value, in their original order
    xx.subset <- xx[ xx$a %in% i ,]
    new.xx <- rbind(new.xx , xx.subset)
  }

  return(new.xx)
}
randomize.by.column.a(xx)
# END of - R example

I would love a better, faster way of doing it.

Thanks,
Tal

--
My contact information:
Tal Galili
Phone number: 972-50-3373767
FaceBook: Tal Galili
My Blogs:
http://www.r-statistics.com/
http://www.talgalili.com
http://www.biostatistics.co.il
Charles C. Berry
2009-Aug-20 16:57 UTC
[R] simple randomization question: How to perform "sample" in chunks
On Thu, 20 Aug 2009, Tal Galili wrote:

> Hello dear R-help group.
>
> My task looks simple, but I can't seem to find a "smart" (e.g: non loop)
> solution to it.
>
> Task: I wish to randomize a data.frame by one column, while keeping the
> inner-order in the second column as is.

xx[ order( sample( unique( xx$a ) )[ xx$a ] ), ]

HTH,

Chuck

Charles C. Berry
Dept of Family/Preventive Medicine
UC San Diego
La Jolla, San Diego 92093-0901
(858) 534-2098
E mailto:cberry at tajo.ucsd.edu
http://famprevmed.ucsd.edu/faculty/cberry/
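A note on the one-liner above: it uses xx$a directly as an index into the shuffled vector, so it works in the example because column a holds the integers 1 to 4. The following is only a minimal sketch of a variant for arbitrary group labels, assuming match() is used to turn labels into positions first. The helper name shuffle.by.key and the data frame zz are hypothetical and do not come from the thread.

```r
# Example data from the original post
xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

# Hypothetical helper (not from the thread): same idea as the one-liner,
# but match() maps arbitrary group labels to positions 1..k first.
shuffle.by.key <- function(df, key) {
  u <- unique(df[[key]])                      # distinct labels, in order of first appearance
  new.rank <- sample(length(u))               # one random rank per group
  row.rank <- new.rank[match(df[[key]], u)]   # rank of the group each row belongs to
  df[order(row.rank), , drop = FALSE]         # order() leaves tied rows in their original order
}

yy <- shuffle.by.key(xx, "a")

# Illustrative data with a character key (hypothetical)
zz <- data.frame(a = c("x","y","y","z"), b = c(1,1,2,1),
                 stringsAsFactors = FALSE)
shuffle.by.key(zz, "a")
```

Because order() does not reorder ties, the rows inside each group keep their original sequence, which is what preserves column b within a group.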
Don MacQueen
2009-Aug-20 16:58 UTC
[R] simple randomization question: How to perform "sample" in chunks
I believe this will do what you want:

tmp1 <- split(xx, xx$a)
do.call(rbind, tmp1[ sample(length(unique(xx$a))) ])

The idea is to split the dataframe, and then reassemble in a random order.

Whether or not it will be faster for a large dataframe, I don't know.

There's probably also an indexing solution, perhaps using rle(), but I thought of this first...

-Don

At 6:22 PM +0300 8/20/09, Tal Galili wrote:
>Task: I wish to randomize a data.frame by one column, while keeping the
>inner-order in the second column as is.

--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
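The rle() indexing idea mentioned above is not spelled out in the thread. One possible sketch follows, under the assumption that rows with the same value of a are already adjacent (as they are in the example) and that a is a plain atomic vector rather than a factor. The variable names are illustrative only.

```r
# Example data from the original post
xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

r      <- rle(xx$a)                 # run lengths of consecutive equal values of a
ends   <- cumsum(r$lengths)         # last row number of each block
starts <- ends - r$lengths + 1      # first row number of each block
perm   <- sample(length(starts))    # random order for the blocks

# Row indices: blocks in shuffled order, rows inside each block in original order
idx <- unlist(Map(seq, starts[perm], ends[perm]))
yy  <- xx[idx, ]
yy
```

This builds a single index vector and subsets once, so no intermediate data frames are created. Whether that is faster in practice would need to be timed on the data at hand.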
David Winsemius
2009-Aug-20 16:58 UTC
[R] simple randomization question: How to perform "sample" in chunks
On Aug 20, 2009, at 11:22 AM, Tal Galili wrote:

> I would like to shuffle it by column "a", while keeping the order in
> column "b".

It was a bit confusing to read that you wanted to "keep the order in column "b"", but your code implies that you wanted to carry the b-values along with the sorted a-values. I think this achieves the same goal:

xx[sample(1:nrow(xx)), ]

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
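For comparison with the loop in the original post: if the goal is read as keeping all rows with the same a together as a block, then shuffling every row independently is a different operation, since rows from one group can end up separated. The sketch below is one way to check this. The helper groups.contiguous is a hypothetical name, and the seed is arbitrary and only there to make a run repeatable.

```r
# Example data from the original post
xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

# Hypothetical helper: TRUE when each value of a occupies one contiguous block of rows
groups.contiguous <- function(df) {
  length(rle(df$a)$lengths) == length(unique(df$a))
}

set.seed(1)                              # arbitrary seed, for repeatability only
row.shuffled <- xx[sample(nrow(xx)), ]   # every row moved independently

groups.contiguous(xx)                    # the original data has contiguous groups
groups.contiguous(row.shuffled)          # usually not the case after a full row shuffle
```

Each row still carries its own b-value along with its a-value, but the grouping by a is generally lost.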
Greg Snow
2009-Aug-20 17:04 UTC
[R] simple randomization question: How to perform "sample" in chunks
Here is a one liner:

(yy <- do.call( rbind, sample( split(xx, xx$a) ) ))

Basically, reading from the inside out, it splits the data frame by a (keeping the structure of b intact within each data frame) and returns it as a list, then that list is randomized, then put back together into a single data frame again.

Does this do what you want?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Tal Galili
> Sent: Thursday, August 20, 2009 9:22 AM
> To: r-help at r-project.org
> Subject: [R] simple randomization question: How to perform "sample" in chunks
>
> Task: I wish to randomize a data.frame by one column, while keeping the
> inner-order in the second column as is.
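To check that the split / sample / rbind approach above does what was asked, a few sanity checks can be run on the result. This is only a sketch. The checks are not from the thread, and resetting the row names is an optional extra step, since rbind() on a named list builds row names from the list names.

```r
# Example data from the original post
xx <- data.frame(a = c(1,2,2,3,3,3,4,4,4,4),
                 b = c(1,1,2,1,2,3,1,2,3,4))

# The one-liner from the post: split by a, shuffle the list, bind back together
yy <- do.call(rbind, sample(split(xx, xx$a)))
rownames(yy) <- NULL    # optional: replace the row names derived from the list names

# Sanity checks (illustrative, not from the thread)
stopifnot(nrow(yy) == nrow(xx))                                  # no rows lost or added
stopifnot(length(rle(yy$a)$lengths) == length(unique(xx$a)))     # each a-group stays in one block
stopifnot(all(tapply(yy$b, yy$a, function(v) !is.unsorted(v))))  # b still increasing within a group
yy
```

If all three stopifnot() calls pass silently, the groups were moved as whole blocks and the order of b inside each block was left alone.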