Kevin E. Thorpe
2015-Mar-31 17:05 UTC
[R] Randomly interleaving data frames while preserving order
Hello.

I am trying to simulate recruitment in a randomized trial. Suppose I have three streams (strata) of patients represented by these data frames.

df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

What I need to do is construct a data frame with all of these combined where the order of selection from one of the three data frames is randomized but once a stratum is selected patients are selected sequentially from that data frame.

To see what I'm looking to achieve, suppose the first five subjects were to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The expected result should look like this:

rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
   strat id  pid
1      1  1 1001
2      2  1 2001
21     1  2 1002
4      3  1 3001
22     2  2 2002

I hope what I'm trying to accomplish makes sense. Maybe I'm missing something obvious, but I really have no idea at the moment how to achieve this elegantly. Since I need to simulate many trial recruitments it needs to be general and compact.

I appreciate any advice.

Kevin

-- 
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
Sarah Goslee
2015-Mar-31 17:41 UTC
[R] Randomly interleaving data frames while preserving order
That's a fun one.

Here's one possible approach. (Note that it can be done without using a loop, but I find that a loop here increases readability.)

I wrote it to work on a list of data frames. If the selection is random, I'd set it up so that size is passed to the function, but selection is generated within the function using sample().

recruitment <- function(dflist, selection) {
  results <- data.frame(matrix(NA, nrow=length(selection), ncol=ncol(dflist[[1]])))
  colnames(results) <- colnames(dflist[[1]])
  for(i in unique(selection)) {
    results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)),]
  }
  results
}

# and your example:
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)

dfall <- list(df1, df2, df3)
touse <- c(1, 2, 1, 3, 1)
# could be generated using sample given the size argument
# touse <- sample(seq_along(dfall), size=5, replace=TRUE)

> recruitment(dfall, touse)
  strat id  pid
1     1  1 1001
2     2  1 2001
3     1  2 1002
4     3  1 3001
5     1  3 1003

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
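The suggestion above, passing a size and drawing the selection inside the function with sample(), might look roughly like the sketch below. The name recruit_random, the equal selection probabilities, and the check for an exhausted stratum are assumptions made for illustration; they are not part of the posted code.

```r
# Example strata from the original post
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

# Hypothetical variant of recruitment(): the selection is drawn internally.
# Assumption: every stratum is equally likely at each recruitment step.
recruit_random <- function(dflist, size) {
  selection <- sample(seq_along(dflist), size = size, replace = TRUE)

  # Number of patients requested from each stratum vs. number available
  needed <- tabulate(selection, nbins = length(dflist))
  available <- vapply(dflist, nrow, integer(1))
  if (any(needed > available)) {
    stop("a stratum was selected more often than it has patients")
  }

  # Same filling strategy as recruitment(): take the first needed[i] rows
  # of stratum i and place them, in order, at the positions where it was drawn
  results <- data.frame(matrix(NA, nrow = size, ncol = ncol(dflist[[1]])))
  colnames(results) <- colnames(dflist[[1]])
  for (i in unique(selection)) {
    results[selection == i, ] <- dflist[[i]][seq_len(needed[i]), ]
  }
  results
}

set.seed(2015)  # arbitrary seed, only to make the draw reproducible
recruit_random(list(df1, df2, df3), size = 5)
```

Stopping with an error when a stratum runs dry is one possible choice; silently returning NA rows, which is what the unchecked loop would do, seemed harder to notice in a simulation.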
Duncan Murdoch
2015-Mar-31 17:44 UTC
[R] Randomly interleaving data frames while preserving order
How about something like this:

# Permute an ordered vector of selections:
sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))

# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]

# Put the original dataframes into the appropriate slots:
df[sel == 1,] <- df1
df[sel == 2,] <- df2
df[sel == 3,] <- df3

# Clean up the rownames
rownames(df) <- NULL

Duncan Murdoch
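Since the original question asks for something general that can be run for many simulated trials, the same permute-then-fill idea can be wrapped in a function that takes a list of data frames. This is only a sketch: the name interleave_all is hypothetical, and it assumes all data frames share the same columns.

```r
# Example strata from the original post
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

# Hypothetical wrapper around the approach above, for any number of strata.
# Assumption: every data frame in dflist has the same columns.
interleave_all <- function(dflist) {
  n <- vapply(dflist, nrow, integer(1))

  # Permute a vector holding one stratum label per patient
  sel <- sample(rep(seq_along(dflist), times = n))

  # Start from the stacked data so the result has the right shape and
  # column types; every row is overwritten in the loop below
  out <- do.call(rbind, dflist)
  for (k in seq_along(dflist)) {
    if (n[k] > 0) out[sel == k, ] <- dflist[[k]]
  }
  rownames(out) <- NULL
  out
}

# Many simulated recruitments, each a full interleaving of all 30 patients
trials <- replicate(100, interleave_all(list(df1, df2, df3)), simplify = FALSE)
```

To simulate only the first m recruits of a trial, head(interleave_all(dflist), m) should do, since within each stratum the rows still appear in their original order.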
Kevin E. Thorpe
2015-Mar-31 17:52 UTC
[R] Randomly interleaving data frames while preserving order
Thanks Duncan. Once you see the solution it is indeed obvious.

Kevin
Tom Wright
2015-Mar-31 17:59 UTC
[R] Randomly interleaving data frames while preserving order
samples<-sample(c(rep(1,10),rep(2,10),rep(3,10)),30)
samples[samples==1]<-1001:1010
samples[samples==2]<-2001:2010
samples[samples==3]<-3001:3010
fullDf<-rbind(df1,df2,df3)
fullDf[sort(order(samples),index.return=TRUE)$ix,]
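The index expression in the last line is the inverse of order(samples), which here gives, for each shuffled pid, its row in fullDf. As a hedged sketch (the variable names idx_original and idx_match are made up for illustration), the same rows can be looked up with match(), which does not depend on the pids in fullDf being sorted:

```r
# Example strata from the original post
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

set.seed(1)  # arbitrary seed, only for reproducibility

# Shuffle the stratum labels, then replace each label by that stratum's
# pids in their original order (same steps as in the message above)
samples <- sample(rep(1:3, each = 10))
samples[samples == 1] <- 1001:1010
samples[samples == 2] <- 2001:2010
samples[samples == 3] <- 3001:3010

fullDf <- rbind(df1, df2, df3)

# Index as posted: the inverse permutation of order(samples)
idx_original <- sort(order(samples), index.return = TRUE)$ix

# Alternative: look each shuffled pid up directly in fullDf
idx_match <- match(samples, fullDf$pid)

# Both indexings agree for this data, where pids are unique and ascending
stopifnot(all(idx_original == idx_match))

interleaved <- fullDf[idx_match, ]
```

The match() form assumes only that pid values are unique across strata, so it would still work if the strata were stacked in a different order.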