Kevin E. Thorpe
2015-Mar-31 17:05 UTC
[R] Randomly interleaving data frames while preserving order
Hello.
I am trying to simulate recruitment in a randomized trial. Suppose I
have three streams (strata) of patients represented by these data frames.
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
What I need to do is construct a data frame with all of these combined
where the order of selection from one of the three data frames is
randomized but once a stratum is selected patients are selected
sequentially from that data frame.
To see what I'm looking to achieve, suppose the first five subjects were
to come, in order, from strata (data frames) 1, 2, 1, 3 and 2. The
expected result should look like this:
rbind(df1[1,],df2[1,],df1[2,],df3[1,],df2[2,])
   strat id  pid
1      1  1 1001
2      2  1 2001
21     1  2 1002
4      3  1 3001
22     2  2 2002
I hope what I'm trying to accomplish makes sense. Maybe I'm missing
something obvious, but I really have no idea at the moment how to
achieve this elegantly. Since I need to simulate many trial recruitments
it needs to be general and compact.
I appreciate any advice.
Kevin
--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
Sarah Goslee
2015-Mar-31 17:41 UTC
[R] Randomly interleaving data frames while preserving order
That's a fun one. Here's one possible approach. (Note that it can be
done without using a loop, but I find that a loop here increases
readability.)
I wrote it to work on a list of data frames. If the selection is
random, I'd set it up so that size is passed to the function, but
selection is generated within the function using sample().
recruitment <- function(dflist, selection) {
  results <- data.frame(matrix(NA, nrow=length(selection),
                               ncol=ncol(dflist[[1]])))
  colnames(results) <- colnames(dflist[[1]])
  for(i in unique(selection)) {
    results[selection == i, ] <- dflist[[i]][seq_len(sum(selection == i)),]
  }
  results
}
# and your example:
df1 <- data.frame(strat=rep(1,10),id=1:10,pid=1001:1010)
df2 <- data.frame(strat=rep(2,10),id=1:10,pid=2001:2010)
df3 <- data.frame(strat=rep(3,10),id=1:10,pid=3001:3010)
dfall <- list(df1, df2, df3)
touse <- c(1, 2, 1, 3, 1)
# could be generated using sample given the size argument
# touse <- sample(seq_along(dfall), size=5, replace=TRUE)
> recruitment(dfall, touse)
  strat id  pid
1     1  1 1001
2     2  1 2001
3     1  2 1002
4     3  1 3001
5     1  3 1003
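The variant mentioned above, where size is passed in and the selection is drawn inside the function, might look roughly like the sketch below. The name recruit_random and the check that a stratum is not exhausted are assumptions added for illustration; they are not part of the code in this reply.

```r
# Hypothetical variant of recruitment(): draws the stratum sequence itself.
recruit_random <- function(dflist, size) {
  # Randomly choose which stratum each of the `size` recruits comes from
  selection <- sample(seq_along(dflist), size = size, replace = TRUE)
  results <- data.frame(matrix(NA, nrow = size, ncol = ncol(dflist[[1]])))
  colnames(results) <- colnames(dflist[[1]])
  for (i in unique(selection)) {
    n_i <- sum(selection == i)
    # Assumption: fail loudly if a stratum has fewer patients than requested
    stopifnot(n_i <= nrow(dflist[[i]]))
    # Take the first n_i patients of stratum i, in their original order
    results[selection == i, ] <- dflist[[i]][seq_len(n_i), ]
  }
  results
}

df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

set.seed(1)
sim <- recruit_random(list(df1, df2, df3), size = 10)
```

For many simulated trials, the call could be wrapped in replicate(..., simplify = FALSE) to collect one data frame per run.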
Sarah
--
Sarah Goslee
http://www.functionaldiversity.org
Duncan Murdoch
2015-Mar-31 17:44 UTC
[R] Randomly interleaving data frames while preserving order
How about something like this:
# Permute an ordered vector of selections:
sel <- sample(c(rep(1, nrow(df1)), rep(2, nrow(df2)), rep(3, nrow(df3))))
# Create an empty dataframe to hold the results
df <- data.frame(strat=NA, id=NA, pid=NA)[rep(1, length(sel)),]
# Put the original dataframes into the appropriate slots:
df[sel == 1,] <- df1
df[sel == 2,] <- df2
df[sel == 3,] <- df3
# Clean up the rownames
rownames(df) <- NULL
Duncan Murdoch
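Since the original question asks for something general and compact, the same permute-the-labels idea could be wrapped in a function that accepts any number of strata. This is only a sketch: the function name interleave and the index arithmetic with ave() are illustrative choices, not something taken from the reply above.

```r
# Hypothetical helper: interleave a list of data frames in random order
# while keeping the rows of each data frame in their original sequence.
interleave <- function(dflist) {
  n <- vapply(dflist, nrow, integer(1))          # rows per stratum
  labels <- rep(seq_along(dflist), times = n)    # one label per patient
  sel <- labels[sample.int(length(labels))]      # random order of strata
  # Running count within each stratum: 1st, 2nd, ... patient drawn from it
  within <- ave(seq_along(sel), sel, FUN = seq_along)
  # Offset of each stratum's first row in the stacked data frame
  offset <- c(0L, cumsum(n))[sel]
  out <- do.call(rbind, dflist)[offset + within, ]
  rownames(out) <- NULL
  out
}

df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)

set.seed(123)
out <- interleave(list(df1, df2, df3))

# Many simulated recruitments, one data frame per run
sims <- replicate(5, interleave(list(df1, df2, df3)), simplify = FALSE)
```

Indexing the stacked data frame avoids assigning into a pre-allocated frame, so column types are kept as they are in the inputs.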
Kevin E. Thorpe
2015-Mar-31 17:52 UTC
[R] Randomly interleaving data frames while preserving order
Thanks Duncan. Once you see the solution it is indeed obvious.
Kevin
Tom Wright
2015-Mar-31 17:59 UTC
[R] Randomly interleaving data frames while preserving order
# Randomly permute the stratum labels (10 patients per stratum)
samples <- sample(c(rep(1,10), rep(2,10), rep(3,10)), 30)
# Replace each label with that stratum's pids, in sequence
samples[samples==1] <- 1001:1010
samples[samples==2] <- 2001:2010
samples[samples==3] <- 3001:3010
fullDf <- rbind(df1, df2, df3)
# Invert the ordering of pids to get the row of fullDf for each position
fullDf[sort(order(samples), index.return=TRUE)$ix,]
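The last line above works because sorting the output of order() inverts the permutation: it gives, for each position in the recruitment sequence, the row of fullDf holding that pid. Assuming pid values are unique across strata (as they are in this example), the same index can be obtained with match(). The sketch below checks that the two agree; the variable names idx_sort and idx_match are illustrative.

```r
df1 <- data.frame(strat = rep(1, 10), id = 1:10, pid = 1001:1010)
df2 <- data.frame(strat = rep(2, 10), id = 1:10, pid = 2001:2010)
df3 <- data.frame(strat = rep(3, 10), id = 1:10, pid = 3001:3010)
fullDf <- rbind(df1, df2, df3)

set.seed(42)
samples <- sample(c(rep(1, 10), rep(2, 10), rep(3, 10)), 30)
samples[samples == 1] <- 1001:1010
samples[samples == 2] <- 2001:2010
samples[samples == 3] <- 3001:3010

# Index from the reply: inverse of the permutation returned by order()
idx_sort <- sort(order(samples), index.return = TRUE)$ix
# Alternative (assumes unique pids): look each pid up directly
idx_match <- match(samples, fullDf$pid)

result <- fullDf[idx_match, ]
```

The match() form does not depend on the pids sorting in the same order as the rows of fullDf, which may make it easier to reuse with other identifiers.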