Fellow R users,

I am stumped on what would seem to be something fairly simple. I have a dataframe that has a variable named 'WEEK' that takes the numbers 1:26 (26 week time-period) with each number repeated five times consecutively (once for each weekday, Monday through Friday). Ex. 111112222233333.....2626262626. I would like to randomly extract two weekdays per five day week for each of 26 weeks and store this data as a separate dataframe.

I have been unable to get the sample function to work properly. I have also tried using the runif function to assign random numbers to each row of my dataframe, sort the dataframe first by week number then by random number value, and finally select the first two elements from each week subset (26 weeks total, giving 52 randomly selected values). I can't figure out how to select the first two elements.

My goal is to randomly select two weekdays per week (without replacement) for each of 26 consecutive weeks. Any advice would be greatly appreciated.

Thank you,

Mike
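The runif-and-sort idea described in the question can be finished in base R without any explicit sorting. What follows is a minimal sketch, not a definitive answer: only WEEK is named in the post, so the data frame DF and the columns DAY and rand are assumptions made up for illustration. The random numbers are ranked within each week with ave(), and the rows ranked 1 or 2 are kept.

```r
# Hypothetical data: 26 weeks x 5 weekdays (only WEEK is named in the post)
DF <- data.frame(WEEK = rep(1:26, each = 5), DAY = rep(1:5, times = 26))

set.seed(1)                      # only so the sketch is reproducible
DF$rand <- runif(nrow(DF))       # one random number per row

# Rank the random numbers within each week (1 = smallest)
within_rank <- ave(DF$rand, DF$WEEK, FUN = rank)

# Keep the two rows with the smallest random numbers in every week
picked <- DF[within_rank <= 2, ]
```

Because the rows are filtered rather than reordered, picked stays in the original week and weekday order.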
Mike -
   Perhaps these suggestions will be helpful:

    somedata = data.frame(week=rep(1:26,rep(5,26)),day=rep(1:5,26))
    res = by(somedata,somedata$week,function(x)x[sample(1:nrow(x),2),])
    do.call(rbind,res)

or

    do.call(rbind,lapply(split(somedata,somedata$week),
                         function(x)x[sample(1:nrow(x),2),]))

or

    do.call(rbind,tapply(1:nrow(somedata),list(somedata$week),
                         function(x)somedata[sample(x,2),]))

                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spector at stat.berkeley.edu

On Fri, 12 Mar 2010, Hosack, Michael wrote:

> Fellow R users,
>
> I am stumped on what would seem to be something fairly simple.
> I have a dataframe that has a variable named 'WEEK' that takes
> the numbers 1:26 (26 week time-period) with each number repeated
> five times consecutively (once for each weekday, Monday through
> Friday). Ex. 111112222233333.....2626262626. I would like to
> randomly extract two weekdays per five day week for each of
> 26 weeks and store this data as a separate dataframe. I have
> been unable to get the sample function to work properly.
> I have also tried using the runif function to assign random
> numbers to each row of my dataframe, sort the dataframe first
> by week number then by random number value, and finally select
> the first two elements from each week subset (26 weeks total,
> giving 52 randomly selected values). I can't figure out how
> to select the first two elements. My goal is to randomly
> select two weekdays per week (without replacement) for each of
> 26 consecutive weeks. Any advice would be greatly appreciated.
>
> Thank you,
>
> Mike
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
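One detail shared by the by(), lapply() and tapply() variants: do.call(rbind, ...) on a list named by week builds row names of the form "week.row", such as "1.3". The sketch below reuses somedata and the lapply() variant from that message. The fixed seed and the row-name reset are optional additions assumed here for tidiness and are not part of the original suggestion.

```r
somedata <- data.frame(week = rep(1:26, rep(5, 26)), day = rep(1:5, 26))

set.seed(42)  # assumption: a fixed seed, only to make the draw repeatable

# Split by week, draw two rows from each piece, and stack the pieces again
res <- do.call(rbind, lapply(split(somedata, somedata$week),
                             function(x) x[sample(nrow(x), 2), ]))

rownames(res) <- NULL  # replace the "week.row" row names with 1, 2, ..., 52
```

Without the last line the result is the same data, only with the compound row names left in place.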
    > replicate(26,sample(1:5, 2))
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
    [1,]    4    1    3    2    3    1    3    5    1     1     2     4     2     5     1     1     5
    [2,]    1    3    4    1    2    3    4    3    3     2     4     5     1     2     3     5     1
         [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
    [1,]     2     4     5     4     5     3     3     4     4
    [2,]     4     2     2     1     2     1     1     1     2

    > replicate(26,sample(1:5, 2))[,1]
    [1] 1 4

--
David Winsemius, MD
West Hartford, CT
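The matrix from replicate() holds weekday numbers (1 to 5), one column per week, so it still has to be turned into row numbers before it can subset the data. A minimal sketch of that step follows, assuming the rows are ordered by week with exactly five weekdays each, as the question describes. The names DF, DAY, picks, offsets and sampled are made up for illustration.

```r
# Hypothetical data frame in the layout described in the question
DF <- data.frame(WEEK = rep(1:26, each = 5), DAY = rep(1:5, times = 26))

set.seed(123)                              # for a repeatable example only
picks <- replicate(26, sample(1:5, 2))     # 2 x 26 matrix, one column per week

# Week w starts at row (w - 1) * 5 + 1, so shift each column by (w - 1) * 5
offsets <- rep((0:25) * 5, each = 2)
rows <- as.vector(picks) + offsets         # as.vector() reads column by column

sampled <- DF[rows, ]
```

The offset trick depends on the regular layout. If some weeks had missing days, a grouped approach such as split() would be the safer choice.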
Hi Mike,

take an index vector that selects Monday and Tuesday out of each week, and then run a restricted random permutation on this vector which only permutes indices within each week. rperm() is in the sna package.

    library(sna)
    foo <- rep(c(TRUE,TRUE,FALSE,FALSE,FALSE),26)
    your.data[foo[rperm(rep(seq(1,26),each=5))],]

HTH,
Stephan
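The same within-week shuffle can be sketched in base R for readers who do not have sna installed. This swaps rperm() for ave() with sample(), so it is a substitute technique rather than the code from the message, and the data frame below is a hypothetical stand-in for your.data.

```r
# Hypothetical stand-in for your.data: 26 weeks x 5 weekdays
your.data <- data.frame(WEEK = rep(1:26, each = 5), DAY = rep(1:5, times = 26))

set.seed(7)                                # for a repeatable example only
foo <- rep(c(1, 1, 0, 0, 0), 26)           # 1 marks a selected day

# ave() applies sample() to each week's block of five, shuffling within weeks
keep <- ave(foo, your.data$WEEK, FUN = sample) == 1

selected <- your.data[keep, ]
```

Each week's block always contains two 1s and three 0s, so every week contributes exactly two rows regardless of how the block is shuffled.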
Hi:

A ddply solution:

    library(plyr)
    somedata = data.frame(week=rep(1:26,rep(5,26)),day=rep(1:5,26))

    # sample two rows out of five per week
    daysamp <- function(x) x[sample(1:5, 2), ]

    # Ram it through ddply:
    ddply(somedata, .(week), daysamp)

First part of output:

      week day
    1    1   4
    2    1   3
    3    2   2
    4    2   1
    5    3   4
    6    3   1
    7    4   1
    8    4   5

(52 rows in all, as expected)

HTH,
Dennis
    # Example data: 26 weeks x 5 weekdays, plus a random value X per row
    DF <- data.frame(WEEK = rep(1:26, each=5),
                     DAY = rep(1:5, 26),
                     X = runif(5*26))

    # Two randomly chosen weekdays (without replacement) for each week
    DF2 <- data.frame(DAY = c(replicate(26, sample(5, 2, replace=FALSE))),
                      WEEK = rep(1:26, each=2))

    # merge() joins on the shared columns WEEK and DAY, keeping only the sampled rows
    new.DF <- merge(DF, DF2, all=FALSE)

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
I would just get row indices:

    row.indices <- as.vector(sapply(0:25 * 5 + 1, function(x) {sort(sample(x:(x+4), 2))}))
    new.data.frame <- your.data.frame[row.indices, ]

Cheers,
/Ali