Thank you David!

I reran your script and it is giving me the first three time periods.
Is it doing random sampling?

tab.fan
  time X1  X2
2    2  5 230
3    3  1 300
5    5  2  10

On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarlson at tamu.edu> wrote:
> Use dput() to send data to the list as it is more compact:
>
>> dput(tab)
> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = c("time",
> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>
> You can just remove the lines with X1 = 0 since you don't want to use them.
>
>> tab.sub <- tab[tab$X1>0, ]
>
> Then the following gives you a sample:
>
>> tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>
> Note, that your "solution" of times 6, 7, and 8 will never appear because the sum of the values is 586.
>
>
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashta
> Sent: Saturday, November 21, 2015 11:53 AM
> To: R help <r-help at r-project.org>
> Subject: [R] Conditional Random selection
>
> Hi all,
>
> I have a data set that contains samples collected over time. In
> each time period the total number of samples is given (X2). The goal
> is to select 500 random samples. The selection should be based on
> time (select time periods until I reach 500 samples). Also, the time
> period should have a value greater than 0 for the X1 variable. X1 is
> an indicator variable.
>
> Select "time" until the sum of X2 is > 500, and only if X1 is > 0
>
> tab <- read.table(textConnection(" time X1 X2
> 1 0 251
> 2 5 230
> 3 1 300
> 4 0 25
> 5 2 10
> 6 3 101
> 7 1 300
> 8 4 185 "),header = TRUE)
>
> In the above example, samples from time 1 and 4 will not be selected
> (X1 is zero).
> So I could reach my target by selecting time 6, 7, and 8, or time 2 and
> 3, and so on.
>
> Can anyone help me do that?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
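A quick check of why the one-liner quoted above keeps returning the leading rows. This is only a minimal sketch using the data from the original post; the names shuffled, keep, picked and leading are made up for illustration. sample() shuffles the X2 values, but the resulting TRUE/FALSE vector is then applied to the rows of tab.sub in their original order:

```r
# Data from the original post, rebuilt from the dput() output above.
tab <- data.frame(
  time = 1:8,
  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)
)
tab.sub <- tab[tab$X1 > 0, ]

set.seed(1)  # arbitrary seed, only so the run can be repeated
shuffled <- sample(tab.sub$X2)    # X2 values in random order
keep <- cumsum(shuffled) <= 500   # a run of TRUEs followed by FALSEs

# `keep` describes the shuffled values, but it indexes the unshuffled
# rows, so it can only ever pick the first sum(keep) rows of tab.sub.
picked <- tab.sub[keep, ]
leading <- head(tab.sub, sum(keep))
identical(picked$time, leading$time)
```

Because every X2 is positive, cumsum() is increasing, so keep is always a run of TRUEs followed by FALSEs. Only the number of leading rows changes between runs, and their X2 total can exceed 500 (times 2, 3 and 5 add up to 540).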
David's "solution" is incorrect. It can also fail to give you times
with a total of 500 items to sample from in the time periods.

It is not entirely clear what you want. The solution below gives you a
random sample of time periods in which X1>0 and the total number of
samples among them is >= 500. It does not give you the fewest number
of periods that can do this. Is this what you want?

tab[with(tab,{
   rownums <- sample(seq_len(nrow(tab))[X1>0])
   sz <- cumsum(X2[rownums])
   rownums[c(TRUE,sz<500)]
}),]

Cheers,
Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewashm at gmail.com> wrote:
> Thank you David!
>
> I reran your script and it is giving me the first three time periods.
> Is it doing random sampling?
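The approach in the reply above can be wrapped in a small function. This is a sketch, not Bert's code: pick_periods() and its target argument are hypothetical names, and the guard for data whose eligible periods total less than the target is an added assumption. It shuffles the rows with X1 > 0 and keeps periods up to and including the first one whose running X2 total reaches the target.

```r
# Data from the original post.
tab <- data.frame(
  time = 1:8,
  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)
)

# Hypothetical helper (not from the thread): randomly pick eligible time
# periods until their X2 total reaches `target`.
pick_periods <- function(dat, target = 500) {
  eligible <- which(dat$X1 > 0)
  # Added assumption: fail loudly if the target cannot be reached at all.
  if (sum(dat$X2[eligible]) < target) {
    stop("eligible periods hold fewer than `target` samples")
  }
  # Shuffle via sample.int() on the length, because sample(x) treats a
  # single number x as 1:x when only one row is eligible.
  rownums <- eligible[sample.int(length(eligible))]
  sz <- cumsum(dat$X2[rownums])
  # Keep every period up to and including the first that reaches target.
  dat[rownums[seq_len(which(sz >= target)[1])], ]
}

set.seed(42)  # arbitrary seed so the draw can be repeated
res <- pick_periods(tab)
res
sum(res$X2)   # at least 500 by construction
```

As in the reply, this does not look for the fewest periods; it only stops at the first random draw that crosses the target, so the total usually overshoots 500.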
Thank you Bert!

What I want is at least 500 samples, based on random sampling of time
periods. This allows samples collected in the same time period to be
included together.

Your script is doing what I wanted to do!!

Many thanks

On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> It is not entirely clear what you want. The solution below gives you a
> random sample of time periods in which X1>0 and the total number of
> samples among them is >= 500. It does not give you the fewest number
> of periods that can do this. Is this what you want?
>
> tab[with(tab,{
>    rownums <- sample(seq_len(nrow(tab))[X1>0])
>    sz <- cumsum(X2[rownums])
>    rownums[c(TRUE,sz<500)]
> }),]