Thank you, Bert!

What I want is at least 500 samples based on random sampling of time
periods, so that samples collected in the same time period are
included together.

Your script is doing what I wanted to do!!

Many thanks

On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> David's "solution" is incorrect. It can also fail to give you times
> with a total of 500 items to sample from in the time periods.
>
> It is not entirely clear what you want. The solution below gives you a
> random sample of time periods in which X1>0 and the total number of
> samples among them is >= 500. It does not give you the fewest number
> of periods that can do this. Is this what you want?
>
> tab[with(tab,{
>    rownums<- sample(seq_len(nrow(tab))[X1>0])
>    sz <- cumsum(X2[rownums])
>    rownums[c(TRUE,sz<500)]
> }),]
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
>    -- Clifford Stoll
>
> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewashm at gmail.com> wrote:
>> Thank you David!
>>
>> I reran your script and it is giving me the first three time periods.
>> Is it doing random sampling?
>>
>> tab.fan
>>   time X1  X2
>> 2    2  5 230
>> 3    3  1 300
>> 5    5  2  10
>>
>> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>>> Use dput() to send data to the list as it is more compact:
>>>
>>> > dput(tab)
>>> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
>>> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = c("time",
>>> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>>>
>>> You can just remove the lines with X1 = 0 since you don't want to use them.
>>>
>>> > tab.sub <- tab[tab$X1>0, ]
>>>
>>> Then the following gives you a sample:
>>>
>>> > tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>>>
>>> Note that your "solution" of times 6, 7, and 8 will never appear because the sum of the values is 586.
>>>
>>> David L. Carlson
>>> Department of Anthropology
>>> Texas A&M University
>>>
>>> -----Original Message-----
>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashta
>>> Sent: Saturday, November 21, 2015 11:53 AM
>>> To: R help <r-help at r-project.org>
>>> Subject: [R] Conditional Random selection
>>>
>>> Hi all,
>>>
>>> I have a data set that contains samples collected over time. In
>>> each time period the total number of samples is given (X2). The goal
>>> is to select 500 random samples. The selection should be based on
>>> time (select time periods until I reach 500 samples). Also, the time
>>> period should have a value greater than 0 for the X1 variable. X1 is
>>> an indicator variable.
>>>
>>> Select "time" until the sum of X2 is > 500, and only if X1 is > 0
>>>
>>> tab <- read.table(textConnection(" time X1 X2
>>> 1 0 251
>>> 2 5 230
>>> 3 1 300
>>> 4 0 25
>>> 5 2 10
>>> 6 3 101
>>> 7 1 300
>>> 8 4 185 "),header = TRUE)
>>>
>>> In the above example, samples from time 1 and 4 will not be selected
>>> (X1 is zero).
>>> So I could reach my target by selecting time 6, 7, and 8 or time 2 and
>>> 3 and so on.
>>>
>>> Can anyone help to do that?
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
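
For readers following the exchange above, here is a minimal sketch of the approach Bert describes: shuffle the eligible row numbers, then keep periods until the running total of X2 first reaches 500. The function name pick_periods, its target argument, and the guard for the case where the eligible periods hold fewer than 500 samples are assumptions added for illustration; they are not part of anyone's posted code. Indexing by the shuffled row numbers matters here: David's one-liner shuffles the X2 values but applies the resulting mask to the rows in their original order, which would explain why Ashta kept getting the leading rows.

```r
# Data from the original post (same values as the dput() output in the thread).
tab <- data.frame(
  time = 1:8,
  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)
)

# Hypothetical helper (not from the thread): pick random eligible periods
# until their X2 total reaches at least `target`.
pick_periods <- function(d, target = 500) {
  eligible <- which(d$X1 > 0)
  if (sum(d$X2[eligible]) < target) {
    stop("eligible periods hold fewer than `target` samples")
  }
  # Shuffle positions with sample.int(), then index; this avoids sample()'s
  # special treatment of a single number when only one row is eligible.
  shuffled <- eligible[sample.int(length(eligible))]
  running <- cumsum(d$X2[shuffled])
  # Keep every period up to and including the one that first reaches target.
  keep <- seq_len(which(running >= target)[1])
  d[shuffled[keep], ]
}

set.seed(1)  # only so the example is repeatable
picked <- pick_periods(tab)
picked
```

Each call returns a different random set of periods unless the seed is fixed. As Bert notes for his own version, the result is not the smallest set of periods that reaches the target.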

Hi Bert and all,

I have a related question. In each time period there were different
locations where the samples were collected (S1). I want to count the
number of unique locations (S1) for each unique time period. So in
time 1 the samples were collected from two locations, in time 2 from
only one location, and in time 3 from three locations.

tab <- read.table(textConnection(" time S1 rep
1 1 1
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 2 1
3 3 1 "),header = TRUE)

What I want is

time S1
   1  2
   2  1
   3  3

Thank you again.

Time to do your own homework by working through an R tutorial or two.
There are many on the web -- or see the Intro to R tutorial that ships
with R.

?tapply
?unique

is one of many answers to your query.

Cheers,
Bert

Bert Gunter

Hello,

Try

tapply(tab$S1, tab$time, function(x) length(unique(x)))

Hope this helps,

Rui Barradas
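
Rui's tapply() call returns the counts as a named vector, with the time periods as names. If the two-column layout Ashta asked for is wanted, one possible sketch uses aggregate() from base R with the same counting function. The object names counts and res are chosen here for illustration, and read.table(text = ...) is used in place of textConnection() only for brevity.

```r
# Data from Ashta's follow-up question.
tab <- read.table(text = "time S1 rep
1 1 1
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 2 1
3 3 1", header = TRUE)

# Rui's suggestion: a named vector, one count per time period.
counts <- tapply(tab$S1, tab$time, function(x) length(unique(x)))

# The same counts as a data frame with columns `time` and `S1`.
res <- aggregate(S1 ~ time, data = tab, FUN = function(x) length(unique(x)))
res
```

For this data both forms give 2, 1 and 3 unique locations for times 1, 2 and 3.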