Hello,

Try

tapply(tab$S1, tab$time, function(x) length(unique(x)))

Hope this helps,

Rui Barradas

Quoting Ashta <sewashm at gmail.com>:

> Hi Bert and all,
> I have a related question. In each time period there were different
> locations where the samples were collected (S1). I want to count the
> number of unique locations (S1) for each unique time period. So in
> time 1 the samples were collected from two locations, in time 2 only
> from one location, and in time 3 from three locations.
>
> tab <- read.table(textConnection(" time S1 rep
> 1 1 1
> 1 2 1
> 1 2 2
> 2 1 1
> 2 1 2
> 2 1 3
> 2 1 4
> 3 1 1
> 3 2 1
> 3 3 1 "), header = TRUE)
>
> What I want is
>
> time S1
>    1  2
>    2  1
>    3  3
>
> Thank you again.
>
> On Sat, Nov 21, 2015 at 1:30 PM, Ashta <sewashm at gmail.com> wrote:
>> Thank you Bert!
>>
>> What I want is at least 500 samples based on random sampling of time
>> periods. This way, samples collected in the same time period are
>> included together.
>>
>> Your script is doing what I wanted to do!!
>>
>> Many thanks
>>
>> On Sat, Nov 21, 2015 at 1:15 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> David's "solution" is incorrect. It can also fail to give you times
>>> with a total of 500 items to sample from in the time periods.
>>>
>>> It is not entirely clear what you want. The solution below gives you a
>>> random sample of time periods in which X1>0 and the total number of
>>> samples among them is >= 500. It does not give you the fewest number
>>> of periods that can do this. Is this what you want?
>>>
>>> tab[with(tab,{
>>>   rownums<- sample(seq_len(nrow(tab))[X1>0])
>>>   sz <- cumsum(X2[rownums])
>>>   rownums[c(TRUE,sz<500)]
>>> }),]
>>>
>>> Cheers,
>>> Bert
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>>    -- Clifford Stoll
>>>
>>> On Sat, Nov 21, 2015 at 10:56 AM, Ashta <sewashm at gmail.com> wrote:
>>>> Thank you David!
>>>>
>>>> I reran your script and it is giving me the first three time periods.
>>>> Is it doing random sampling?
>>>>
>>>>       tab.fan
>>>>   time X1  X2
>>>> 2    2  5 230
>>>> 3    3  1 300
>>>> 5    5  2  10
>>>>
>>>> On Sat, Nov 21, 2015 at 12:20 PM, David L Carlson
>>>> <dcarlson at tamu.edu> wrote:
>>>>> Use dput() to send data to the list as it is more compact:
>>>>> > dput(tab)
>>>>>
>>>>> structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
>>>>> 4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)),
>>>>> .Names = c("time",
>>>>> "X1", "X2"), class = "data.frame", row.names = c(NA, -8L))
>>>>>
>>>>> You can just remove the lines with X1 = 0 since you don't want
>>>>> to use them.
>>>>> > tab.sub <- tab[tab$X1>0, ]
>>>>>
>>>>> Then the following gives you a sample:
>>>>> > tab.sub[cumsum(sample(tab.sub$X2))<=500, ]
>>>>>
>>>>> Note that your "solution" of times 6, 7, and 8 will never
>>>>> appear because the sum of the values is 586.
>>>>>
>>>>> David L. Carlson
>>>>> Department of Anthropology
>>>>> Texas A&M University
>>>>>
>>>>> -----Original Message-----
>>>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashta
>>>>> Sent: Saturday, November 21, 2015 11:53 AM
>>>>> To: R help <r-help at r-project.org>
>>>>> Subject: [R] Conditional Random selection
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a data set that contains samples collected over time. In
>>>>> each time period the total number of samples is given (X2). The goal
>>>>> is to select 500 random samples. The selection should be based on
>>>>> time (select time periods until I reach 500 samples). Also, the time
>>>>> period should have a value greater than 0 for the X1 variable. X1 is
>>>>> an indicator variable.
>>>>>
>>>>> Select "time" until the sum of X2 is > 500, and only if X1 is > 0
>>>>>
>>>>> tab <- read.table(textConnection(" time X1 X2
>>>>> 1 0 251
>>>>> 2 5 230
>>>>> 3 1 300
>>>>> 4 0 25
>>>>> 5 2 10
>>>>> 6 3 101
>>>>> 7 1 300
>>>>> 8 4 185 "), header = TRUE)
>>>>>
>>>>> In the above example, samples from time 1 and 4 will not be selected
>>>>> (X1 is zero).
>>>>> So I could reach my target by selecting time 6, 7, and 8, or time 2
>>>>> and 3, and so on.
>>>>>
>>>>> Can anyone help me do that?
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
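For readers following the quoted thread, here is a minimal sketch of the selection being discussed, assuming the goal is a random set of time periods with X1 > 0 whose X2 total first reaches 500. The helper name `sample_periods` and its `target` argument are hypothetical and do not come from any poster. The sketch uses the same shuffle-then-cumsum idea as Bert's snippet, and additionally stops with an error when the eligible periods cannot reach the target.

```r
# Data from the original post, rebuilt as a data frame.
tab <- data.frame(
  time = 1:8,
  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)
)

# Hypothetical helper (not from the thread): shuffle the eligible rows,
# then keep rows up to and including the first one at which the running
# total of X2 reaches `target`.
sample_periods <- function(tab, target = 500) {
  eligible <- which(tab$X1 > 0)
  # Permute positions with sample.int(); this avoids the sample(x)
  # surprise when `eligible` happens to have length one.
  shuffled <- eligible[sample.int(length(eligible))]
  running <- cumsum(tab$X2[shuffled])
  if (length(running) == 0 || running[length(running)] < target) {
    stop("eligible periods cannot reach the target")
  }
  keep <- seq_len(which(running >= target)[1])
  tab[shuffled[keep], ]
}

set.seed(1)  # only to make the example repeatable
picked <- sample_periods(tab)
picked
sum(picked$X2)
```

Because the rows are shuffled before the running total is taken, repeated calls without `set.seed()` can return different sets of periods, which is the behavior Ashta asked about.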
Hi Rui,

I tried that one before I sent out my original message.
It gave me only this:

tapply(tab$S1, tab$time, function(x) length(unique(x)))
1 2 3
2 1 3

I am expecting output like this:

time S1
   1  2
   2  1
   3  3

On Sat, Nov 21, 2015 at 2:38 PM, <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Try
>
> tapply(tab$S1, tab$time, function(x) length(unique(x)))
>
> Hope this helps,
>
> Rui Barradas
Hello,

Is that a real doubt? Like Bert said, you should spend some time with an R tutorial. All you need is to know how to form a data.frame.

tmp <- tapply(tab$S1, tab$time, function(x) length(unique(x)))
data.frame(time = names(tmp), S1 = tmp)

Rui Barradas

Quoting Ashta <sewashm at gmail.com>:

> Hi Rui,
>
> I tried that one before I sent out my original message.
> It gave me only this:
>
> tapply(tab$S1, tab$time, function(x) length(unique(x)))
> 1 2 3
> 2 1 3
>
> I am expecting output like this:
>
> time S1
>    1  2
>    2  1
>    3  3
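As a hedged alternative to building the data frame by hand, the same count can be sketched with base R's `aggregate()`, which returns a data frame directly. This is not what any poster suggested; it assumes the `tab` data from Ashta's question, read here with the `text =` argument of `read.table()` instead of `textConnection()`. The name `counts` is only illustrative.

```r
# Data from Ashta's question about unique locations per time period.
tab <- read.table(text = "time S1 rep
1 1 1
1 2 1
1 2 2
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 2 1
3 3 1", header = TRUE)

# Count the distinct S1 values within each time period; aggregate()
# returns a data frame with one row per time period.
counts <- aggregate(S1 ~ time, data = tab,
                    FUN = function(x) length(unique(x)))
counts
```

One difference from the `tapply()` route: `names(tmp)` is a character vector, so the `time` column built from it is not numeric unless converted with `as.integer()`, whereas `aggregate()` keeps `time` in its original type.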