Hi all,

I have a data set that contains samples collected over time. For each time period the total number of samples is given (X2). The goal is to select 500 random samples. The selection should be based on time (select time periods until I reach 500 samples). Also, the time period should have a value greater than 0 for the X1 variable. X1 is an indicator variable.

Select "time" until the sum of X2 is > 500, and only if X1 is > 0.

tab <- read.table(textConnection("time X1 X2
1 0 251
2 5 230
3 1 300
4 0 25
5 2 10
6 3 101
7 1 300
8 4 185"), header = TRUE)

In the above example, samples from times 1 and 4 will not be selected (X1 is zero). So I could reach my target by selecting times 6, 7, and 8, or times 2 and 3, and so on.

Can anyone help me do that?
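A minimal sketch of the selection rule exactly as the question states it, assuming that "until the sum of X2 is > 500" means: drop periods with X1 = 0, put the remaining periods in a random order, and keep adding periods until the running total of X2 first passes 500. The helper name select_periods_over() and its target argument are hypothetical, not something from the thread.

```r
# Data from the question (same values as the read.table() call above)
tab <- data.frame(time = 1:8,
                  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
                  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L))

# Hypothetical helper: random order of eligible periods, stop at the first
# period whose running X2 total exceeds the target.
select_periods_over <- function(dat, target = 500) {
  eligible <- dat[dat$X1 > 0, ]                   # keep only X1 > 0
  shuffled <- eligible[sample(nrow(eligible)), ]  # permute whole rows
  running  <- cumsum(shuffled$X2)                 # running total of X2
  stop_at  <- which(running > target)[1]          # first period past target
  if (is.na(stop_at)) stop("eligible periods never exceed the target")
  shuffled[seq_len(stop_at), ]
}

set.seed(1)
pick <- select_periods_over(tab)
pick
sum(pick$X2)
```

Under this reading both example selections from the question (times 6, 7, 8 and times 2, 3) can occur, since in each case the total only passes 500 when the last period is added.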
Use dput() to send data to the list, as it is more compact:

> dput(tab)
structure(list(time = 1:8, X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L,
4L), X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L)), .Names = c("time",
"X1", "X2"), class = "data.frame", row.names = c(NA, -8L))

You can just remove the lines with X1 = 0, since you don't want to use them.

> tab.sub <- tab[tab$X1 > 0, ]

Then the following gives you a sample:

> tab.sub[cumsum(sample(tab.sub$X2)) <= 500, ]

Note that your "solution" of times 6, 7, and 8 will never appear, because the sum of their X2 values is 586.

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ashta
Sent: Saturday, November 21, 2015 11:53 AM
To: R help <r-help at r-project.org>
Subject: [R] Conditional Random selection

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
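A quick arithmetic check of the totals behind the "586" remark, including the other example from the question (times 2 and 3). The helper total_for() is hypothetical and only sums X2 over a chosen set of time periods.

```r
tab <- data.frame(time = 1:8,
                  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
                  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L))

# Hypothetical helper: total X2 for a given set of time periods
total_for <- function(dat, times) sum(dat$X2[dat$time %in% times])

total_for(tab, c(6, 7, 8))  # 101 + 300 + 185 = 586
total_for(tab, c(2, 3))     # 230 + 300 = 530
```

Both totals are above 500, so neither set can be returned by a rule that only keeps periods while the running total stays at or below 500.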
Thank you, David! I reran your script and it gives me the first three time periods. Is it doing random sampling?

tab.fan
  time X1  X2
2    2  5 230
3    3  1 300
5    5  2  10

(In reply to David L Carlson <dcarlson at tamu.edu>, Sat, Nov 21, 2015 at 12:20 PM.)
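Why the one-liner keeps returning the first periods: sample(tab.sub$X2) shuffles only the X2 values, and the logical vector cumsum(...) <= 500 is then used to index tab.sub in its original row order. Because every X2 value is positive, the running total only grows, so that logical vector is always TRUE for a leading run of rows and FALSE afterwards. The result is therefore always the first few rows of tab.sub, and their totals need not respect the limit (times 2, 3 and 5 above add up to 230 + 300 + 10 = 540). A minimal sketch of a fix, assuming the intended rule is "random order, keep periods while the running total is at most 500", is to shuffle whole rows so each X2 value stays attached to its own time period:

```r
tab <- data.frame(time = 1:8,
                  X1 = c(0L, 5L, 1L, 0L, 2L, 3L, 1L, 4L),
                  X2 = c(251L, 230L, 300L, 25L, 10L, 101L, 300L, 185L))
tab.sub <- tab[tab$X1 > 0, ]  # drop periods with X1 = 0

# Permute row indices, not the X2 column, so rows stay intact
shuffled <- tab.sub[sample(nrow(tab.sub)), ]

# Keep the leading periods whose running X2 total is at most 500
tab.fan <- shuffled[cumsum(shuffled$X2) <= 500, ]
tab.fan
```

The selected times now change from run to run. If the goal is instead to stop once the total passes 500, as the original question phrases it, compare against `> 500` and keep rows up to the first period that crosses the threshold.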