Hi R users,

I have been struggling to select an equal number of samples from each stratum. My data were collected in different years and in different regions, with different sample sizes. Basically, I have two conditions (year and region), and I want the same sample size for every combination of year and region.

I found that the "strata.sampling" function works when I have one condition, but I have two. Is there any package that lets me specify two conditions, select the rows randomly 999 times, and report the mean value?

Your help would be really appreciated. I am spending so much time...

Here is what I did with the example data:

raw=structure(list(watershed = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002, 2001,
2001, 2001, 2002, 2002, 2002), sp1 = c(18.38, 29.1, 90.72,
16.12, 49.12, 20.81, 65.1, 1.87, 72.99, 93.45, 38.44, 67.13,
45.71), sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
16.18, 18.16, 18.76, 19.26, 52.73, 49.09), sp3 = c(86.9,
62.82, 74.32, 75.49, 20.17, 58.84, 16.51, 44.14, 44.39, 32.36,
53.28, 67.42, 33.37)), .Names = c("watershed", "year", "sp1",
"sp2", "sp3"), class = "data.frame", row.names = c(NA, -13L))

require(sampling)
if (is.null(method)) method <- "srswor"
if (!method %in% c("srswor", "srswr"))
stop('method must be "srswor" or "srswr"')
temp <- data[order(data[[group]]), ]
ifelse(length(size) > 1,
size <- size,
ifelse(size < 1,
size <- round(table(temp[group]) * size),
size <- rep(size, times=length(table(temp[group])))))
strat = strata(temp, stratanames = names(temp[group]),
size = size, method = method)
getdata(temp, strat)
}

test1<-strata.sampling(raw, ("watershed"), 2)# select 2 rows by watershed

But I wanted to use "year" too, i.e. ("watershed", "year"). When I added "year", it did not work:

> test1<-strata.sampling(raw, ("watershed", "year"), 2)# select 2 rows by watershed and year
Error: unexpected ',' in "test1<-strata.sampling(raw, ("watershed","

I want to select rows using the two conditions ("watershed", "year"), repeat the random sampling 999 times, and report the mean values of sp1, sp2 and sp3. Here is the output I wanted:

output<-structure(list(watershed = structure(c(1L, 1L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), year = c(2001L, 2002L, 2001L, 2002L),
sp1 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor"),
sp2 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor"),
sp3 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor")), .Names = c("watershed",
"year", "sp1", "sp2", "sp3"), class = "data.frame", row.names = c(NA,
-4L))

Any suggestions?
Thanks for your help.
KG
Why? Presumably you want to bootstrap the distribution of the mean -- but why?

Anyway, if this is correct, the boot package can do this for you.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
H. Gilbert Welch

On Fri, Mar 7, 2014 at 11:56 AM, Kristi Glover <kristi.glover at hotmail.com> wrote:
> Hi R users,
> I have been struggling to select the equal number of samples from each strata. [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
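As a rough illustration of the boot suggestion, the sketch below bootstraps the per-cell means while keeping the resampling inside each watershed-by-year stratum. It is a guess at what was intended, not code from the thread: the names `vars`, `strata_id` and `cell_means` are made up here, and note that `boot()` resamples with replacement at each stratum's full size, so it does not equalise the sample sizes across strata.

```r
library(boot)  # recommended package shipped with R

# Example data from the original post, rebuilt with data.frame()
raw <- data.frame(
  watershed = factor(rep(c("A", "B"), times = c(7, 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002,
           2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1,
          1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
          16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51,
          44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

vars <- c("sp1", "sp2", "sp3")  # hypothetical helper name

# One stratum per watershed-by-year cell; levels are
# A.2001, B.2001, A.2002, B.2002 (first factor varies fastest)
strata_id <- interaction(raw$watershed, raw$year, drop = TRUE)

# Statistic: the mean of each species within each cell, returned as
# one vector (4 cells for sp1, then 4 for sp2, then 4 for sp3)
cell_means <- function(d, i) {
  g <- strata_id[i]
  as.vector(sapply(d[i, vars], function(x) tapply(x, g, mean)))
}

set.seed(1)
b <- boot(raw, cell_means, R = 999, strata = strata_id)

# Average of the 999 bootstrap replicates, arranged as one row per cell
cells <- expand.grid(watershed = levels(raw$watershed),
                     year = sort(unique(raw$year)))
boot_avg <- data.frame(
  cells,
  matrix(colMeans(b$t), nrow = nlevels(strata_id),
         dimnames = list(NULL, vars))
)
boot_avg
```

`b$t0` holds the cell means of the original data and `b$t` the 999 replicates, so `boot.ci()` could be applied to any single column if an interval rather than an average is wanted.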
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Kristi Glover
> Sent: Friday, March 07, 2014 11:56 AM
> To: R-help
> Subject: [R] stratified sampling
>
> Hi R users,
> I have been struggling to select the equal number of samples from each
> strata. [...]

There seems to be something missing from your post: your code doesn't run as is, even for a single stratum variable. But I might hazard a guess that when you want to pass multiple strata variables you need to pass them as a vector:

c('watershed','year')

If you are passing multiple stratum variables, you also need to pass a vector of desired sample sizes, in the order that the strata appear in your data. In your case that would be

size = c(2,2,2,2)

If this doesn't solve the problem, then write back to the list with an example that works with a single variable with your data.

Dan

Daniel Nordlund
Bothell, WA USA
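The advice above can be sketched roughly as follows, assuming the sampling package is installed. This is only an illustration, not the poster's function: it calls `strata()` directly because the posted wrapper sorts with `data[[group]]`, which accepts a single column name only. The helper names `vars`, `one_draw`, `draws` and `result` are invented for the example.

```r
library(sampling)

# Example data from the original post, rebuilt with data.frame()
raw <- data.frame(
  watershed = factor(rep(c("A", "B"), times = c(7, 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002,
           2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1,
          1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
          16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51,
          44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

# strata() expects the rows sorted by the stratification variables
raw <- raw[order(raw$watershed, raw$year), ]
vars <- c("sp1", "sp2", "sp3")

# One stratified draw: 2 rows per watershed-by-year cell, without
# replacement, followed by the per-cell means of the three species
one_draw <- function() {
  s <- strata(raw, stratanames = c("watershed", "year"),
              size = c(2, 2, 2, 2), method = "srswor")
  picked <- raw[s$ID_unit, ]  # ID_unit gives the selected row positions
  m <- aggregate(picked[vars], by = picked[c("watershed", "year")],
                 FUN = mean)
  m[order(m$watershed, m$year), ]
}

# Repeat 999 times and average the per-cell means over the draws
set.seed(1)
draws <- replicate(999, as.matrix(one_draw()[vars]), simplify = FALSE)
result <- data.frame(unique(raw[c("watershed", "year")]),
                     Reduce(`+`, draws) / length(draws),
                     row.names = NULL)
result
```

Because watershed A has only two rows for 2001, every draw selects both of them, so that cell's averaged mean is simply the mean of its two observations; the other cells vary from draw to draw.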