Hi! Thanks for taking the time to read this.

The code below creates a graph from 100 samples, each of a size between 5% and 15% of the population (400).

What I'd like to do, however, is add two other sections to the graph. It would look something like this:

For samples 1-100, take samples that are between 5% and 15% of a population of 400. For samples 101-200, take samples that are between 5% and 15% of a population of 800. For samples 201-300, take samples that are between 5% and 15% of a population of 300.

I assume this would require a nested for loop. Does anyone have advice as to how to do this?

Thanks for your time. Kirsten

## Mark-Recapture
## Estimate population from repeated sampling

## Population size
N <- 400
N

## Vector labeling each item in the population
pop <- c(1:N)
pop

## Lower and upper bounds of sample size
lower.bound <- round(x = .05 * N, digits = 0)
lower.bound ## Smallest possible sample size

upper.bound <- round(x = .15 * N, digits = 0)
upper.bound ## Largest possible sample size

## Length of sample size interval
length.ss.interval <- length(c(lower.bound:upper.bound))
length.ss.interval ## Total possible sample sizes, ranging from lower.bound to upper.bound

## Determine a sample size randomly (not a global variable...simply for test purposes)
## Between lower and upper bounds set previously
## Give equal weight to each possible sample size in this interval
sample(x = c(lower.bound:upper.bound),
       size = 1,
       prob = c(rep(1/length.ss.interval, length.ss.interval)))

## Specify number of samples to take
n.samples <- 100

## Initialize empty matrix
## 1st column is population (item 1 through item 400)
## 2nd through nth column are all rounds of sampling
dat <- matrix(data = NA,
              nrow = length(pop),
              ncol = n.samples + 1)

dat[,1] <- pop

dat

## Take samples of random sizes
## Record results in columns 2 through n
## 1 = sampled (marked)
## 0 = not sampled (not marked)
for(i in 2:ncol(dat)) {
  a.sample <- sample(x = pop,
                     size = sample(x = c(lower.bound:upper.bound),
                                   size = 1,
                                   prob = c(rep(1/length.ss.interval,
                                                length.ss.interval))),
                     replace = FALSE)
  dat[,i] <- dat[,1] %in% a.sample
}

## How large was each sample size?
apply(X = dat, MARGIN = 2, FUN = sum)
## 1st element is irrelevant
## 2nd element through nth element: sample size for each of the 100 samples

## At this point, all computations can be done using dat

## Create Schnabel data frame using dat
## Google the Schnabel formula

schnabel.comp <- data.frame(sample = 1:n.samples,
                            n.sampled = apply(X = dat, MARGIN = 2,
                                              FUN = sum)[2:(n.samples + 1)])

## First column: which sample, 1-100
## Second column: number selected in that sample

## How many items were previously sampled?
## For 1st sample, it's 0
## For 2nd sample, code is different than for the remaining samples

n.prev.sampled <- c(0, rep(NA, n.samples - 1))
n.prev.sampled

n.prev.sampled[2] <- sum(ifelse(test = dat[,3] == 1 & dat[,2] == 1,
                                yes = 1,
                                no = 0))

n.prev.sampled

for(i in 4:ncol(dat)) {
  n.prev.sampled[i-1] <- sum(ifelse(test = dat[,i] == 1 &
                                      rowSums(dat[,2:(i-1)]) > 0,
                                    yes = 1,
                                    no = 0))
}

schnabel.comp$n.prev.sampled <- n.prev.sampled

## n.newly.sampled: in each sample, how many items were newly sampled?
## i.e., never seen before?
schnabel.comp$n.newly.sampled <- with(schnabel.comp,
                                      n.sampled - n.prev.sampled)

## cum.sampled: how many total items have you seen?
schnabel.comp$cum.sampled <- c(0,
                               cumsum(schnabel.comp$n.newly.sampled)[1:(n.samples - 1)])

## Numerator of the Schnabel formula
schnabel.comp$numerator <- with(schnabel.comp,
                                n.sampled * cum.sampled)

## Denominator of the Schnabel formula is n.prev.sampled

## pop.estimate -- after each sample (starting with the 2nd -- need at least two samples)
schnabel.comp$pop.estimate <- NA

for(i in 1:length(schnabel.comp$pop.estimate)) {
  schnabel.comp$pop.estimate[i] <- sum(schnabel.comp$numerator[1:i]) /
    sum(schnabel.comp$n.prev.sampled[1:i])
}

## Plot population estimate after each sample
if (!require("ggplot2")) {install.packages("ggplot2"); require("ggplot2")}
if (!require("scales")) {install.packages("scales"); require("scales")}

small.sample.dat <- schnabel.comp

small.sample <- ggplot(data = small.sample.dat,
                       mapping = aes(x = sample, y = pop.estimate)) +
  geom_point(size = 2) +
  geom_line() +
  geom_hline(yintercept = N, col = "red", lwd = 1) +
  coord_cartesian(xlim = c(0, 100), ylim = c(300, 500)) +
  scale_x_continuous(breaks = pretty_breaks(11)) +
  scale_y_continuous(breaks = pretty_breaks(11)) +
  labs(x = "\nSample", y = "Population estimate\n",
       title = "Sample sizes are between 5% and 15%\nof the population") +
  theme_bw(base_size = 12) +
  theme(aspect.ratio = 1)

small.sample
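In case it is useful to see it written out, the textbook Schnabel estimator that the numerator, n.prev.sampled, and cum.sampled columns appear to implement is (standard notation; C, M, and R below are not variables from the code):

\hat{N} = \frac{\sum_t C_t M_t}{\sum_t R_t}

where C_t is the number of items caught in sample t (n.sampled), M_t is the number of items already marked before sample t (cum.sampled), and R_t is the number of marked items recaptured in sample t (n.prev.sampled). pop.estimate stores this running quotient after each sample.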
Hi Kirsten,

I can run your example code but I can't quite follow your division of sampling. Can you restate the task? Below is what I think you are asking for, but I have the feeling I may be off the mark.

Set A: 400 samples, draw 100 in range of 5 to 15

Set B: 800 samples, draw 100 in range of 5 to 15

Set C: 300 samples, draw 100 in range of 5 to 15

Ben

> On Aug 5, 2017, at 9:21 AM, Kirsten Morehouse <kmoreho1 at swarthmore.edu> wrote:
>
> Hi! Thanks for taking the time to read this.
>
> The code below creates a graph from 100 samples, each of a size between 5% and 15% of the population (400).
>
> What I'd like to do, however, is add two other sections to the graph.
Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org

Ecocast Reports: http://seascapemodeling.org/ecocast.html
Tick Reports: https://report.bigelow.org/tick/
Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/
Hi Ben,

That's exactly right! Except that for each set it's the population being sampled that is 400, 800, or 300. I want to take 3 sets of 100 samples, where only the population differs. I can do this separately, but I'm having trouble putting them all on the same graph. I'd like to have sample on the x axis (1-300) and estimate on the y axis. I want to show how population size affects the estimates.

Does this make more sense?

Thanks for your time!
Kirsten

On Sun, Aug 6, 2017 at 3:21 PM Ben Tupper <btupper at bigelow.org> wrote:

> Hi Kirsten,
>
> I can run your example code but I can't quite follow your division of
> sampling. Can you restate the task? Below is what I think you are
> asking for, but I have the feeling I may be off the mark.
>
> Set A: 400 samples, draw 100 in range of 5 to 15
>
> Set B: 800 samples, draw 100 in range of 5 to 15
>
> Set C: 300 samples, draw 100 in range of 5 to 15
>
> Ben
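One possible way to get all 300 estimates on a single graph is to wrap the whole Schnabel computation in a function of the population size and then loop over the three populations, rather than nesting loops. The code below is only a sketch under my own assumptions: the helper name run_schnabel(), its offset argument, and the simplified mark bookkeeping (a 0/1 vector instead of the full dat matrix) are illustrative and are not taken from the posts above.

library(ggplot2)

## Run n.samples rounds of mark-recapture on a population of size N and
## return the running Schnabel estimate after each sample.
## offset shifts the sample numbers so several runs can share one x axis.
run_schnabel <- function(N, n.samples = 100, offset = 0) {
  pop    <- 1:N
  lower  <- round(0.05 * N)
  upper  <- round(0.15 * N)
  marked <- integer(N)                       ## 0 = never sampled, 1 = marked
  out <- data.frame(sample         = seq_len(n.samples) + offset,
                    n.sampled      = NA_integer_,
                    n.prev.sampled = NA_integer_,
                    cum.sampled    = NA_integer_,
                    N              = N)
  for (i in seq_len(n.samples)) {
    s <- sample(pop, size = sample(lower:upper, size = 1))
    out$n.sampled[i]      <- length(s)       ## caught this round (C_t)
    out$n.prev.sampled[i] <- sum(marked[s])  ## recaptures this round (R_t)
    out$cum.sampled[i]    <- sum(marked)     ## marked before this round (M_t)
    marked[s] <- 1L
  }
  ## Running Schnabel estimate; the first value is NaN (0/0), as in the
  ## original code, because at least two samples are needed.
  out$pop.estimate <- cumsum(out$n.sampled * out$cum.sampled) /
    cumsum(out$n.prev.sampled)
  out
}

pops <- c(400, 800, 300)
res  <- do.call(rbind, lapply(seq_along(pops), function(k)
  run_schnabel(pops[k], n.samples = 100, offset = (k - 1) * 100)))

ggplot(res, aes(x = sample, y = pop.estimate, colour = factor(N))) +
  geom_point(size = 1) +
  geom_line() +
  geom_hline(data = data.frame(N = pops),
             aes(yintercept = N, colour = factor(N)),
             linetype = "dashed") +
  labs(x = "Sample (1-300)", y = "Population estimate", colour = "True N") +
  theme_bw()

Each block of 100 samples then sits side by side on the shared x axis, with the dashed lines marking the three true population sizes.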