Nick Pardikes
2016-Sep-06 22:07 UTC
[R] Resample with replacement to produce many rarefaction curves with same number of samples
I am currently having difficulty producing a graph using rarecurve in the vegan package. I have produced rarefaction curves (seen below) using the following code. library(vegan) myMat <- round(matrix(rlnorm(2000), 50)) #creates distribution of communities netdata <- as.data.frame(myMat) #generates a matrix of communities (rows), species (columns) raredata <- rarecurve(netdata, label=F, col=rgb(0, 0, 1, 0.1)) #uses rarecurve to plot a rarefaction for each individual community (n=50) However I would like to produce a graph in which all rarefaction curves end at the same sample size. For example, in this graph it would be great to extend the x-axis (sample size) to 100 and have all curves end at this point. Is there any way to use rarecurve to resample a community (row) with replacement the same number of times for all 50 communities? With replacement is important because the communities differ greatly in their size (number of species). I understand that rarefaction is useful to compare communities with different sample efforts, but I would still like to generate the figure. My actual data has 5000 simulated communities that differ greatly in matrix size and number of samples. Thank you in advance for your help and suggestions. Cheers, Nick -- Nick Pardikes PhD Candidate Program in Ecology, Evolution, and Conservation Biology University of Nevada https://nickpardikes.wordpress.com/ nickpardikes at gmail.com -------------- next part -------------- A non-text attachment was scrubbed... Name: rarefaction.pdf Type: application/pdf Size: 19167 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20160906/4015a1ee/attachment.pdf>
Luisfo
2016-Sep-07 15:17 UTC
[R] Resample with replacement to produce many rarefaction curves with same number of samples
Hi Nick, If you use the following raredata <- rarecurve(rrarefy(netdata, sample=100), label=F, col=rgb(0, 0, 1, 0.1)) should work for any sample size, e.g. sample=100. However, you will have a 'warning' if you don't have samples enough, because it has not replacement. If you type 'rrarefy' on the R console (without brackets), or any other function name, you will see the R code of the function. rrarefy uses the function 'sample()' for sampling, but has no option for replacement. I did the following. I created my custom rrarefy function from the original. rrarefy.custom <- function (x, sample, rep.param=F) { if (!identical(all.equal(x, round(x)), TRUE)) stop("function is meaningful only for integers (counts)") x <- as.matrix(x) if (ncol(x) == 1) x <- t(x) if (length(sample) > 1 && length(sample) != nrow(x)) stop(gettextf("length of 'sample' and number of rows of 'x' do not match")) sample <- rep(sample, length = nrow(x)) colnames(x) <- colnames(x, do.NULL = FALSE) nm <- colnames(x) if (!rep.param && any(rowSums(x) < sample)) warning("Some row sums < 'sample' and are not rarefied") for (i in 1:nrow(x)) { if (rep.param && sum(x[i, ]) <= sample[i]) next row <- sample(rep(nm, times = x[i, ]), sample[i], replace = rep.param) row <- table(row) ind <- names(row) x[i, ] <- 0 x[i, ind] <- row } x } You can check the differences with the original code if you type 'rrarefy' on the R console. So now, if you type the following raredata <- rarecurve(rrarefy.custom(netdata, sample=100,rep.param=T), label=F, col=rgb(0, 0, 1, 0.1)) you will have the desired behaviour. WARNING: I do not understand about rarefunction curves or communities in your context. So, be careful when resampling. It might not be statistically correct. Regards, *Luisfo Chiroque* /PhD Student | PhD Candidate IMDEA Networks Institute/ http://fourier.networks.imdea.org/people/~luis_nunez/ <http://fourier.networks.imdea.org/people/%7Eluis_nunez/> On 09/07/2016 12:07 AM, Nick Pardikes wrote:> I am currently having difficulty producing a graph using rarecurve in the > vegan package. I have produced rarefaction curves (seen below) using the > following code. > > > library(vegan) > > myMat <- round(matrix(rlnorm(2000), 50)) #creates distribution of > communities > > netdata <- as.data.frame(myMat) #generates a matrix of communities (rows), > species (columns) > > raredata <- rarecurve(netdata, label=F, col=rgb(0, 0, 1, 0.1)) #uses > rarecurve to plot a rarefaction for each individual community (n=50) > > > However I would like to produce a graph in which all rarefaction curves end > at the same sample size. For example, in this graph it would be great to > extend the x-axis (sample size) to 100 and have all curves end at this > point. Is there any way to use rarecurve to resample a community (row) with > replacement the same number of times for all 50 communities? With > replacement is important because the communities differ greatly in their > size (number of species). > > > I understand that rarefaction is useful to compare communities with > different sample efforts, but I would still like to generate the figure. My > actual data has 5000 simulated communities that differ greatly in matrix > size and number of samples. > > > Thank you in advance for your help and suggestions. > > > Cheers, > > Nick > > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.[[alternative HTML version deleted]]
Nick Pardikes
2016-Sep-07 20:27 UTC
[R] Resample with replacement to produce many rarefaction curves with same number of samples
Hey Luisfo, This looks great, however I still get the same plot as before (seen below). The output looks the same. Here is the figure that was generated from this code: rrarefy.custom <- function (x, sample, rep.param=F) { if (!identical(all.equal(x, round(x)), TRUE)) stop("function is meaningful only for integers (counts)") x <- as.matrix(x) if (ncol(x) == 1) x <- t(x) if (length(sample) > 1 && length(sample) != nrow(x)) stop(gettextf("length of 'sample' and number of rows of 'x' do not match")) sample <- rep(sample, length = nrow(x)) colnames(x) <- colnames(x, do.NULL = FALSE) nm <- colnames(x) if (!rep.param && any(rowSums(x) < sample)) warning("Some row sums < 'sample' and are not rarefied") for (i in 1:nrow(x)) { if (rep.param && sum(x[i, ]) <= sample[i]) next row <- sample(rep(nm, times = x[i, ]), sample[i], replace = rep.param) row <- table(row) ind <- names(row) x[i, ] <- 0 x[i, ind] <- row } x } raredata <- rarecurve(rrarefy.custom(netdata, sample=100,rep.param=T), label=F, col=rgb(0, 0, 1, 0.1)) However, I like what you did to the rrarefy function to add the sample with replacement option. On Wed, Sep 7, 2016 at 8:17 AM, Luisfo <luisfo89 at yahoo.es> wrote:> Hi Nick, > > If you use the following > raredata <- rarecurve(rrarefy(netdata, sample=100), label=F, col=rgb(0, > 0, 1, 0.1)) > should work for any sample size, e.g. sample=100. > However, you will have a 'warning' if you don't have samples enough, because > it has not replacement. > > If you type 'rrarefy' on the R console (without brackets), or any other > function name, you will see the R code of the function. > rrarefy uses the function 'sample()' for sampling, but has no option for > replacement. > I did the following. I created my custom rrarefy function from the original. > rrarefy.custom <- function (x, sample, rep.param=F) > { > if (!identical(all.equal(x, round(x)), TRUE)) > stop("function is meaningful only for integers (counts)") > x <- as.matrix(x) > if (ncol(x) == 1) > x <- t(x) > if (length(sample) > 1 && length(sample) != nrow(x)) > stop(gettextf("length of 'sample' and number of rows of 'x' do not > match")) > sample <- rep(sample, length = nrow(x)) > colnames(x) <- colnames(x, do.NULL = FALSE) > nm <- colnames(x) > if (!rep.param && any(rowSums(x) < sample)) > warning("Some row sums < 'sample' and are not rarefied") > for (i in 1:nrow(x)) { > if (rep.param && sum(x[i, ]) <= sample[i]) > next > row <- sample(rep(nm, times = x[i, ]), sample[i], replace = rep.param) > row <- table(row) > ind <- names(row) > x[i, ] <- 0 > x[i, ind] <- row > } > x > } > You can check the differences with the original code if you type 'rrarefy' > on the R console. > > So now, if you type the following > raredata <- rarecurve(rrarefy.custom(netdata, sample=100,rep.param=T), > label=F, col=rgb(0, 0, 1, 0.1)) > you will have the desired behaviour. > > WARNING: I do not understand about rarefunction curves or communities in > your context. So, be careful when resampling. It might not be statistically > correct. > > Regards, > Luisfo Chiroque > PhD Student | PhD Candidate > IMDEA Networks Institute > http://fourier.networks.imdea.org/people/~luis_nunez/ > > On 09/07/2016 12:07 AM, Nick Pardikes wrote: > > I am currently having difficulty producing a graph using rarecurve in the > vegan package. I have produced rarefaction curves (seen below) using the > following code. > > > library(vegan) > > myMat <- round(matrix(rlnorm(2000), 50)) #creates distribution of > communities > > netdata <- as.data.frame(myMat) #generates a matrix of communities (rows), > species (columns) > > raredata <- rarecurve(netdata, label=F, col=rgb(0, 0, 1, 0.1)) #uses > rarecurve to plot a rarefaction for each individual community (n=50) > > > However I would like to produce a graph in which all rarefaction curves end > at the same sample size. For example, in this graph it would be great to > extend the x-axis (sample size) to 100 and have all curves end at this > point. Is there any way to use rarecurve to resample a community (row) with > replacement the same number of times for all 50 communities? With > replacement is important because the communities differ greatly in their > size (number of species). > > > I understand that rarefaction is useful to compare communities with > different sample efforts, but I would still like to generate the figure. My > actual data has 5000 simulated communities that differ greatly in matrix > size and number of samples. > > > Thank you in advance for your help and suggestions. > > > Cheers, > > Nick > > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- Nick Pardikes PhD Candidate Program in Ecology, Evolution, and Conservation Biology University of Nevada https://nickpardikes.wordpress.com/ nickpardikes at gmail.com -------------- next part -------------- A non-text attachment was scrubbed... Name: luis_rarefaction.pdf Type: application/pdf Size: 19167 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20160907/483b4faa/attachment.pdf>
Stefan Evert
2016-Sep-08 12:46 UTC
[R] Resample with replacement to produce many rarefaction curves with same number of samples
> On 7 Sep 2016, at 00:07, Nick Pardikes <nickpardikes at gmail.com> wrote: > > Is there any way to use rarecurve to resample a community (row) with > replacement the same number of times for all 50 communities? With > replacement is important because the communities differ greatly in their > size (number of species).Are you sure it makes sense to resample with replacement? This will systematically underestimate the number of species at a given sample size (because of the artificial repetition) and will never find more species than there are in your original sample. Best, Stefan
David L Carlson
2016-Sep-08 14:25 UTC
[R] Resample with replacement to produce many rarefaction curves with same number of samples
Sampling without replacement will never find more species than there are in your original sample either! Sampling without replacement treats the sample as the population for the purposes of estimating the outcomes at smaller sample sizes. Sampling with replacement (the same as bootstrapping) treats the sample as one possible outcome of a larger population at that sample size. There is another consideration. A zero value means different things at different sample sizes. At sample size 10, it means approximately less than 10%, but at sample size 100, it means approximately less than 1%, and so on. ------------------------------------- David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -----Original Message----- From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Stefan Evert Sent: Thursday, September 8, 2016 7:46 AM To: Nick Pardikes Cc: R-help Mailing List Subject: Re: [R] Resample with replacement to produce many rarefaction curves with same number of samples> On 7 Sep 2016, at 00:07, Nick Pardikes <nickpardikes at gmail.com> wrote: > > Is there any way to use rarecurve to resample a community (row) with > replacement the same number of times for all 50 communities? With > replacement is important because the communities differ greatly in their > size (number of species).Are you sure it makes sense to resample with replacement? This will systematically underestimate the number of species at a given sample size (because of the artificial repetition) and will never find more species than there are in your original sample. Best, Stefan ______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.