I often work with tree data that is sampled with probability proportional to size, which presents a special challenge when describing the frequency distribution. For example, R functions like quantile() and fitdistr() expect each observation to have equal sample probability. As a workaround, I have been "exploding"/"mushrooming" my data based on the appropriate expansion factors. However, this can take a LONG TIME and I am reaching out for more efficient suggestions, particularly for the quantile() function. Example of my workaround: # trees.df represents random sample with probability proportional to size (of diameter) using "basal area factor" of 20 trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2), TreesPerAcre=numeric(10)) trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576) # expansion factor for each observation # to obtain percentiles that are weighted by trees per acre, "explode" diameter data explodeFactor <- 10 # represents ten acres treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor )) explodedDiameters.df <- data.frame(Diameter=numeric(treeCount)) k=0 # initialize counter k for (i in 1:length(trees.df$Diameter)){ for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){ k <- k +1 explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i] } } quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for trees per acre) quantile(trees.df$Diameter) # percentiles biased upwards Trevor Walker 14906 McKemey Pl. Charlotte, NC 28277 Cell: (936)591-2130 Email: trevordaviswalker@gmail.com [[alternative HTML version deleted]]
Rather than "exploding", I suggest you order your data according to tree diameter, then calculate the cumulative sum of the tree densities, and use linear interpolation to estimate the percentiles. For example ... library(plotrix) attach(trees.df) ord <- order(Diameter) CumDensOrdScaled <- rescale(cumsum(TreesPerAcre[ord]), 0:1) DiamOrd <- Diameter[ord] Quants <- c(0, 0.25, 0.5, 0.75, 1) Perc <- approx(x=CumDensScaled, y=DiamOrd, xout=Quants)$y plot(DiamOrd, CumDensScaled, type="o", lwd=2) abline(h=Quants, lty=2) abline(v=Perc, lty=2) Jean On Wed, Nov 20, 2013 at 1:35 PM, Trevor Walker <trevordaviswalker@gmail.com>wrote:> I often work with tree data that is sampled with probability proportional > to size, which presents a special challenge when describing the frequency > distribution. For example, R functions like quantile() and fitdistr() > expect each observation to have equal sample probability. As a workaround, > I have been "exploding"/"mushrooming" my data based on the appropriate > expansion factors. However, this can take a LONG TIME and I am reaching > out for more efficient suggestions, particularly for the quantile() > function. Example of my workaround: > > # trees.df represents random sample with probability proportional to size > (of diameter) using "basal area factor" of 20 > trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2), > TreesPerAcre=numeric(10)) > trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576) # expansion > factor for each observation > > # to obtain percentiles that are weighted by trees per acre, "explode" > diameter data > explodeFactor <- 10 # represents ten acres > treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor )) > explodedDiameters.df <- data.frame(Diameter=numeric(treeCount)) > k=0 # initialize counter k > for (i in 1:length(trees.df$Diameter)){ > for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){ > k <- k +1 > explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i] > } > } > > quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for > trees per acre) > quantile(trees.df$Diameter) # percentiles biased upwards > > > > Trevor Walker > 14906 McKemey Pl. > Charlotte, NC 28277 > Cell: (936)591-2130 > Email: trevordaviswalker@gmail.com > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
On Nov 20, 2013, at 11:35 AM, Trevor Walker wrote:> I often work with tree data that is sampled with probability proportional > to size, which presents a special challenge when describing the frequency > distribution. For example, R functions like quantile() and fitdistr() > expect each observation to have equal sample probability. As a workaround, > I have been "exploding"/"mushrooming" my data based on the appropriate > expansion factors. However, this can take a LONG TIME and I am reaching > out for more efficient suggestions, particularly for the quantile() > function. Example of my workaround: >The 'Hmisc' package has a `wtd.quantile` function. I seem to remember that it might have been borrowed from the quantreg package.> # trees.df represents random sample with probability proportional to size > (of diameter) using "basal area factor" of 20 > trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2), > TreesPerAcre=numeric(10)) > trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576) # expansion > factor for each observation > > # to obtain percentiles that are weighted by trees per acre, "explode" > diameter data > explodeFactor <- 10 # represents ten acres > treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor )) > explodedDiameters.df <- data.frame(Diameter=numeric(treeCount)) > k=0 # initialize counter k > for (i in 1:length(trees.df$Diameter)){ > for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){ > k <- k +1 > explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i] > } > } > > quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for > trees per acre) > quantile(trees.df$Diameter) # percentiles biased upwards > > > > Trevor Walker >-- David Winsemius Alameda, CA, USA
You could try: require(quantreg) qs <- rq(x ~ 1, weights = w, tau = 1:3/4) Roger Koenker rkoenker at illinois.edu On Nov 20, 2013, at 4:56 PM, David Winsemius wrote:> > On Nov 20, 2013, at 11:35 AM, Trevor Walker wrote: > >> I often work with tree data that is sampled with probability proportional >> to size, which presents a special challenge when describing the frequency >> distribution. For example, R functions like quantile() and fitdistr() >> expect each observation to have equal sample probability. As a workaround, >> I have been "exploding"/"mushrooming" my data based on the appropriate >> expansion factors. However, this can take a LONG TIME and I am reaching >> out for more efficient suggestions, particularly for the quantile() >> function. Example of my workaround: >> > > The 'Hmisc' package has a `wtd.quantile` function. I seem to remember that it might have been borrowed from the quantreg package. > > >> # trees.df represents random sample with probability proportional to size >> (of diameter) using "basal area factor" of 20 >> trees.df <- data.frame(Diameter=rnorm(10, mean=10, sd=2), >> TreesPerAcre=numeric(10)) >> trees.df$TreesPerAcre <- 20/(trees.df$Diameter^2*pi/576) # expansion >> factor for each observation >> >> # to obtain percentiles that are weighted by trees per acre, "explode" >> diameter data >> explodeFactor <- 10 # represents ten acres >> treeCount <- sum(round(trees.df$TreesPerAcre*explodeFactor )) >> explodedDiameters.df <- data.frame(Diameter=numeric(treeCount)) >> k=0 # initialize counter k >> for (i in 1:length(trees.df$Diameter)){ >> for (j in 1:round(trees.df$TreesPerAcre[i]*explodeFactor)){ >> k <- k +1 >> explodedDiameters.df$Diameter[k] <- trees.df$Diameter[i] >> } >> } >> >> quantile(explodedDiameters.df$Diameter) # appropriate percentiles (for >> trees per acre) >> quantile(trees.df$Diameter) # percentiles biased upwards >> >> >> >> Trevor Walker >> > -- > > David Winsemius > Alameda, CA, USA > > ______________________________________________ > R-help at r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
On Thu, Nov 21, 2013 at 8:35 AM, Trevor Walker <trevordaviswalker@gmail.com>wrote:> I often work with tree data that is sampled with probability proportional > to size, which presents a special challenge when describing the frequency > distribution.The survey package does lots of calculations (including quantiles) for unequal-probability samples. -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]]