Hi All, I am looking at applications of percentiles to time sequenced data. I had just been using the quantile function to get percentiles over various periods, but am more interested in if there is an accepted (and/or R-implemented) method to apply weighting to the data so as to weigh recent data more heavily. I wrote the following function, but it seems quite inefficient, and not really very flexible in its applications - so if anyone has any suggestions on how to look at quantiles/percentiles within R while also using a weighting schema, I would be very interested. Note - this function supposes the data in X is time-sequenced, with the most recent (and thus heaviest weighted) data at the end of the vector WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1)) { Xprime <- NA for(i in 1:length(X)) { Xprime <- c(Xprime, rep(X[i], times=i)) } print("Percentiles:") print(quantile(X, pctile)) print("Weighted:") print(Xprime) print("Weighted Percentiles:") print(quantile(Xprime, pctile, na.rm=TRUE)) } WtPercentile(1:10) WtPercentile(rnorm(10)) [[alternative HTML version deleted]]
I do know that Harrell's Quantile function in the Hmisc package will allow quantile estimates from models. Whether it is general enough to extend to time series, I have no experience and cannot say. -- David Winsemius On Feb 17, 2009, at 11:57 AM, Brigid Mooney wrote:> Hi All, > > I am looking at applications of percentiles to time sequenced data. > I had > just been using the quantile function to get percentiles over various > periods, but am more interested in if there is an accepted (and/or > R-implemented) method to apply weighting to the data so as to weigh > recent > data more heavily. > > I wrote the following function, but it seems quite inefficient, and > not > really very flexible in its applications - so if anyone has any > suggestions > on how to look at quantiles/percentiles within R while also using a > weighting schema, I would be very interested. > > Note - this function supposes the data in X is time-sequenced, with > the most > recent (and thus heaviest weighted) data at the end of the vector > > WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1)) > { > Xprime <- NA > > for(i in 1:length(X)) > { > Xprime <- c(Xprime, rep(X[i], times=i)) > } > > print("Percentiles:") > print(quantile(X, pctile)) > print("Weighted:") > print(Xprime) > print("Weighted Percentiles:") > print(quantile(Xprime, pctile, na.rm=TRUE)) > } > > WtPercentile(1:10) > WtPercentile(rnorm(10)) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Thanks for pointing me to the quantreg package as a resource. I was hoping to ask be able to address one quick follow-up question... I get slightly different variants between using the rq funciton with formula = mydata ~ 1 as I would if I ran the same data using the quantile function. Example: mydata <- (1:10)^2/2 pctile <- seq(.59, .99, .1) quantile(mydata, pctile) 59% 69% 79% 89% 99% 20.015 26.075 32.935 40.595 49.145 rq(mydata~1, tau=pctile) Call: rq(formula = mydata ~ 1, tau = pctile) Coefficients: tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99 (Intercept) 18 24.5 32 40.5 50 Degrees of freedom: 10 total; 9 residual Is it correct to assume this is due to the different accepted methods of calculating quantiles? If so, do you know where I would be able to see the algorithms used in these functions? I'm not finding it in the documentation for function rq, and am new enough to R that I don't know where those references would generally be. On Tue, Feb 17, 2009 at 12:29 PM, roger koenker <rkoenker@uiuc.edu> wrote:> http://www.nabble.com/weighted-quantiles-to19864562.html#a19865869 > > gives one possibility... > > url: www.econ.uiuc.edu/~roger Roger Koenker > email rkoenker@uiuc.edu Department of Economics > vox: 217-333-4558 University of Illinois > fax: 217-244-6678 Champaign, IL 61820 > > > > > On Feb 17, 2009, at 10:57 AM, Brigid Mooney wrote: > > Hi All, >> >> I am looking at applications of percentiles to time sequenced data. I had >> just been using the quantile function to get percentiles over various >> periods, but am more interested in if there is an accepted (and/or >> R-implemented) method to apply weighting to the data so as to weigh recent >> data more heavily. >> >> I wrote the following function, but it seems quite inefficient, and not >> really very flexible in its applications - so if anyone has any >> suggestions >> on how to look at quantiles/percentiles within R while also using a >> weighting schema, I would be very interested. >> >> Note - this function supposes the data in X is time-sequenced, with the >> most >> recent (and thus heaviest weighted) data at the end of the vector >> >> WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1)) >> { >> Xprime <- NA >> >> for(i in 1:length(X)) >> { >> Xprime <- c(Xprime, rep(X[i], times=i)) >> } >> >> print("Percentiles:") >> print(quantile(X, pctile)) >> print("Weighted:") >> print(Xprime) >> print("Weighted Percentiles:") >> print(quantile(Xprime, pctile, na.rm=TRUE)) >> } >> >> WtPercentile(1:10) >> WtPercentile(rnorm(10)) >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> > >[[alternative HTML version deleted]]
Here is one kind of weighted quantile function. The basic idea is very simple: wquantile <- function( v, w, p ) { v <- v[order(v)] w <- w[order(v)] v [ which.max( cumsum(w) / sum(w) >= p ) ] } With some more error-checking and general clean-up, it looks like this: # Simple weighted quantile # # v A numeric vector of observations # w A numeric vector of positive weights # p The probability 0<=p<=1 # # Nothing fancy: no interpolation etc. # Basic idea wquantile <- function( v, w, p ) { v <- v[order(v)] w <- w[order(v)] v [ which.max( cumsum(w) / sum(w) >= p ) ] } # Simple weighted quantile # # v A numeric vector of observations # w A numeric vector of positive weights # p The probability 0<=p<=1 # # Nothing fancy: no interpolation etc. wquantile <- function(v,w=rep(1,length(v)),p=.5) { if (!is.numeric(v) || !is.numeric(w) || length(v) != length(w)) stop("Values and weights must be equal-length numeric vectors") if ( !is.numeric(p) || any( p<0 | p>1 ) ) stop("Quantiles must be 0<=p<=1") ranking <- order(v) sumw <- cumsum(w[ranking]) if ( is.na(w[1]) || w[1]<0 ) stop("Weights must be non-negative numbers") plist <- sumw/sumw[length(sumw)] sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ]) } I would appreciate any comments people have on this -- whether correctness, efficiency, style, .... -s On Tue, Feb 17, 2009 at 11:57 AM, Brigid Mooney <bkmooney at gmail.com> wrote:> Hi All, > > I am looking at applications of percentiles to time sequenced data. I had > just been using the quantile function to get percentiles over various > periods, but am more interested in if there is an accepted (and/or > R-implemented) method to apply weighting to the data so as to weigh recent > data more heavily. > > I wrote the following function, but it seems quite inefficient, and not > really very flexible in its applications - so if anyone has any suggestions > on how to look at quantiles/percentiles within R while also using a > weighting schema, I would be very interested. > > Note - this function supposes the data in X is time-sequenced, with the most > recent (and thus heaviest weighted) data at the end of the vector > > WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1)) > { > Xprime <- NA > > for(i in 1:length(X)) > { > Xprime <- c(Xprime, rep(X[i], times=i)) > } > > print("Percentiles:") > print(quantile(X, pctile)) > print("Weighted:") > print(Xprime) > print("Weighted Percentiles:") > print(quantile(Xprime, pctile, na.rm=TRUE)) > } > > WtPercentile(1:10) > WtPercentile(rnorm(10)) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Some minor improvements and corrections below # Simple weighted quantile # # v A vector of sortable observations # w A numeric vector of positive weights # p The quantile 0<=p<=1 # # Nothing fancy: no interpolation etc.; NA cases not thought through wquantile <- function(v,w=rep(1,length(v)),p=.5) { if ( !is.numeric(w) || length(v) != length(w) ) stop("Values and weights must be equal-length vectors") if ( !is.numeric(p) || any( p<0 | p>1) ) stop("Quantiles must be 0<=p<=1") if ( min(w) < 0 ) stop("Weights must be non-negative numbers") ranking <- order(v) sumw <- cumsum(w[ranking]) plist <- sumw / sumw[ length(sumw) ] sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ] ) } I would appreciate any comments people have on this -- whether correctness, efficiency, style, .... -s
Stavros Macrakis <macrakis <at> alum.mit.edu> writes:> > Some minor improvements and corrections below > > # Simple weighted quantile > # > # v A vector of sortable observations > # w A numeric vector of positive weights > # p The quantile 0<=p<=1 > # > # Nothing fancy: no interpolation etc.; NA cases not thought through > > wquantile <- function(v,w=rep(1,length(v)),p=.5)You could compare it with wtd.quantile in Hmisc. Dieter
On Tue, 17 Feb 2009, Brigid Mooney wrote:> Thanks for pointing me to the quantreg package as a resource. I was hoping > to ask be able to address one quick follow-up question... > > I get slightly different variants between using the rq funciton with formula > = mydata ~ 1 as I would if I ran the same data using the quantile function. > > Example: > > mydata <- (1:10)^2/2 > pctile <- seq(.59, .99, .1) > > quantile(mydata, pctile) > 59% 69% 79% 89% 99% > 20.015 26.075 32.935 40.595 49.145 > > rq(mydata~1, tau=pctile) > Call: > rq(formula = mydata ~ 1, tau = pctile) > Coefficients: > tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99 > (Intercept) 18 24.5 32 40.5 50 > Degrees of freedom: 10 total; 9 residual > > Is it correct to assume this is due to the different accepted methods of > calculating quantiles?If you try lapply(1:9, function(i)quantile(mydata, pctile,type=i)) the answers from type=1 or 2 agree with rq(). -thomas Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle