Hi All,
I am looking at applications of percentiles to time sequenced data. I had
just been using the quantile function to get percentiles over various
periods, but am more interested in if there is an accepted (and/or
R-implemented) method to apply weighting to the data so as to weigh recent
data more heavily.
I wrote the following function, but it seems quite inefficient, and not
really very flexible in its applications - so if anyone has any suggestions
on how to look at quantiles/percentiles within R while also using a
weighting schema, I would be very interested.
Note - this function supposes the data in X is time-sequenced, with the most
recent (and thus heaviest weighted) data at the end of the vector
WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
{
Xprime <- NA
for(i in 1:length(X))
{
Xprime <- c(Xprime, rep(X[i], times=i))
}
print("Percentiles:")
print(quantile(X, pctile))
print("Weighted:")
print(Xprime)
print("Weighted Percentiles:")
print(quantile(Xprime, pctile, na.rm=TRUE))
}
WtPercentile(1:10)
WtPercentile(rnorm(10))
[[alternative HTML version deleted]]
I do know that Harrell's Quantile function in the Hmisc package will allow quantile estimates from models. Whether it is general enough to extend to time series, I have no experience and cannot say. -- David Winsemius On Feb 17, 2009, at 11:57 AM, Brigid Mooney wrote:> Hi All, > > I am looking at applications of percentiles to time sequenced data. > I had > just been using the quantile function to get percentiles over various > periods, but am more interested in if there is an accepted (and/or > R-implemented) method to apply weighting to the data so as to weigh > recent > data more heavily. > > I wrote the following function, but it seems quite inefficient, and > not > really very flexible in its applications - so if anyone has any > suggestions > on how to look at quantiles/percentiles within R while also using a > weighting schema, I would be very interested. > > Note - this function supposes the data in X is time-sequenced, with > the most > recent (and thus heaviest weighted) data at the end of the vector > > WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1)) > { > Xprime <- NA > > for(i in 1:length(X)) > { > Xprime <- c(Xprime, rep(X[i], times=i)) > } > > print("Percentiles:") > print(quantile(X, pctile)) > print("Weighted:") > print(Xprime) > print("Weighted Percentiles:") > print(quantile(Xprime, pctile, na.rm=TRUE)) > } > > WtPercentile(1:10) > WtPercentile(rnorm(10)) > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Thanks for pointing me to the quantreg package as a resource. I was hoping
to ask be able to address one quick follow-up question...
I get slightly different variants between using the rq funciton with formula
= mydata ~ 1 as I would if I ran the same data using the quantile function.
Example:
mydata <- (1:10)^2/2
pctile <- seq(.59, .99, .1)
quantile(mydata, pctile)
59% 69% 79% 89% 99%
20.015 26.075 32.935 40.595 49.145
rq(mydata~1, tau=pctile)
Call:
rq(formula = mydata ~ 1, tau = pctile)
Coefficients:
tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99
(Intercept) 18 24.5 32 40.5 50
Degrees of freedom: 10 total; 9 residual
Is it correct to assume this is due to the different accepted methods of
calculating quantiles? If so, do you know where I would be able to see the
algorithms used in these functions? I'm not finding it in the documentation
for function rq, and am new enough to R that I don't know where those
references would generally be.
On Tue, Feb 17, 2009 at 12:29 PM, roger koenker <rkoenker@uiuc.edu> wrote:
> http://www.nabble.com/weighted-quantiles-to19864562.html#a19865869
>
> gives one possibility...
>
> url: www.econ.uiuc.edu/~roger Roger Koenker
> email rkoenker@uiuc.edu Department of Economics
> vox: 217-333-4558 University of Illinois
> fax: 217-244-6678 Champaign, IL 61820
>
>
>
>
> On Feb 17, 2009, at 10:57 AM, Brigid Mooney wrote:
>
> Hi All,
>>
>> I am looking at applications of percentiles to time sequenced data. I
had
>> just been using the quantile function to get percentiles over various
>> periods, but am more interested in if there is an accepted (and/or
>> R-implemented) method to apply weighting to the data so as to weigh
recent
>> data more heavily.
>>
>> I wrote the following function, but it seems quite inefficient, and not
>> really very flexible in its applications - so if anyone has any
>> suggestions
>> on how to look at quantiles/percentiles within R while also using a
>> weighting schema, I would be very interested.
>>
>> Note - this function supposes the data in X is time-sequenced, with the
>> most
>> recent (and thus heaviest weighted) data at the end of the vector
>>
>> WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
>> {
>> Xprime <- NA
>>
>> for(i in 1:length(X))
>> {
>> Xprime <- c(Xprime, rep(X[i], times=i))
>> }
>>
>> print("Percentiles:")
>> print(quantile(X, pctile))
>> print("Weighted:")
>> print(Xprime)
>> print("Weighted Percentiles:")
>> print(quantile(Xprime, pctile, na.rm=TRUE))
>> }
>>
>> WtPercentile(1:10)
>> WtPercentile(rnorm(10))
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>>
http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
[[alternative HTML version deleted]]
Here is one kind of weighted quantile function.
The basic idea is very simple:
wquantile <- function( v, w, p )
{
v <- v[order(v)]
w <- w[order(v)]
v [ which.max( cumsum(w) / sum(w) >= p ) ]
}
With some more error-checking and general clean-up, it looks like this:
# Simple weighted quantile
#
# v A numeric vector of observations
# w A numeric vector of positive weights
# p The probability 0<=p<=1
#
# Nothing fancy: no interpolation etc.
# Basic idea
wquantile <- function( v, w, p )
{
v <- v[order(v)]
w <- w[order(v)]
v [ which.max( cumsum(w) / sum(w) >= p ) ]
}
# Simple weighted quantile
#
# v A numeric vector of observations
# w A numeric vector of positive weights
# p The probability 0<=p<=1
#
# Nothing fancy: no interpolation etc.
wquantile <- function(v,w=rep(1,length(v)),p=.5)
{
if (!is.numeric(v) || !is.numeric(w) || length(v) != length(w))
stop("Values and weights must be equal-length numeric vectors")
if ( !is.numeric(p) || any( p<0 | p>1 ) )
stop("Quantiles must be 0<=p<=1")
ranking <- order(v)
sumw <- cumsum(w[ranking])
if ( is.na(w[1]) || w[1]<0 ) stop("Weights must be non-negative
numbers")
plist <- sumw/sumw[length(sumw)]
sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ])
}
I would appreciate any comments people have on this -- whether
correctness, efficiency, style, ....
-s
On Tue, Feb 17, 2009 at 11:57 AM, Brigid Mooney <bkmooney at gmail.com>
wrote:> Hi All,
>
> I am looking at applications of percentiles to time sequenced data. I had
> just been using the quantile function to get percentiles over various
> periods, but am more interested in if there is an accepted (and/or
> R-implemented) method to apply weighting to the data so as to weigh recent
> data more heavily.
>
> I wrote the following function, but it seems quite inefficient, and not
> really very flexible in its applications - so if anyone has any suggestions
> on how to look at quantiles/percentiles within R while also using a
> weighting schema, I would be very interested.
>
> Note - this function supposes the data in X is time-sequenced, with the
most
> recent (and thus heaviest weighted) data at the end of the vector
>
> WtPercentile <- function(X=rnorm(100), pctile=seq(.1,1,.1))
> {
> Xprime <- NA
>
> for(i in 1:length(X))
> {
> Xprime <- c(Xprime, rep(X[i], times=i))
> }
>
> print("Percentiles:")
> print(quantile(X, pctile))
> print("Weighted:")
> print(Xprime)
> print("Weighted Percentiles:")
> print(quantile(Xprime, pctile, na.rm=TRUE))
> }
>
> WtPercentile(1:10)
> WtPercentile(rnorm(10))
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Some minor improvements and corrections below
# Simple weighted quantile
#
# v A vector of sortable observations
# w A numeric vector of positive weights
# p The quantile 0<=p<=1
#
# Nothing fancy: no interpolation etc.; NA cases not thought through
wquantile <- function(v,w=rep(1,length(v)),p=.5)
{
if ( !is.numeric(w) || length(v) != length(w) )
stop("Values and weights must be equal-length vectors")
if ( !is.numeric(p) || any( p<0 | p>1) )
stop("Quantiles must be 0<=p<=1")
if ( min(w) < 0 ) stop("Weights must be non-negative numbers")
ranking <- order(v)
sumw <- cumsum(w[ranking])
plist <- sumw / sumw[ length(sumw) ]
sapply(p, function(p) v [ ranking [ which.max( plist >= p ) ] ] )
}
I would appreciate any comments people have on this -- whether
correctness, efficiency, style, ....
-s
Stavros Macrakis <macrakis <at> alum.mit.edu> writes:> > Some minor improvements and corrections below > > # Simple weighted quantile > # > # v A vector of sortable observations > # w A numeric vector of positive weights > # p The quantile 0<=p<=1 > # > # Nothing fancy: no interpolation etc.; NA cases not thought through > > wquantile <- function(v,w=rep(1,length(v)),p=.5)You could compare it with wtd.quantile in Hmisc. Dieter
On Tue, 17 Feb 2009, Brigid Mooney wrote:> Thanks for pointing me to the quantreg package as a resource. I was hoping > to ask be able to address one quick follow-up question... > > I get slightly different variants between using the rq funciton with formula > = mydata ~ 1 as I would if I ran the same data using the quantile function. > > Example: > > mydata <- (1:10)^2/2 > pctile <- seq(.59, .99, .1) > > quantile(mydata, pctile) > 59% 69% 79% 89% 99% > 20.015 26.075 32.935 40.595 49.145 > > rq(mydata~1, tau=pctile) > Call: > rq(formula = mydata ~ 1, tau = pctile) > Coefficients: > tau= 0.59 tau= 0.69 tau= 0.79 tau= 0.89 tau= 0.99 > (Intercept) 18 24.5 32 40.5 50 > Degrees of freedom: 10 total; 9 residual > > Is it correct to assume this is due to the different accepted methods of > calculating quantiles?If you try lapply(1:9, function(i)quantile(mydata, pctile,type=i)) the answers from type=1 or 2 agree with rq(). -thomas Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle