thr3ads.net - R help - [R] Confidence intervals of quantiles [Feb 2007]

If this information is useful, please help other people find it:
Share via:

Mike White

2007-Feb-05 13:47 UTC

[R] Confidence intervals of quantiles

Can anyone please tell me if there is a function to calculate confidence
intervals for the results of the quantile function.
Some of my data is normally distributed but some is also a squewed
distribution or a capped normal distribution. Some of the data sets contain
about 700 values whereas others are smaller with about 100-150 values, so I
would like to see how the confidence intervals change for the different
distributions and different data sizes.

Thanks
Mike White

Dimitris Rizopoulos

2007-Feb-05 14:43 UTC

head link

[R] Confidence intervals of quantiles

you could use the Bootstrap method, using package 'boot', e.g.,

library(boot)

f.quantile <- function(x, ind, ...){
    quantile(x[ind], ...)
}

###########

x <- rgamma(750, 2)
quant.boot <- boot(x, f.quantile, R = 1000, probs = c(0.025, 0.25, 
0.5, 0.75, 0.975))
lapply(1:5, function(i) boot.ci(quant.boot, c(0.90, 0.95), type = 
c("perc", "bca"), index = i))

y <- rgamma(150, 2)
quant.boot <- boot(y, f.quantile, R = 1000, probs = c(0.025, 0.25, 
0.5, 0.75, 0.975))
lapply(1:5, function(i) boot.ci(quant.boot, c(0.90, 0.95), type = 
c("perc", "bca"), index = i))


However, you should be a little bit careful with Bootstrap if you wish 
to obtain CIs for extreme quantiles in small samples.

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


----- Original Message ----- 
From: "Mike White" <mikewhite.diu at btconnect.com>
To: <R-help at stat.math.ethz.ch>
Sent: Monday, February 05, 2007 2:47 PM
Subject: [R] Confidence intervals of quantiles

> Can anyone please tell me if there is a function to calculate 
> confidence
> intervals for the results of the quantile function.
> Some of my data is normally distributed but some is also a squewed
> distribution or a capped normal distribution. Some of the data sets 
> contain
> about 700 values whereas others are smaller with about 100-150 
> values, so I
> would like to see how the confidence intervals change for the 
> different
> distributions and different data sizes.
>
> Thanks
> Mike White
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

(Ted Harding)

2007-Feb-05 23:41 UTC

head link

[R] Confidence intervals of quantiles

On 05-Feb-07 Mike White wrote:> Can anyone please tell me if there is a function to calculate
> confidence intervals for the results of the quantile function.
> Some of my data is normally distributed but some is also a
> squewed distribution or a capped normal distribution. Some of
> the data sets contain about 700 values whereas others are smaller
> with about 100-150 values, so I would like to see how the confidence
> intervals change for the different distributions and different data
> sizes.
As well as the bootstrap suggestion (which may do what you want)
I'd like to suggest calcualting exact CIs for the distribution
quantiles. I don't know of an R function which does this, though
the principle is straightforward enough (I once implemented it
for octave/matlab).

I'm assuming you're after a truly distribution-free CI (as your
query suggests). In that case, the sample quantiles provide the
CI limits for the distribution quantiles, with the proviso that
you will not in general get an exact match for the confidence
level P that you desire for your confidence interval, so you
need to use P as a lower bound for the confidence level.

I'll illustrate with an equi-tailed  95% CI.

Notation:

x[r] (r = 1:n) is the r-th order statistic of a sample of n
values of a random variable X, drawn from a continuous distribution.
X[r] is the corresponding random variable.

X(p) is the p-th quantile of the distribution in question,
i.e. the value of X such that P(X <= X[p]) = p. 0 < p < 1.

Objective: You want a 95% CI (X(p)L,X(p).R) for the quantile X(p)
for given p.

Principle:

  P( X[r] <= X(p) ) = pbinom(r, n, p)      [*]

(i.e. the probability that you get at least r of the x's
below X(p)).

Now, for the upper limit X(p).R of the CI, you want the
smallest value of X(p) such that the probability [*] is not
less than 0.925 (i.e. 1 - (1-P)/2 = 1 - (1-0.95)/2 ).

This is achieved by X(p).R = x[s] where

  s = min(which(pbinom((0:n),n,p) >= 0.025))

and, similalry, for the lower limit, X(p).L = x[r] where

  r = max(which(pbinom((0:n),n,p) <= 0.025))

(Note that the "which" counts the binomial case "r=0" as
number 1)

If I've got my head around the above right, the following
function does the above job, and caould be fairly easily
modified for asymmetric confidence intervals (e.g. a 95%
confidence interval with 96% for inclusion below the upper
limit and 99% for inclusion above the lower limit).

q.CI<-function(x,p,P){
# x is the sample, p (0<p<1) the quantile, P the confidence level
  x<-sort(x)
  n<-length(x)
  s <- min(which(pbinom((0:n),n,p) >= 1-(1-P)/2))
  r <- max(which(pbinom((0:n),n,p) <= (1-P)/2))
  c(x[r],x[s])
# x[r] is the lower limit, x[s] the upper limit, of the CI
}

Note that it gives a confidence level P at least equal to P,
not in general exactly equal to P.

I've tested it as follows (10000 simulations with samples
of size 101 from a Normal distribution -- why not, after
all? No loss of generality!):

p<-0.5; Xq<-qnorm(p); P<-0.95
N<-10000; incls<-numeric(N)
for(i in (1:N)){
  x<-rnorm(101)
  CI<-q.CI(x,p,P)
  if((CI[1]<=Xq)&(CI[2]>=Xq)){ incls[i]<-1 }
}; sum(incls)/N
# [1] 0.9564
# [1] 0.9548

p<-0.75; Xq<-qnorm(p); P<-0.95
[etc. ... ]
# [1] 0.9631
# [1] 0.9596

p<-0.90; Xq<-qnorm(p); P<-0.95
[etc. ... ]
# [1] 0.9540
# [1] 0.9533

p<-0.95; Xq<-qnorm(p); P<-0.95
[etc. ... ]
# [1] 0.9840
# [1] 0.9826

p<-0.99; Xq<-qnorm(p); P<-0.95
[etc. ... ]
Error in if ((CI[1] <= Xq) & (CI[2] >= Xq)) { : 
        missing value where TRUE/FALSE needed

showing that for extreme values of p relative to the
sample size, trouble occurs (as is to be expected)!

The solution is to test for "NA" in CI[1] and/or CI[2],
so I modified my test routine as follows, to give notional
lower confidence limit -Inf if CI[1] is NA, and/or upper
confidence limit +Inf if CI[2] is NA (of course this
could be done in the function q.CI() itself, but perhaps
it offers more flexibility for the user to do something
else with it, if it is left as NA).

p<-0.99; Xq<-qnorm(p); P<-0.95
N<-10000; incls<-numeric(N)
for(i in (1:N)){
  x<-rnorm(101)
  CI<-q.CI(x,p,P)
  if(is.na(CI[1])){ CI[1]<-(-Inf) }
  if(is.na(CI[2])){ CI[2]<-(+Inf) }
  if((CI[1]<=Xq)&(CI[2]>=Xq)){ incls[i]<-1 }
}; sum(incls)/N
# [1] 0.9841
# [1] 0.9821

Comments welcome!!!

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Feb-07                                       Time: 23:40:23
------------------------------ XFMail ------------------------------

mikewhite.diu at tiscali.co.uk

2007-Feb-09 16:51 UTC

head link

[R] Confidence intervals of quantiles

Many thanks for all the contributions to this problem.
As inferred by Ted Harding, I was after a distribution-free CIs as a lot
of
the data I use is not normally distributed.

The method provided by Ted for calculating exact CIs gave good results with
the limits almost symmetric about the quantile.  Even for my smaller data
set with only 115 samples and a very skewed distribution the results clearly
showed the increase in range and asymmetry of the confidence limits.

The bootstrap method provided by Dimitris also works reasonably well but
is
slower and the ranges for the CIs are sometimes very asymmetric and in one
case did not actually encompass the quantile. The method would not work at
all with the skewed distribution of 115 samples until I reduced the quantile
range from 0.1 and 0.9 to 0.2 and 0.8  However, as Dimitris warned, you have
to be careful with this method for extreme quantiles and small samples.

I was also sent a method by S?ren Merser, but this was incomplete.  This
method was written by Scott Chasalow and the full code can be found at
http://www.dpw.wau.nl/pv/pub/chasalow/S/win/ci.quantile/
The code was written for S-Plus but it worked ok for me in R.  This method
actually gives several ranges about the quantile, each with about the same
level of confidence and the level of confidence is also in the output. As
with Ted Harding's method, these may not exactly match the desired
confidence level.  There is an option to select the shortest range but it
would be easy enough to add code to give the most symmetric range.

As a chemist I am not able to comment on the statistical pros and cons of
the methods but they are certainly very helpful for my purposes.

Many thanks
Mike White

----- Original Message ----- 
From: "Dimitris Rizopoulos" <dimitris.rizopoulos at
med.kuleuven.be>
To: "Mike White" <mikewhite.diu at btconnect.com>
Cc: <R-help at stat.math.ethz.ch>
Sent: Monday, February 05, 2007 2:43 PM
Subject: Re: [R] Confidence intervals of quantiles

> you could use the Bootstrap method, using package 'boot', e.g.,
>
> library(boot)
>
> f.quantile <- function(x, ind, ...){
>     quantile(x[ind], ...)
> }
>
> ###########
>
> x <- rgamma(750, 2)
> quant.boot <- boot(x, f.quantile, R = 1000, probs = c(0.025, 0.25,
> 0.5, 0.75, 0.975))
> lapply(1:5, function(i) boot.ci(quant.boot, c(0.90, 0.95), type >
c("perc", "bca"), index = i))
>
> y <- rgamma(150, 2)
> quant.boot <- boot(y, f.quantile, R = 1000, probs = c(0.025, 0.25,
> 0.5, 0.75, 0.975))
> lapply(1:5, function(i) boot.ci(quant.boot, c(0.90, 0.95), type >
c("perc", "bca"), index = i))
>
>
> However, you should be a little bit careful with Bootstrap if you wish
> to obtain CIs for extreme quantiles in small samples.
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Ph.D. Student
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/(0)16/336899
> Fax: +32/(0)16/337015
> Web: http://med.kuleuven.be/biostat/
>      http://www.student.kuleuven.be/~m0390867/dimitris.htm
>
>
> ----- Original Message ----- 
> From: "Mike White" <mikewhite.diu at btconnect.com>
> To: <R-help at stat.math.ethz.ch>
> Sent: Monday, February 05, 2007 2:47 PM
> Subject: [R] Confidence intervals of quantiles
>
>
> > Can anyone please tell me if there is a function to calculate
> > confidence
> > intervals for the results of the quantile function.
> > Some of my data is normally distributed but some is also a squewed
> > distribution or a capped normal distribution. Some of the data sets
> > contain
> > about 700 values whereas others are smaller with about 100-150
> > values, so I
> > would like to see how the confidence intervals change for the
> > different
> > distributions and different data sizes.
> >
> > Thanks
> > Mike White
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
>

___________________________________________________________

Tiscali Broadband only 9.99 a month for your first 3 months!
http://www.tiscali.co.uk/products/broadband/

Reasonably Related Threads

Search for more apparently analagous threads

R help - Feb 2007 - Confidence intervals of quantiles

[R] Confidence intervals of quantiles

[R] Confidence intervals of quantiles

[R] Confidence intervals of quantiles

[R] Confidence intervals of quantiles

Reasonably Related Threads