Can anyone please tell me if there is a function to calculate confidence intervals for the results of the quantile function. Some of my data is normally distributed but some is also a squewed distribution or a capped normal distribution. Some of the data sets contain about 700 values whereas others are smaller with about 100-150 values, so I would like to see how the confidence intervals change for the different distributions and different data sizes. Thanks Mike White
you could use the Bootstrap method, using package 'boot', e.g., library(boot) f.quantile <- function(x, ind, ...){ quantile(x[ind], ...) } ########### x <- rgamma(750, 2) quant.boot <- boot(x, f.quantile, R = 1000, probs = c(0.025, 0.25, 0.5, 0.75, 0.975)) lapply(1:5, function(i) boot.ci(quant.boot, c(0.90, 0.95), type = c("perc", "bca"), index = i)) y <- rgamma(150, 2) quant.boot <- boot(y, f.quantile, R = 1000, probs = c(0.025, 0.25, 0.5, 0.75, 0.975)) lapply(1:5, function(i) boot.ci(quant.boot, c(0.90, 0.95), type = c("perc", "bca"), index = i)) However, you should be a little bit careful with Bootstrap if you wish to obtain CIs for extreme quantiles in small samples. I hope it helps. Best, Dimitris ---- Dimitris Rizopoulos Ph.D. Student Biostatistical Centre School of Public Health Catholic University of Leuven Address: Kapucijnenvoer 35, Leuven, Belgium Tel: +32/(0)16/336899 Fax: +32/(0)16/337015 Web: http://med.kuleuven.be/biostat/ http://www.student.kuleuven.be/~m0390867/dimitris.htm ----- Original Message ----- From: "Mike White" <mikewhite.diu at btconnect.com> To: <R-help at stat.math.ethz.ch> Sent: Monday, February 05, 2007 2:47 PM Subject: [R] Confidence intervals of quantiles> Can anyone please tell me if there is a function to calculate > confidence > intervals for the results of the quantile function. > Some of my data is normally distributed but some is also a squewed > distribution or a capped normal distribution. Some of the data sets > contain > about 700 values whereas others are smaller with about 100-150 > values, so I > would like to see how the confidence intervals change for the > different > distributions and different data sizes. > > Thanks > Mike White > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
On 05-Feb-07 Mike White wrote:> Can anyone please tell me if there is a function to calculate > confidence intervals for the results of the quantile function. > Some of my data is normally distributed but some is also a > squewed distribution or a capped normal distribution. Some of > the data sets contain about 700 values whereas others are smaller > with about 100-150 values, so I would like to see how the confidence > intervals change for the different distributions and different data > sizes.As well as the bootstrap suggestion (which may do what you want) I'd like to suggest calcualting exact CIs for the distribution quantiles. I don't know of an R function which does this, though the principle is straightforward enough (I once implemented it for octave/matlab). I'm assuming you're after a truly distribution-free CI (as your query suggests). In that case, the sample quantiles provide the CI limits for the distribution quantiles, with the proviso that you will not in general get an exact match for the confidence level P that you desire for your confidence interval, so you need to use P as a lower bound for the confidence level. I'll illustrate with an equi-tailed 95% CI. Notation: x[r] (r = 1:n) is the r-th order statistic of a sample of n values of a random variable X, drawn from a continuous distribution. X[r] is the corresponding random variable. X(p) is the p-th quantile of the distribution in question, i.e. the value of X such that P(X <= X[p]) = p. 0 < p < 1. Objective: You want a 95% CI (X(p)L,X(p).R) for the quantile X(p) for given p. Principle: P( X[r] <= X(p) ) = pbinom(r, n, p) [*] (i.e. the probability that you get at least r of the x's below X(p)). Now, for the upper limit X(p).R of the CI, you want the smallest value of X(p) such that the probability [*] is not less than 0.925 (i.e. 1 - (1-P)/2 = 1 - (1-0.95)/2 ). This is achieved by X(p).R = x[s] where s = min(which(pbinom((0:n),n,p) >= 0.025)) and, similalry, for the lower limit, X(p).L = x[r] where r = max(which(pbinom((0:n),n,p) <= 0.025)) (Note that the "which" counts the binomial case "r=0" as number 1) If I've got my head around the above right, the following function does the above job, and caould be fairly easily modified for asymmetric confidence intervals (e.g. a 95% confidence interval with 96% for inclusion below the upper limit and 99% for inclusion above the lower limit). q.CI<-function(x,p,P){ # x is the sample, p (0<p<1) the quantile, P the confidence level x<-sort(x) n<-length(x) s <- min(which(pbinom((0:n),n,p) >= 1-(1-P)/2)) r <- max(which(pbinom((0:n),n,p) <= (1-P)/2)) c(x[r],x[s]) # x[r] is the lower limit, x[s] the upper limit, of the CI } Note that it gives a confidence level P at least equal to P, not in general exactly equal to P. I've tested it as follows (10000 simulations with samples of size 101 from a Normal distribution -- why not, after all? No loss of generality!): p<-0.5; Xq<-qnorm(p); P<-0.95 N<-10000; incls<-numeric(N) for(i in (1:N)){ x<-rnorm(101) CI<-q.CI(x,p,P) if((CI[1]<=Xq)&(CI[2]>=Xq)){ incls[i]<-1 } }; sum(incls)/N # [1] 0.9564 # [1] 0.9548 p<-0.75; Xq<-qnorm(p); P<-0.95 [etc. ... ] # [1] 0.9631 # [1] 0.9596 p<-0.90; Xq<-qnorm(p); P<-0.95 [etc. ... ] # [1] 0.9540 # [1] 0.9533 p<-0.95; Xq<-qnorm(p); P<-0.95 [etc. ... ] # [1] 0.9840 # [1] 0.9826 p<-0.99; Xq<-qnorm(p); P<-0.95 [etc. ... ] Error in if ((CI[1] <= Xq) & (CI[2] >= Xq)) { : missing value where TRUE/FALSE needed showing that for extreme values of p relative to the sample size, trouble occurs (as is to be expected)! The solution is to test for "NA" in CI[1] and/or CI[2], so I modified my test routine as follows, to give notional lower confidence limit -Inf if CI[1] is NA, and/or upper confidence limit +Inf if CI[2] is NA (of course this could be done in the function q.CI() itself, but perhaps it offers more flexibility for the user to do something else with it, if it is left as NA). p<-0.99; Xq<-qnorm(p); P<-0.95 N<-10000; incls<-numeric(N) for(i in (1:N)){ x<-rnorm(101) CI<-q.CI(x,p,P) if(is.na(CI[1])){ CI[1]<-(-Inf) } if(is.na(CI[2])){ CI[2]<-(+Inf) } if((CI[1]<=Xq)&(CI[2]>=Xq)){ incls[i]<-1 } }; sum(incls)/N # [1] 0.9841 # [1] 0.9821 Comments welcome!!! Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 05-Feb-07 Time: 23:40:23 ------------------------------ XFMail ------------------------------
mikewhite.diu at tiscali.co.uk
2007-Feb-09 16:51 UTC
[R] Confidence intervals of quantiles
Many thanks for all the contributions to this problem. As inferred by Ted Harding, I was after a distribution-free CIs as a lot of the data I use is not normally distributed. The method provided by Ted for calculating exact CIs gave good results with the limits almost symmetric about the quantile. Even for my smaller data set with only 115 samples and a very skewed distribution the results clearly showed the increase in range and asymmetry of the confidence limits. The bootstrap method provided by Dimitris also works reasonably well but is slower and the ranges for the CIs are sometimes very asymmetric and in one case did not actually encompass the quantile. The method would not work at all with the skewed distribution of 115 samples until I reduced the quantile range from 0.1 and 0.9 to 0.2 and 0.8 However, as Dimitris warned, you have to be careful with this method for extreme quantiles and small samples. I was also sent a method by S?ren Merser, but this was incomplete. This method was written by Scott Chasalow and the full code can be found at http://www.dpw.wau.nl/pv/pub/chasalow/S/win/ci.quantile/ The code was written for S-Plus but it worked ok for me in R. This method actually gives several ranges about the quantile, each with about the same level of confidence and the level of confidence is also in the output. As with Ted Harding's method, these may not exactly match the desired confidence level. There is an option to select the shortest range but it would be easy enough to add code to give the most symmetric range. As a chemist I am not able to comment on the statistical pros and cons of the methods but they are certainly very helpful for my purposes. Many thanks Mike White ----- Original Message ----- From: "Dimitris Rizopoulos" <dimitris.rizopoulos at med.kuleuven.be> To: "Mike White" <mikewhite.diu at btconnect.com> Cc: <R-help at stat.math.ethz.ch> Sent: Monday, February 05, 2007 2:43 PM Subject: Re: [R] Confidence intervals of quantiles> you could use the Bootstrap method, using package 'boot', e.g., > > library(boot) > > f.quantile <- function(x, ind, ...){ > quantile(x[ind], ...) > } > > ########### > > x <- rgamma(750, 2) > quant.boot <- boot(x, f.quantile, R = 1000, probs = c(0.025, 0.25, > 0.5, 0.75, 0.975)) > lapply(1:5, function(i) boot.ci(quant.boot, c(0.90, 0.95), type > c("perc", "bca"), index = i)) > > y <- rgamma(150, 2) > quant.boot <- boot(y, f.quantile, R = 1000, probs = c(0.025, 0.25, > 0.5, 0.75, 0.975)) > lapply(1:5, function(i) boot.ci(quant.boot, c(0.90, 0.95), type > c("perc", "bca"), index = i)) > > > However, you should be a little bit careful with Bootstrap if you wish > to obtain CIs for extreme quantiles in small samples. > > I hope it helps. > > Best, > Dimitris > > ---- > Dimitris Rizopoulos > Ph.D. Student > Biostatistical Centre > School of Public Health > Catholic University of Leuven > > Address: Kapucijnenvoer 35, Leuven, Belgium > Tel: +32/(0)16/336899 > Fax: +32/(0)16/337015 > Web: http://med.kuleuven.be/biostat/ > http://www.student.kuleuven.be/~m0390867/dimitris.htm > > > ----- Original Message ----- > From: "Mike White" <mikewhite.diu at btconnect.com> > To: <R-help at stat.math.ethz.ch> > Sent: Monday, February 05, 2007 2:47 PM > Subject: [R] Confidence intervals of quantiles > > > > Can anyone please tell me if there is a function to calculate > > confidence > > intervals for the results of the quantile function. > > Some of my data is normally distributed but some is also a squewed > > distribution or a capped normal distribution. Some of the data sets > > contain > > about 700 values whereas others are smaller with about 100-150 > > values, so I > > would like to see how the confidence intervals change for the > > different > > distributions and different data sizes. > > > > Thanks > > Mike White > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm > >___________________________________________________________ Tiscali Broadband only 9.99 a month for your first 3 months! http://www.tiscali.co.uk/products/broadband/