thr3ads.net - R help - [R] CDF of Sample Quantile [Feb 2011]

If this information is useful, please help other people find it:
Share via:

Bentley Coffey

2011-Feb-14 14:58 UTC

[R] CDF of Sample Quantile

I need to calculate the probability that a sample quantile will exceed a
threshold given the size of the iid sample and the parameters describing the
distribution of each observation (normal, in my case). I can compute the
probability with brute force simulation: simulate a size N sample, apply R's
quantile() function on it, compare it to the threshold, replicate this MANY
times, and count the number of times the sample quantile exceeded the
threshold (dividing by the total number of replications yields the
probability of interest). The problem is that the number of replications
required to get sufficient precision (3 digits say) is so HUGE that this
takes FOREVER. I have to perform this task so much in my script (searching
over the sample size and repeated for several different distribution
parameters) that it takes too many hours to run.

I've searched for pre-existing code to do this in R and haven't found
anything. Perhaps I'm missing something. Is anyone aware of an R function to
compute this probability?

I've tried writing my own code using the fact that R's quantile()
function
is a linear combination of 2 order statistics. Basically, I wrote down the
mathematical form of the joint pdf for the 2 order statistics (a function of
the sample size and the distribution parameters) then performed a
pseudo-Monte Carlo integration (i.e. using Halton Draws rather than R's
random draws) over the region where the sample quantile exceeds the
threshold. In theory, this should work and it takes about 1000 times fewer
clock cycles to compute than the Brute Force approach. My problem is that
there is a significant discrepancy between the results using Brute Force and
using this more efficient approach that I have coded up. I believe that the
problem is numerical error but it could be some programming bug; regardless,
I have been unable to locate the source of this problem and have spent over
20 hours trying to identify it this weekend. Please, somebody help!!!

So, again, my question: is there code in R for quickly evaluating the CDF of
a Sample Quantile given the sample size and the parameters governing the
distribution of each iid point in the sample?

Grateful for any help,

Bentley

	[[alternative HTML version deleted]]

Jonathan P Daily

2011-Feb-14 19:20 UTC

head link

[R] CDF of Sample Quantile

If I understand this, you have a value x, or a vector of values x, and you 
want to know the CDF that this value is drawn from a normal distribution?

I assume you are drawing from rnorm for your simulations, so look at the 
other functions listed when you ?rnorm.

HTH
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly

r-help-bounces at r-project.org wrote on 02/14/2011 09:58:09 AM:
> [image removed] 
> 
> [R] CDF of Sample Quantile
> 
> Bentley Coffey 
> 
> to:
> 
> r-help
> 
> 02/14/2011 01:58 PM
> 
> Sent by:
> 
> r-help-bounces at r-project.org
> 
> I need to calculate the probability that a sample quantile will exceed a
> threshold given the size of the iid sample and the parameters describing 
the> distribution of each observation (normal, in my case). I can compute the
> probability with brute force simulation: simulate a size N sample, apply 
R's> quantile() function on it, compare it to the threshold, replicate this 
MANY> times, and count the number of times the sample quantile exceeded the
> threshold (dividing by the total number of replications yields the
> probability of interest). The problem is that the number of replications
> required to get sufficient precision (3 digits say) is so HUGE that this
> takes FOREVER. I have to perform this task so much in my script 
(searching> over the sample size and repeated for several different distribution
> parameters) that it takes too many hours to run.
> 
> I've searched for pre-existing code to do this in R and haven't
found
> anything. Perhaps I'm missing something. Is anyone aware of an R 
function to> compute this probability?
> 
> I've tried writing my own code using the fact that R's quantile() 
function> is a linear combination of 2 order statistics. Basically, I wrote down 
the> mathematical form of the joint pdf for the 2 order statistics (a 
function of> the sample size and the distribution parameters) then performed a
> pseudo-Monte Carlo integration (i.e. using Halton Draws rather than R's
> random draws) over the region where the sample quantile exceeds the
> threshold. In theory, this should work and it takes about 1000 times 
fewer> clock cycles to compute than the Brute Force approach. My problem is 
that> there is a significant discrepancy between the results using Brute Force 
and> using this more efficient approach that I have coded up. I believe that 
the> problem is numerical error but it could be some programming bug; 
regardless,> I have been unable to locate the source of this problem and have spent 
over> 20 hours trying to identify it this weekend. Please, somebody help!!!
> 
> So, again, my question: is there code in R for quickly evaluating the 
CDF of> a Sample Quantile given the sample size and the parameters governing the
> distribution of each iid point in the sample?
> 
> Grateful for any help,
> 
> Bentley
> 
>    [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.

Duncan Murdoch

2011-Feb-14 19:26 UTC

head link

[R] CDF of Sample Quantile

On 14/02/2011 9:58 AM, Bentley Coffey wrote:> I need to calculate the probability that a sample quantile will exceed a
> threshold given the size of the iid sample and the parameters describing
the
> distribution of each observation (normal, in my case). I can compute the
> probability with brute force simulation: simulate a size N sample, apply
R's
> quantile() function on it, compare it to the threshold, replicate this MANY
> times, and count the number of times the sample quantile exceeded the
> threshold (dividing by the total number of replications yields the
> probability of interest). The problem is that the number of replications
> required to get sufficient precision (3 digits say) is so HUGE that this
> takes FOREVER. I have to perform this task so much in my script (searching
> over the sample size and repeated for several different distribution
> parameters) that it takes too many hours to run.
>
> I've searched for pre-existing code to do this in R and haven't
found
> anything. Perhaps I'm missing something. Is anyone aware of an R
function to
> compute this probability?
>
> I've tried writing my own code using the fact that R's quantile()
function
> is a linear combination of 2 order statistics. Basically, I wrote down the
> mathematical form of the joint pdf for the 2 order statistics (a function
of
> the sample size and the distribution parameters) then performed a
> pseudo-Monte Carlo integration (i.e. using Halton Draws rather than R's
> random draws) over the region where the sample quantile exceeds the
> threshold. In theory, this should work and it takes about 1000 times fewer
> clock cycles to compute than the Brute Force approach. My problem is that
> there is a significant discrepancy between the results using Brute Force
and
> using this more efficient approach that I have coded up. I believe that the
> problem is numerical error but it could be some programming bug;
regardless,
> I have been unable to locate the source of this problem and have spent over
> 20 hours trying to identify it this weekend. Please, somebody help!!!
>
> So, again, my question: is there code in R for quickly evaluating the CDF
of
> a Sample Quantile given the sample size and the parameters governing the
> distribution of each iid point in the sample?
I think the answer to your question is no, but I think it's the wrong 
question.  Suppose Xm:n is the mth sample quantile in a sample of size 
n, and you want to calculate P(Xm:n > x).  Let X be a draw from the 
underlying distribution, and suppose P(X > x) = p.  Then the event Xm:n >
x
is the same as the event that out of n independent draws of X, at least 
n-m+1 are bigger than x:  a binomial probability.  And R can calculate 
binomial probabilities using pbinom().

Duncan Murdoch

Bentley Coffey

2011-Feb-17 17:42 UTC

head link

[R] CDF of Sample Quantile

Because my first attempt for help didn't pan out, I'm trying again with
a
more targeted question. See attached for the math I worked through and my
implementation of it in R. Can anyone spot what I'm missing here? I had
thought that it was a numerical issue but I'm starting to doubt myself -- I
fear that it might be the math, perhaps in the change of variables at the
end...

If we can crack this nut, then maybe we should put this into some
package...?

Grateful for any help,

Bentley


On Mon, Feb 14, 2011 at 9:58 AM, Bentley Coffey <bentleygcoffey at
gmail.com>wrote:
>
> I need to calculate the probability that a sample quantile will exceed a
> threshold given the size of the iid sample and the parameters describing
the
> distribution of each observation (normal, in my case). I can compute the
> probability with brute force simulation: simulate a size N sample, apply
R's
> quantile() function on it, compare it to the threshold, replicate this MANY
> times, and count the number of times the sample quantile exceeded the
> threshold (dividing by the total number of replications yields the
> probability of interest). The problem is that the number of replications
> required to get sufficient precision (3 digits say) is so HUGE that this
> takes FOREVER. I have to perform this task so much in my script (searching
> over the sample size and repeated for several different distribution
> parameters) that it takes too many hours to run.
>
> I've searched for pre-existing code to do this in R and haven't
found
> anything. Perhaps I'm missing something. Is anyone aware of an R
function to
> compute this probability?
>
> I've tried writing my own code using the fact that R's quantile()
function
> is a linear combination of 2 order statistics. Basically, I wrote down the
> mathematical form of the joint pdf for the 2 order statistics (a function
of
> the sample size and the distribution parameters) then performed a
> pseudo-Monte Carlo integration (i.e. using Halton Draws rather than R's
> random draws) over the region where the sample quantile exceeds the
> threshold. In theory, this should work and it takes about 1000 times fewer
> clock cycles to compute than the Brute Force approach. My problem is that
> there is a significant discrepancy between the results using Brute Force
and
> using this more efficient approach that I have coded up. I believe that the
> problem is numerical error but it could be some programming bug;
regardless,
> I have been unable to locate the source of this problem and have spent over
> 20 hours trying to identify it this weekend. Please, somebody help!!!
>
> So, again, my question: is there code in R for quickly evaluating the CDF
> of a Sample Quantile given the sample size and the parameters governing the
> distribution of each iid point in the sample?
>
> Grateful for any help,
>
> Bentley
>-------------- next part --------------
A non-text attachment was scrubbed...
Name: CDF of Sample Quantile.pdf
Type: application/pdf
Size: 116452 bytes
Desc: not available
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20110217/22a43328/attachment.pdf>

Possibly Parallel Threads

Search for more apparently analagous threads

R help - Feb 2011 - CDF of Sample Quantile

[R] CDF of Sample Quantile

[R] CDF of Sample Quantile

[R] CDF of Sample Quantile

[R] CDF of Sample Quantile

Possibly Parallel Threads