thr3ads.net - R help - [R] quantile() returns a value outside the data range [Aug 2007]

If this information is useful, please help other people find it:
Share via:

Jenny Bryan

2007-Aug-21 21:11 UTC

[R] quantile() returns a value outside the data range

Hello,

I am getting an unexpected result from quantile().  Specifically, the
return value falls outside the range of the data, which I wouldn't
have thought possible for a weighted average of 2 order statistics.
Is this an unintended accuracy issue or am I being too casual in my
comparison (is there some analogue of 'all.equal' for
"<=")?

Small example:

 > foo <- h2[,"17"][h2$plate==222] # some data I have, details
not
interesting
 > foo <- sort(foo)[1:3]            # can demo with the 3 smallest  
values
 > yo <- data.frame(matrix(foo,nrow=length(foo),ncol=10))
 > fooQtile <- rep(0,9)
 > for(i in 1:9) {               # compute 0.01 quantile, for all  
'types'
+   fooQtile[i] <- quantile(foo,probs=0.01,type=i)
+   yo[,i+1] <- foo <= fooQtile[i]
+ }
 > names(yo) <-
c("myData",paste("qType",1:9,sep=""))
 > yo
     myData qType1 qType2 qType3 qType4 qType5 qType6 qType7 qType8  
qType9
1 6.402611   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE  FALSE  FALSE    
TRUE
2 6.402611   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE  FALSE  FALSE    
TRUE
3 6.420587  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE   
FALSE
 > fooQtile
[1] 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261  
6.40261
 > table(fooQtile)
fooQtile
6.40261053520674 6.40261053520674
                2                7

I expected the returned quantile to be either equal to or greater than
the minimum observed value and that is the case for types 1-6 and
9. But for types 7 (the default) and 8, the returned quantile is less
than the minimum observed value.  The difference between the type
(1-6,9) and type (7,8) return values and between the returned quantile
and the minimum is obviously very very small.

Is there any choice for 'type' that is guaranteed to return values
inside the observed range?

Thanks, Jenny

Dr. Jennifer Bryan
Assistant Professor
Department of Statistics and
  the Michael Smith Laboratories
University of British Columbia
--------------------------------------------
333-6356 Agricultural Road
Vancouver, BC Canada
V6T 1Z2

Henrik Bengtsson

2007-Aug-22 06:34 UTC

head link

[R] quantile() returns a value outside the data range

Hi,

yes, it is all about numerical accuracy;

x <- c(6.402611, 6.402611, 6.420587)
x001 <- sapply(1:9, function(type) quantile(x, probs=0.01, type=type))
names(x001) <- NULL
print(x001)
## [1] 6.402611 6.402611 6.402611 6.402611 6.402611
## [6] 6.402611 6.402611 6.402611 6.402611

diff(x001)
## [1]  0.000000e+00  0.000000e+00  0.000000e+00
## [4]  0.000000e+00 -8.881784e-16  8.881784e-16
## [7] -8.881784e-16  8.881784e-16

So, (even) the different types of 1%-quantile estimates of 'x' differ.
 Thus, the equality part of your comparison (<=) (with sort(x)[1]) can
only return TRUE for some of the elements, but not all.

The following should do for "all.equal()" for '<=':  Assume
'x' and
'y' are scalars for simplicity.  To test 'x == y' with some
precision,
you use all.equal(x,y, tolerance=eps), which tests |x-y| < eps <=>
-eps < x-y < eps where eps is a small positive number specifying your
precision.

To test 'x <= y' (<=> 'x - y <= 0') with some
precision, you want to
test x - y <= eps, i.e.

lessOrEqual <- function(x, y, tolerance=.Machine$double.eps^0.5, ...) {
  (x - y <= tolerance);
}

sapply(x, x001, lessOrEqual)> sapply(x, lessOrEqual, x001)##       [,1] [,2]  [,3]
##  [1,] TRUE TRUE FALSE
##  [2,] TRUE TRUE FALSE
##  [3,] TRUE TRUE FALSE
##  [4,] TRUE TRUE FALSE
##  [5,] TRUE TRUE FALSE
##  [6,] TRUE TRUE FALSE
##  [7,] TRUE TRUE FALSE
##  [8,] TRUE TRUE FALSE
##  [9,] TRUE TRUE FALSE

Next time, please give a cut'n'paste reproducible example.

/Henrik

On 8/21/07, Jenny Bryan <jenny at stat.ubc.ca>
wrote:> Hello,
>
> I am getting an unexpected result from quantile().  Specifically, the
> return value falls outside the range of the data, which I wouldn't
> have thought possible for a weighted average of 2 order statistics.
> Is this an unintended accuracy issue or am I being too casual in my
> comparison (is there some analogue of 'all.equal' for
"<=")?
>
> Small example:
>
>  > foo <- h2[,"17"][h2$plate==222] # some data I have,
details not
> interesting
>  > foo <- sort(foo)[1:3]            # can demo with the 3 smallest
> values
>  > yo <- data.frame(matrix(foo,nrow=length(foo),ncol=10))
>  > fooQtile <- rep(0,9)
>  > for(i in 1:9) {               # compute 0.01 quantile, for all
> 'types'
> +   fooQtile[i] <- quantile(foo,probs=0.01,type=i)
> +   yo[,i+1] <- foo <= fooQtile[i]
> + }
>  > names(yo) <-
c("myData",paste("qType",1:9,sep=""))
>  > yo
>      myData qType1 qType2 qType3 qType4 qType5 qType6 qType7 qType8
> qType9
> 1 6.402611   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE  FALSE  FALSE
> TRUE
> 2 6.402611   TRUE   TRUE   TRUE   TRUE   TRUE   TRUE  FALSE  FALSE
> TRUE
> 3 6.420587  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE  FALSE
> FALSE
>  > fooQtile
> [1] 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261
> 6.40261
>  > table(fooQtile)
> fooQtile
> 6.40261053520674 6.40261053520674
>                 2                7
>
> I expected the returned quantile to be either equal to or greater than
> the minimum observed value and that is the case for types 1-6 and
> 9. But for types 7 (the default) and 8, the returned quantile is less
> than the minimum observed value.  The difference between the type
> (1-6,9) and type (7,8) return values and between the returned quantile
> and the minimum is obviously very very small.
>
> Is there any choice for 'type' that is guaranteed to return values
> inside the observed range?
>
> Thanks, Jenny
>
> Dr. Jennifer Bryan
> Assistant Professor
> Department of Statistics and
>   the Michael Smith Laboratories
> University of British Columbia
> --------------------------------------------
> 333-6356 Agricultural Road
> Vancouver, BC Canada
> V6T 1Z2
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Reasonably Related Threads

Search for more possibly parallel threads

R help - Aug 2007 - quantile() returns a value outside the data range

[R] quantile() returns a value outside the data range

[R] quantile() returns a value outside the data range

Reasonably Related Threads