Hello,
I am getting an unexpected result from quantile(). Specifically, the
return value falls outside the range of the data, which I wouldn't
have thought possible for a weighted average of 2 order statistics.
Is this an unintended accuracy issue or am I being too casual in my
comparison (is there some analogue of 'all.equal' for
"<=")?
Small example:
> foo <- h2[,"17"][h2$plate==222] # some data I have, details
not
interesting
> foo <- sort(foo)[1:3] # can demo with the 3 smallest
values
> yo <- data.frame(matrix(foo,nrow=length(foo),ncol=10))
> fooQtile <- rep(0,9)
> for(i in 1:9) { # compute 0.01 quantile, for all
'types'
+ fooQtile[i] <- quantile(foo,probs=0.01,type=i)
+ yo[,i+1] <- foo <= fooQtile[i]
+ }
> names(yo) <-
c("myData",paste("qType",1:9,sep=""))
> yo
myData qType1 qType2 qType3 qType4 qType5 qType6 qType7 qType8
qType9
1 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
TRUE
2 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
TRUE
3 6.420587 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE
> fooQtile
[1] 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261
6.40261
> table(fooQtile)
fooQtile
6.40261053520674 6.40261053520674
2 7
I expected the returned quantile to be either equal to or greater than
the minimum observed value and that is the case for types 1-6 and
9. But for types 7 (the default) and 8, the returned quantile is less
than the minimum observed value. The difference between the type
(1-6,9) and type (7,8) return values and between the returned quantile
and the minimum is obviously very very small.
Is there any choice for 'type' that is guaranteed to return values
inside the observed range?
Thanks, Jenny
Dr. Jennifer Bryan
Assistant Professor
Department of Statistics and
the Michael Smith Laboratories
University of British Columbia
--------------------------------------------
333-6356 Agricultural Road
Vancouver, BC Canada
V6T 1Z2
Henrik Bengtsson
2007-Aug-22 06:34 UTC
[R] quantile() returns a value outside the data range
Hi,
yes, it is all about numerical accuracy;
x <- c(6.402611, 6.402611, 6.420587)
x001 <- sapply(1:9, function(type) quantile(x, probs=0.01, type=type))
names(x001) <- NULL
print(x001)
## [1] 6.402611 6.402611 6.402611 6.402611 6.402611
## [6] 6.402611 6.402611 6.402611 6.402611
diff(x001)
## [1] 0.000000e+00 0.000000e+00 0.000000e+00
## [4] 0.000000e+00 -8.881784e-16 8.881784e-16
## [7] -8.881784e-16 8.881784e-16
So, (even) the different types of 1%-quantile estimates of 'x' differ.
Thus, the equality part of your comparison (<=) (with sort(x)[1]) can
only return TRUE for some of the elements, but not all.
The following should do for "all.equal()" for '<=': Assume
'x' and
'y' are scalars for simplicity. To test 'x == y' with some
precision,
you use all.equal(x,y, tolerance=eps), which tests |x-y| < eps <=>
-eps < x-y < eps where eps is a small positive number specifying your
precision.
To test 'x <= y' (<=> 'x - y <= 0') with some
precision, you want to
test x - y <= eps, i.e.
lessOrEqual <- function(x, y, tolerance=.Machine$double.eps^0.5, ...) {
(x - y <= tolerance);
}
sapply(x, x001, lessOrEqual)> sapply(x, lessOrEqual, x001)
## [,1] [,2] [,3]
## [1,] TRUE TRUE FALSE
## [2,] TRUE TRUE FALSE
## [3,] TRUE TRUE FALSE
## [4,] TRUE TRUE FALSE
## [5,] TRUE TRUE FALSE
## [6,] TRUE TRUE FALSE
## [7,] TRUE TRUE FALSE
## [8,] TRUE TRUE FALSE
## [9,] TRUE TRUE FALSE
Next time, please give a cut'n'paste reproducible example.
/Henrik
On 8/21/07, Jenny Bryan <jenny at stat.ubc.ca>
wrote:> Hello,
>
> I am getting an unexpected result from quantile(). Specifically, the
> return value falls outside the range of the data, which I wouldn't
> have thought possible for a weighted average of 2 order statistics.
> Is this an unintended accuracy issue or am I being too casual in my
> comparison (is there some analogue of 'all.equal' for
"<=")?
>
> Small example:
>
> > foo <- h2[,"17"][h2$plate==222] # some data I have,
details not
> interesting
> > foo <- sort(foo)[1:3] # can demo with the 3 smallest
> values
> > yo <- data.frame(matrix(foo,nrow=length(foo),ncol=10))
> > fooQtile <- rep(0,9)
> > for(i in 1:9) { # compute 0.01 quantile, for all
> 'types'
> + fooQtile[i] <- quantile(foo,probs=0.01,type=i)
> + yo[,i+1] <- foo <= fooQtile[i]
> + }
> > names(yo) <-
c("myData",paste("qType",1:9,sep=""))
> > yo
> myData qType1 qType2 qType3 qType4 qType5 qType6 qType7 qType8
> qType9
> 1 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
> TRUE
> 2 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
> TRUE
> 3 6.420587 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> FALSE
> > fooQtile
> [1] 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261
> 6.40261
> > table(fooQtile)
> fooQtile
> 6.40261053520674 6.40261053520674
> 2 7
>
> I expected the returned quantile to be either equal to or greater than
> the minimum observed value and that is the case for types 1-6 and
> 9. But for types 7 (the default) and 8, the returned quantile is less
> than the minimum observed value. The difference between the type
> (1-6,9) and type (7,8) return values and between the returned quantile
> and the minimum is obviously very very small.
>
> Is there any choice for 'type' that is guaranteed to return values
> inside the observed range?
>
> Thanks, Jenny
>
> Dr. Jennifer Bryan
> Assistant Professor
> Department of Statistics and
> the Michael Smith Laboratories
> University of British Columbia
> --------------------------------------------
> 333-6356 Agricultural Road
> Vancouver, BC Canada
> V6T 1Z2
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>