Hello, I am getting an unexpected result from quantile(). Specifically, the return value falls outside the range of the data, which I wouldn't have thought possible for a weighted average of 2 order statistics. Is this an unintended accuracy issue or am I being too casual in my comparison (is there some analogue of 'all.equal' for "<=")? Small example: > foo <- h2[,"17"][h2$plate==222] # some data I have, details not interesting > foo <- sort(foo)[1:3] # can demo with the 3 smallest values > yo <- data.frame(matrix(foo,nrow=length(foo),ncol=10)) > fooQtile <- rep(0,9) > for(i in 1:9) { # compute 0.01 quantile, for all 'types' + fooQtile[i] <- quantile(foo,probs=0.01,type=i) + yo[,i+1] <- foo <= fooQtile[i] + } > names(yo) <- c("myData",paste("qType",1:9,sep="")) > yo myData qType1 qType2 qType3 qType4 qType5 qType6 qType7 qType8 qType9 1 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE 2 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE 3 6.420587 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > fooQtile [1] 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 > table(fooQtile) fooQtile 6.40261053520674 6.40261053520674 2 7 I expected the returned quantile to be either equal to or greater than the minimum observed value and that is the case for types 1-6 and 9. But for types 7 (the default) and 8, the returned quantile is less than the minimum observed value. The difference between the type (1-6,9) and type (7,8) return values and between the returned quantile and the minimum is obviously very very small. Is there any choice for 'type' that is guaranteed to return values inside the observed range? Thanks, Jenny Dr. Jennifer Bryan Assistant Professor Department of Statistics and the Michael Smith Laboratories University of British Columbia -------------------------------------------- 333-6356 Agricultural Road Vancouver, BC Canada V6T 1Z2
Henrik Bengtsson
2007-Aug-22 06:34 UTC
[R] quantile() returns a value outside the data range
Hi, yes, it is all about numerical accuracy; x <- c(6.402611, 6.402611, 6.420587) x001 <- sapply(1:9, function(type) quantile(x, probs=0.01, type=type)) names(x001) <- NULL print(x001) ## [1] 6.402611 6.402611 6.402611 6.402611 6.402611 ## [6] 6.402611 6.402611 6.402611 6.402611 diff(x001) ## [1] 0.000000e+00 0.000000e+00 0.000000e+00 ## [4] 0.000000e+00 -8.881784e-16 8.881784e-16 ## [7] -8.881784e-16 8.881784e-16 So, (even) the different types of 1%-quantile estimates of 'x' differ. Thus, the equality part of your comparison (<=) (with sort(x)[1]) can only return TRUE for some of the elements, but not all. The following should do for "all.equal()" for '<=': Assume 'x' and 'y' are scalars for simplicity. To test 'x == y' with some precision, you use all.equal(x,y, tolerance=eps), which tests |x-y| < eps <=> -eps < x-y < eps where eps is a small positive number specifying your precision. To test 'x <= y' (<=> 'x - y <= 0') with some precision, you want to test x - y <= eps, i.e. lessOrEqual <- function(x, y, tolerance=.Machine$double.eps^0.5, ...) { (x - y <= tolerance); } sapply(x, x001, lessOrEqual)> sapply(x, lessOrEqual, x001)## [,1] [,2] [,3] ## [1,] TRUE TRUE FALSE ## [2,] TRUE TRUE FALSE ## [3,] TRUE TRUE FALSE ## [4,] TRUE TRUE FALSE ## [5,] TRUE TRUE FALSE ## [6,] TRUE TRUE FALSE ## [7,] TRUE TRUE FALSE ## [8,] TRUE TRUE FALSE ## [9,] TRUE TRUE FALSE Next time, please give a cut'n'paste reproducible example. /Henrik On 8/21/07, Jenny Bryan <jenny at stat.ubc.ca> wrote:> Hello, > > I am getting an unexpected result from quantile(). Specifically, the > return value falls outside the range of the data, which I wouldn't > have thought possible for a weighted average of 2 order statistics. > Is this an unintended accuracy issue or am I being too casual in my > comparison (is there some analogue of 'all.equal' for "<=")? > > Small example: > > > foo <- h2[,"17"][h2$plate==222] # some data I have, details not > interesting > > foo <- sort(foo)[1:3] # can demo with the 3 smallest > values > > yo <- data.frame(matrix(foo,nrow=length(foo),ncol=10)) > > fooQtile <- rep(0,9) > > for(i in 1:9) { # compute 0.01 quantile, for all > 'types' > + fooQtile[i] <- quantile(foo,probs=0.01,type=i) > + yo[,i+1] <- foo <= fooQtile[i] > + } > > names(yo) <- c("myData",paste("qType",1:9,sep="")) > > yo > myData qType1 qType2 qType3 qType4 qType5 qType6 qType7 qType8 > qType9 > 1 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE > TRUE > 2 6.402611 TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE > TRUE > 3 6.420587 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > FALSE > > fooQtile > [1] 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 6.40261 > 6.40261 > > table(fooQtile) > fooQtile > 6.40261053520674 6.40261053520674 > 2 7 > > I expected the returned quantile to be either equal to or greater than > the minimum observed value and that is the case for types 1-6 and > 9. But for types 7 (the default) and 8, the returned quantile is less > than the minimum observed value. The difference between the type > (1-6,9) and type (7,8) return values and between the returned quantile > and the minimum is obviously very very small. > > Is there any choice for 'type' that is guaranteed to return values > inside the observed range? > > Thanks, Jenny > > Dr. Jennifer Bryan > Assistant Professor > Department of Statistics and > the Michael Smith Laboratories > University of British Columbia > -------------------------------------------- > 333-6356 Agricultural Road > Vancouver, BC Canada > V6T 1Z2 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >