Hello All,

I am trying to figure out the rationale behind why quantile() returns different values for the same probabilities depending on whether 0 is first.

Here is an example:

quantile(c(54, 72, 83, 112), type=6, probs=c(0, .25, .5, .75, 1))
quantile(c(54, 72, 83, 112), type=6, probs=c(.25, .5, .75, 1, 0))

It seems to come down to this part of the code for quantile:

fuzz <- 4 * .Machine$double.eps
nppm <- a + probs * (n + 1 - a - b)
j <- floor(nppm + fuzz)
h <- nppm - j
qs <- x[j + 2L]
qs[h == 1] <- x[j + 3L][h == 1]
other <- (h > 0) && (h < 1)
if (any(other))
    qs[other] <- ((1 - h) * x[j + 2L] + h * x[j + 3L])[other]

In my example, a and b are both 0, and n = 4. In particular, because && looks only at the first element of each operand, the alternate (interpolating) formula for qs is used only when the first element of h is both > 0 and < 1. Any ideas on this? It seems like a simple alternative would be

other <- (h > 0) & (h < 1)

but I do not know if that would cause problems for other quantile formulae. By the way, this comes around lines 39-70 in quantile.default in:

> version
               _
platform       x86_64-pc-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          2
minor          11.1
year           2010
month          05
day            31
svn rev        52157
language       R
version.string R version 2.11.1 (2010-05-31)

Best regards,

Josh

--
Joshua Wiley
Ph.D. Student
Health Psychology
University of California, Los Angeles
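For anyone following along on a different R version, here is a minimal sketch of the fragment quoted above. It is not the actual quantile.default source: the function name quantile_sketch and the first_only argument are hypothetical, and the sort-and-pad step is my assumption about what surrounds the quoted lines. Because recent R releases refuse to apply && to vectors longer than one, the 2.11.1 behaviour is imitated by testing h[1L] explicitly.

```r
# Hypothetical helper: re-implements only the quoted fragment (type 6 uses a = b = 0).
quantile_sketch <- function(x, probs, a = 0, b = 0, first_only = FALSE) {
  n <- length(x)
  x <- sort(x)
  # Assumed padding so that x[j + 2L] and x[j + 3L] are always valid indices.
  x <- c(x[1L], x[1L], x, x[n], x[n])
  fuzz <- 4 * .Machine$double.eps
  nppm <- a + probs * (n + 1 - a - b)
  j <- floor(nppm + fuzz)
  h <- nppm - j
  qs <- x[j + 2L]
  qs[h == 1] <- x[j + 3L][h == 1]
  other <- if (first_only) {
    # Imitates `&&` in R 2.11.1: only the first element of h is tested.
    (h[1L] > 0) && (h[1L] < 1)
  } else {
    # The proposed element-wise alternative.
    (h > 0) & (h < 1)
  }
  if (any(other))
    qs[other] <- ((1 - h) * x[j + 2L] + h * x[j + 3L])[other]
  qs
}

x  <- c(54, 72, 83, 112)
p1 <- c(0, .25, .5, .75, 1)
p2 <- c(.25, .5, .75, 1, 0)

quantile_sketch(x, p1)                     # 54 58.5 77.5 104.75 112
quantile_sketch(x, p1, first_only = TRUE)  # 54 54 72 83 112 (no interpolation)
quantile_sketch(x, p2, first_only = TRUE)  # 58.5 77.5 104.75 112 54
```

With 0 first, h[1] is 0, so the scalar test is FALSE and no element is interpolated; with .25 first, the scalar test is TRUE and every element goes through the interpolating formula, which happens to give the right answer for h == 0 as well.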
Hi,

It seems to me that the results are actually the same but are not returned in the same order (R 2.10.1 on Windows Vista). If you call sort() on the output, the results will be the same:

> sort(quantile(c(54, 72, 83, 112), type=6, probs=c(0, .25, .5, .75, 1)))
    0%    25%    50%    75%   100%
 54.00  58.50  77.50 104.75 112.00
> sort(quantile(c(54, 72, 83, 112), type=6, probs=c(.25, .5, .75, 1, 0)))
    0%    25%    50%    75%   100%
 54.00  58.50  77.50 104.75 112.00

With such a small sample, the actual quantile values may critically depend on the interpolation algorithm used in their calculation, so exercise caution:

> sort(quantile(c(54, 72, 83, 112), type=7, probs=c(0, .25, .5, .75, 1)))
    0%    25%    50%    75%   100%
 54.00  67.50  77.50  90.25 112.00
> sort(quantile(c(54, 72, 83, 112), type=7, probs=c(.25, .5, .75, 1, 0)))
    0%    25%    50%    75%   100%
 54.00  67.50  77.50  90.25 112.00

Christos Argyropoulos

----------------------------------------
> Date: Fri, 18 Jun 2010 21:02:41 -0700
> From: jwiley.psych at gmail.com
> To: r-help at r-project.org
> Subject: [R] quantile() depends on order of probs?
>
> I am trying to figure out the rationale behind why quantile() returns
> different values for the same probabilities depending on whether 0 is
> first.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
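As a rough alternative to sort(), the two calls can be compared label by label, so that each value is checked against the value carrying the same name ("0%", "25%", and so on). The helper name order_independent is hypothetical; the sketch only uses quantile() and all.equal(). I would expect it to return TRUE on any release where the ordering issue is absent, and FALSE for type 6 on an affected one.

```r
# Hypothetical helper: TRUE when two orderings of the same probabilities
# give the same quantile for every label.
order_independent <- function(x, p1, p2, type) {
  q1 <- quantile(x, probs = p1, type = type)
  q2 <- quantile(x, probs = p2, type = type)
  # Reorder q2 by name so values are matched on their labels, not positions.
  isTRUE(all.equal(q1, q2[names(q1)]))
}

x  <- c(54, 72, 83, 112)
p1 <- c(0, .25, .5, .75, 1)
p2 <- c(.25, .5, .75, 1, 0)

for (type in 6:7)
  cat("type", type, "order-independent:", order_independent(x, p1, p2, type), "\n")
```

Matching by name keeps the check honest: it fails if any single label maps to a different value, rather than relying on the sorted values lining up.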
Dear Christos,

Thank you. The code implemented in 2.10.1 is actually slightly different; there, the qs variable in quantile is determined by nested ifelse() calls:

qs <- ifelse(h == 0, x[j + 2], ifelse(h == 1, x[j + 3], (1 - h) * x[j + 2] + h * x[j + 3]))

so if h is neither 0 nor 1, qs is (1 - h) * x[j + 2] + h * x[j + 3]. Since ifelse() works element by element, that version does not depend on the order of probs, which would explain why you see identical results.

It was a particularly small example, so I also looked at a sample from rnorm(500). The differences are less pronounced in the larger sample, but still present (again in 2.11.1).

Josh

On Sat, Jun 19, 2010 at 4:35 AM, Christos Argyropoulos <argchris at hotmail.com> wrote:
> It seems to me that the results are actually the same but they are not
> returned in the same order (R 2.10.1 in Windows Vista). If you call sort
> on the output the results will be the same:
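To illustrate the ifelse() form on a larger sample, here is a small sketch. The name qs_ifelse is hypothetical, the lines around the ifelse() call are borrowed from the 2.11.1 fragment rather than checked against the 2.10.1 source, and the seed is arbitrary. It computes type 6 quantiles (a = b = 0) for two orderings of the same probabilities and lines them up again with match().

```r
# Hypothetical helper built around the nested ifelse() expression.
qs_ifelse <- function(x, probs, a = 0, b = 0) {
  n <- length(x)
  x <- sort(x)
  # Assumed padding so that x[j + 2] and x[j + 3] are always valid indices;
  # ifelse() evaluates every branch for all elements, so this matters.
  x <- c(x[1L], x[1L], x, x[n], x[n])
  fuzz <- 4 * .Machine$double.eps
  nppm <- a + probs * (n + 1 - a - b)
  j <- floor(nppm + fuzz)
  h <- nppm - j
  # Element-wise choice between the lower value, the upper value, and interpolation.
  ifelse(h == 0, x[j + 2],
         ifelse(h == 1, x[j + 3], (1 - h) * x[j + 2] + h * x[j + 3]))
}

set.seed(1)            # arbitrary seed, only for reproducibility
y  <- rnorm(500)
p1 <- c(0, .25, .5, .75, 1)
p2 <- c(.25, .5, .75, 1, 0)

q1 <- qs_ifelse(y, p1)
q2 <- qs_ifelse(y, p2)

# Put q2 back into the order of p1 before comparing.
all.equal(q1, q2[match(p1, p2)])
```

Each element of h selects its own branch here, so the position of 0 within probs has no effect on the other quantiles.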