Please do the comparison with R-devel. That has a different policy,
including switching to quicksort when partial has more than 10 elements.
BTW you need to compare with quicksort (which suffices here and is quite a
lot faster than shellsort for large numeric vectors).
For 1 million+1 I am getting timings on my Windows laptop
sort 0.39
quicksort 0.24
quantile(50%) 0.13
quantile() 0.17
kuantile(50%) 0.16
kuantile() 0.17
(timings for a million are about 50% more in the last four lines as
roughly twice as many raw quantiles are needed.)
Note that there is never more than about a factor of three gain over
quicksort (remembering that quantile has some overhead of its own, so I
also timed a version of quantile always using quicksort, using about
0.34). Given how rare this is as a task for R, it would probably be quite
sufficient to always use quicksort.
On Sat, 11 Mar 2006, roger koenker wrote:
> Motivated by Deepayan's recent inquiries about the efficiency of the
> R 'quantile'
> function:
>
> http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html
> http://tolstoy.newcastle.edu.au/R/devel/06/03/4358.html
>
> I decided to try to revive an old project to implement a version of
> the Floyd
> and Rivest (1975) algorithm for finding quantiles with O(n)
> comparisons. I
> used to have a direct translation from FR's algol68 in my quantreg
> package,
> but was forced to remove it when it encountered g77 compilers that
> didn't like the way I had handled recursion.
>
> Fortunately, in the interim, I've corresponded with Krzysztof Kiwiel
> in Warsaw
> who has made a detailed study of the algorithm:
>
> K.C. Kiwiel: On Floyd and Rivest's SELECT Algorithm, Theoretical
> Computer Sci. 347 (2005) 214-238.
>
> He has also kindly agreed to allow me to incorporate his
> implementation of
> the FR algorithm in R under GPL.
>
> For the moment, I have made an alpha-test package with a single
> function --
> 'kuantile' -- that attempts to reproduce the functionality of the
> current
> base quantile function, essentially replacing the partial sorting done
> there with calls to Kiwiel's 'select' function. This package
is
> available
> from:
>
> http://www.econ.uiuc.edu/~roger/research/rq/kuantile
>
> The good news is that the new function _is_ somewhat quicker than
> the base 'quantile' function. The following table is based on
means
> of 10 replications. The first two columns report times for computing
> just the median, the next two columns for computing the five default
> quantiles: seq(0,1,.25), and the last column is the time needed for
> a call
> of sort().
>
> mean system.time()[1] for various calls*
>
> median only default 5
> n quantile kuantile quantile kuantile sort
>
> 100 0.002 0.001 0.004 0.004 0.000
> 1000 0.002 0.001 0.003 0.002 0.002
> 10000 0.003 0.002 0.009 0.005 0.005
> 1e+05 0.024 0.010 0.061 0.013 0.031
> 1e+06 0.206 0.120 0.535 0.142 0.502
> 1e+07 2.132 0.790 5.575 1.035 7.484
>
> * On a mac G5 running R 2.2.1.
>
> The bad news is that this approach doesn't help with Deepayan's
> original question which involved instances in which quantile() was
> being called for rather large number of probabilities. In such
> cases it seems that it is difficult to improve upon doing a full sort.
>
> I would welcome comments on any of this....
>
> url: www.econ.uiuc.edu/~roger Roger Koenker
> email: rkoenker at uiuc.edu Department of Economics
> vox: 217-333-4558 University of Illinois
> fax: 217-244-6678 Champaign, IL 61820
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595