thr3ads.net - R devel - [Rd] Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650) [Mar 2006]

If this information is useful, please help other people find it:
Share via:

deepayan.sarkar@gmail.com

2006-Mar-02 02:02 UTC

[Rd] Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650)

Hi,

This is essentially a reposting of

http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html

which had no responses, and the behaviour reported there persists in
r-devel as of yesterday.

(1) sort() with non-null partial
> x = rnorm(100000)
> keep = as.integer(ppoints(10000) * 100000)
> system.time(sort(x))
[1] 0.05 0.00 0.04 0.00 0.00> system.time(sort(x, partial = keep))[1] 52.04  0.02 52.08  0.00  0.00

This is perhaps not strictly a bug, but taking approximately 1000
times longer to do a subset of the work seems pointless at best.

(2) quantile.default() always calls sort() with a non-null partial
argument. Consequently,
> system.time(quantile(x, ppoints(10000)))[1] 88.82  0.05 88.90  0.00  0.00

There's no way around this except by writing a custom version of
quantile. lattice currently does this, giving
> system.time(lattice:::fast.quantile(x, ppoints(10000)))[1] 0.07 0.01 0.08 0.00 0.00

Which brings me to my wishlist request: if (1) cannot be fixed easily,
could quantile.default() at least have an optional argument that can
be used to disable partial sorting?
> sessionInfo()Version 2.3.0 Under development (unstable) (2006-02-28 r37448)
x86_64-unknown-linux-gnu

attached base packages:
[1] "methods"   "stats"     "graphics" 
"grDevices" "utils"     "datasets"
[7] "base"

Deepayan
--
http://www.stat.wisc.edu/~deepayan/

ripley@stats.ox.ac.uk

2006-Mar-04 17:36 UTC

head link

[Rd] Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650)

Deepayan,

The current algorithm is really designed for partial of length 1, and is
more or less proportional to the length of partial.  So inevitably it is 
slow in your (pretty unrealistic?) example.  I have temporarily altered it 
so a barebones full sort is done if partial has more than 10 elements, the 
changeover point for a million-element vector on my system.

John Chambers wrote a paper on this in 1971 (and I know that from his 1977 
book).  It is possible to write partial sorting to be about as fast as 
sorting in all cases (at least if partial is sorted) and much faster if 
partial is small.  But I am not sure it is really worth the bother when 
full sorting is so fast even on a million elements.

Brian

On Thu, 2 Mar 2006, deepayan.sarkar at gmail.com wrote:
> Hi,
>
> This is essentially a reposting of
>
> http://tolstoy.newcastle.edu.au/R/devel/05/11/3305.html
>
> which had no responses, and the behaviour reported there persists in
> r-devel as of yesterday.
>
> (1) sort() with non-null partial
>
>> x = rnorm(100000)
>> keep = as.integer(ppoints(10000) * 100000)
>> system.time(sort(x))
> [1] 0.05 0.00 0.04 0.00 0.00
>> system.time(sort(x, partial = keep))
> [1] 52.04  0.02 52.08  0.00  0.00
>
> This is perhaps not strictly a bug, but taking approximately 1000
> times longer to do a subset of the work seems pointless at best.
>
> (2) quantile.default() always calls sort() with a non-null partial
> argument. Consequently,
>
>> system.time(quantile(x, ppoints(10000)))
> [1] 88.82  0.05 88.90  0.00  0.00
>
> There's no way around this except by writing a custom version of
> quantile. lattice currently does this, giving
>
>> system.time(lattice:::fast.quantile(x, ppoints(10000)))
> [1] 0.07 0.01 0.08 0.00 0.00
>
> Which brings me to my wishlist request: if (1) cannot be fixed easily,
> could quantile.default() at least have an optional argument that can
> be used to disable partial sorting?
>
>> sessionInfo()
> Version 2.3.0 Under development (unstable) (2006-02-28 r37448)
> x86_64-unknown-linux-gnu
>
> attached base packages:
> [1] "methods"   "stats"     "graphics" 
"grDevices" "utils"     "datasets"
> [7] "base"
>
> Deepayan
> --
> http://www.stat.wisc.edu/~deepayan/
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

roger koenker

2006-Mar-05 18:28 UTC

head link

[Rd] Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650)

I used to have a version of the classical Floyd and Revest "Select"  
algorithm (ACM#489)
for computing  univariate quantiles in my quantreg package.  At some  
point
g77 decided that it shouldn't be possible to use recursion  in  
fortran and I had
to remove it.  Stimulated by my your recent inquiry,  I thought I  
would see
whether it could be revived.  At least on my macs, running gfortran,  
it seems
that recursion is now again permissible.  I've not done any serious  
testing, but
I get:

 > unix.time(kuantile(y))  #my fortran translation of the Floyd- 
Revest algol algorithm
[1] 0.56 0.41 0.96 0.00 0.00
 > unix.time(median(y))
[1] 1.80 0.27 2.07 0.00 0.00
 > length(y)
[1] 10000000

for a rnorm example.  I'd be happy to provide some new version of  
this, but
I'm afraid that there are still lots of g77 users, including on macs,  
that
wouldn't be functional due to the recursion.  Any thoughts on this  
would  be
appreciated.  Brian is doubtless right that current methods are  
perfectly
adequate for almost all purposes, but in cases of very large datasets  
where
many quantiles are needed, it may be worthwhile.

Roger

url:    www.econ.uiuc.edu/~roger                Roger Koenker
email   rkoenker at uiuc.edu                       Department of Economics
vox:    217-333-4558                            University of Illinois
fax:    217-244-6678                            Champaign, IL 61820

Apparently Analagous Threads

Search for more seemingly similar threads

R devel - Mar 2006 - Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650)

[Rd] Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650)

[Rd] Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650)

[Rd] Bug/Wishlist: 'partial' in 'sort' and 'quantile' (PR#8650)

Apparently Analagous Threads