thr3ads.net - R devel - R-alpha: Sorting Efficiency [Sep 1997]

If this information is useful, please help other people find it:
Share via:

Ross Ihaka

1997-Sep-04 20:39 UTC

R-alpha: Sorting Efficiency

As part of checking Martin Maechler's observations of sorting
efficiency I have been putting the code under the microscope.  Part of
this involved comparing the performance with Splus. I turns out that
the R code is much slower.  A good deal of this is due to the
interpreted front end to the R verion of sort.  It is worth taking
a look at the sort function and seeing if you can see what the
problem is.  (hint: turn on GC reporting with "gcinfo").

A good deal of the rest of the inefficiency is because the internal R
code attempts to deal with missing observations.  This is no longer
needed because this is dealt with in the interpreted front end.

An overall fix of the code should see us nearly match the performance
of Splus.  I say nearly because I think I will use heapsort instead of
quicksort because I want to avoid worst-case performance rather than
seek good average performance.

I don't yet have an explanation of the behavior which Martin uncovered.
	Ross
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Martin Maechler

1997-Sep-05 08:32 UTC

head link

R-alpha: Sorting Efficiency -- thoughts.. : if(!is.sorted(x)) x <- sort(x)

Since you are delving into the code anyway....
Here is a thought that I have been having for a long time:

Wouldn't it make sense / be useful  to have a function 	is.sorted
which returns TRUE   if  all(diff(x) >= 0).	        ---------

I actually have been using the function

	is.sorted <- function(x) (length(x) <= 1) || all(diff(x) >= 0)

in my collection of S-plus enhancements for quite while now.

I can do things as

	if(!is.sorted(x)) {
		o <- order(x)
		x <- x[o]
		y <- y[o]
	}
		
which potentially saves computing time (over unconditional 'order'),
since
1.	        is.sorted  is  O(n)
    whereas	sort       is  O(n*log(n)) 
2. for sorted x, the two assignments are superfluous even when order(x) was
   very fast (for SORTED x). 

TO be CPU-efficient, is.sorted  probably should be written .Internal(.).

On the other hand,  sort(.) and order(.)  could be designed such that they
use a sorting algorithm which guarantees to be fast when things are already
sorted. [maybe this is already the case,
	 and one of your reasons against quicksort? ]
This would make the 'if(..)' in  
	if(!is.sorted(x)) x <- sort(x)
superfluous, but the situation with order(.) above
(which is NOT infrequent in statistics !!) would still profit from not
doing  'x <- x[1:length(x)]' when x is sorted.

--
Opinions?

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Reasonably Related Threads

Search for more seemingly similar threads

R devel - Sep 1997 - R-alpha: Sorting Efficiency

R-alpha: Sorting Efficiency

R-alpha: Sorting Efficiency -- thoughts.. : if(!is.sorted(x)) x <- sort(x)

Reasonably Related Threads