thr3ads.net - R devel - [Rd] O(N log N) Kendall Tau [Dec 2009]

If this information is useful, please help other people find it:
Share via:

David Simcha

2009-Dec-13 05:38 UTC

[Rd] O(N log N) Kendall Tau

I've noticed that the implementation of Kendall's Tau in R is O(N^2).  
The following reference describes how it can be done in O(N log N):

 A Computer Method for Calculating Kendall's Tau with Ungrouped Data
 William R. Knight
 Journal of the American Statistical Association, Vol. 61, No. 314, Part 
1 (Jun., 1966), pp. 436-439
 http://www.jstor.org/pss/2282833

I'm interested in contributing an implementation of this algorithm to 
R.  However, I have a few questions:

1.  Would it likely be accepted if well-implemented?
2.  cov.c in the main/ directory of the R source tree seems to contain 
at least 4 different but nearly identical implementations of Kendall's 
Tau:  cov_complete1, cov_complete2, cov_na1, and cov_na2.  I can't tell 
why.  The file is very difficult to read because of its lack of 
comments, (ab)use of macros and improper indentation.  As I will 
probably need to modify this file, can some seasoned veteran please 
explain what the heck is going on in this file?

Dirk Eddelbuettel

2009-Dec-13 16:27 UTC

head link

[Rd] O(N log N) Kendall Tau

On 13 December 2009 at 00:38, David Simcha wrote:
| I've noticed that the implementation of Kendall's Tau in R is O(N^2).
| The following reference describes how it can be done in O(N log N):
| 
|  A Computer Method for Calculating Kendall's Tau with Ungrouped Data
|  William R. Knight
|  Journal of the American Statistical Association, Vol. 61, No. 314, Part 
| 1 (Jun., 1966), pp. 436-439
|  http://www.jstor.org/pss/2282833

Interesting.  I recently used the performance of cor(X,
method="kendall") as
a contrast to the 26-fold speedup when (!!) using gpuCor(X,
method="kendall")
from the nice gputools package by Josh Buckner and Mark Seligman.  That was
using the GPU in a QuadroFX 4800 with a 1206 x 477 matrix; the full example
is in my most recent 'Intro to HPC with R' slides.

| I'm interested in contributing an implementation of this algorithm to 
| R.  However, I have a few questions:
| 
| 1.  Would it likely be accepted if well-implemented?
| 2.  cov.c in the main/ directory of the R source tree seems to contain 
| at least 4 different but nearly identical implementations of Kendall's 
| Tau:  cov_complete1, cov_complete2, cov_na1, and cov_na2.  I can't tell 
| why.  

Combine your bottom-up code analysis with a top-down usage analysis -- open
up R and read 'help(cov)'.  There are different ways to deal with
missing
observations.  

| The file is very difficult to read because of its lack of 
| comments, (ab)use of macros and improper indentation.  As I will 
| probably need to modify this file, can some seasoned veteran please 
| explain what the heck is going on in this file?

Again, I think reading the help file along with the code may prove
beneficial.  The R implementation is pretty consistent across files to you
will have to get used to those C level macros if you want to modify code at
that level.  The R Extensions manual may prove helpful.

Lastly, as a matter of style, I find people are more likely to help you if
you don't hit them first with a two-by-four of 'your code and style is
horrid'.

Cheers, Dirk

-- 
Three out of two people have difficulties with fractions.

Uwe Ligges

2009-Dec-13 17:18 UTC

head link

[Rd] O(N log N) Kendall Tau

David Simcha wrote:> I've noticed that the implementation of Kendall's Tau in R is
O(N^2).
> The following reference describes how it can be done in O(N log N):
> 
> A Computer Method for Calculating Kendall's Tau with Ungrouped Data
> William R. Knight
> Journal of the American Statistical Association, Vol. 61, No. 314, Part 
> 1 (Jun., 1966), pp. 436-439
> http://www.jstor.org/pss/2282833
> 
> I'm interested in contributing an implementation of this algorithm to 
> R.  However, I have a few questions:
> 
> 1.  Would it likely be accepted if well-implemented?
> 2.  cov.c in the main/ directory of the R source tree seems to contain 
> at least 4 different but nearly identical implementations of Kendall's 
> Tau:  cov_complete1, cov_complete2, cov_na1, and cov_na2.  I can't tell
> why.  The file is very difficult to read because of its lack of 
> comments, (ab)use of macros and improper indentation.  As I will 
> probably need to modify this file, can some seasoned veteran please 
> explain what the heck is going on in this file?

Well, it is not that hard to find that the functions you mentioned are 
all used (for different cases) in the main function
   do_cov
that is called by R's cov() function.

Given you provide a well tested implementation based on a published 
algorithm that is numerically as stable as the method already 
implemented in R, including all features, and gives identical results, I 
think it is very likely that your implementation will replace the one 
that is currently used in R.

Best,
Uwe Ligges


> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Maybe Matching Threads

Search for more maybe matching threads

R devel - Dec 2009 - O(N log N) Kendall Tau

[Rd] O(N log N) Kendall Tau

[Rd] O(N log N) Kendall Tau

[Rd] O(N log N) Kendall Tau

Maybe Matching Threads