I discovered that the Kendall's tau calculation in R uses all pairwise comparisons, which is O(n^2) and takes a long time for large vectors. I implemented an O(n*log(n)) algorithm based on merge sort. Is this of interest for inclusion in core R?

The code (Fortran and R wrapper) is available in my package clinfun v0.9.7 (not exported in NAMESPACE).

Thanks,
Venkat

-- 
Venkatraman E. Seshan, Ph.D. | Attending Biostatistician
Director of Biostatistics Computer-Intensive Support Services
Department of Epidemiology and Biostatistics | MSKCC
307 E 63rd St 3rd Floor | New York, NY 10065
Phone: 646-735-8126 | Fax: 646-735-0010
>>>>> "S" == SeshanV <SeshanV at mskcc.org>
>>>>>     on Sat, 30 Apr 2011 11:20:59 -0400 writes:

    > I discovered that the Kendall's tau calculation in R uses
    > all pairwise comparisons which is O(n^2) and takes a long time for
    > large vectors. I implemented a O(n*log(n)) algorithm based on
    > merge-sort. Is this of interest to be included in core R?

Yes, quite a bit of interest!
I have known about the O(n^2) "feature" for quite a while, and it is indeed a considerable problem in copula modelling, which has become an interest of mine over the past year.

    > The code (fortran and R wrapper) is available in my package clinfun v0.9.7
    > (not exported in NAMESPACE).

Thank you! Yes, I see you've put them there quite recently.
I see the Fortran code uses modern allocate / deallocate constructs (which I don't know).

As I think we'd want to use this in the C code that also underlies cor(*, method="kendall"), I'll eventually want a C version, not least because we may look into dealing with NAs in the same flexible way that they are currently handled via the 'use = "..."' argument.

I may contact you privately for more.

Thanks again,
Martin Maechler, ETH Zurich (and R Core Team).

    > Thanks,
    > Venkat

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel
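For readers who want to see how a merge-sort-based O(n*log(n)) computation of Kendall's tau can work, here is a minimal sketch in Python. It is not the Fortran code from clinfun and not the C code underlying cor(); it assumes the commonly used approach of sorting the pairs by x and counting inversions in the resulting y sequence during a merge sort, with the usual tau-b correction for ties. The function names (count_inversions, tied_pairs, kendall_tau_fast, kendall_tau_naive) are hypothetical, and missing values are not handled at all.

```python
from collections import Counter
from math import sqrt


def count_inversions(values):
    """Merge sort that returns (sorted list, number of pairs i < j with values[i] > values[j])."""
    n = len(values)
    if n < 2:
        return list(values), 0
    mid = n // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # equal values are ties, not inversions
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def tied_pairs(items):
    """Number of unordered pairs of equal items."""
    return sum(c * (c - 1) // 2 for c in Counter(items).values())


def kendall_tau_fast(x, y):
    """Kendall's tau-b in O(n log n); assumes no missing values (illustrative only)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    pairs = sorted(zip(x, y))      # order by x, breaking ties by y
    n0 = n * (n - 1) // 2          # all pairs
    n1 = tied_pairs(x)             # pairs tied in x
    n2 = tied_pairs(y)             # pairs tied in y
    n3 = tied_pairs(pairs)         # pairs tied in both x and y
    _, discordant = count_inversions([py for _, py in pairs])
    denom = sqrt((n0 - n1) * (n0 - n2))
    if denom == 0:
        return float("nan")        # one variable is constant
    return (n0 - n1 - n2 + n3 - 2 * discordant) / denom


def kendall_tau_naive(x, y):
    """Reference O(n^2) tau-b using all pairwise comparisons."""
    n = len(x)
    s = nx = ny = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])
            dy = (y[i] > y[j]) - (y[i] < y[j])
            s += dx * dy
            nx += dx != 0
            ny += dy != 0
    return s / sqrt(nx * ny) if nx and ny else float("nan")
```

Because the pairs are sorted by x with ties broken by y, pairs tied in x never show up as inversions, and the merge step treats equal y values as ties, so the inversion count is exactly the number of discordant pairs. The naive version is included only as a cross-check on small inputs.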