Jim Regetz
2009-Sep-07 19:03 UTC
[Rd] performance of vector subscripting via character index
Hi all,

Using character indexing on a vector is quite fast up through a vector
length of 46340, then it suddenly gets 3 orders of magnitude slower.
This is true at least of the special case in which the index vector is
the complete (though possibly out-of-order) set of vector names:

    test <- function(n) {
        vec <- seq_len(n)
        names(vec) <- as.character(vec)
        ind <- rev(names(vec))
        system.time(vec[ind])
    }

    test(46340)
    ##    user  system elapsed
    ##   0.012   0.000   0.009

    test(46341)
    ##    user  system elapsed
    ##  11.805   0.000  11.805

There seems to be a rebound at just over twice the value of the
threshold above, though I'll admit I didn't have the stamina to test
all values in between:

    test(92689)
    ##    user  system elapsed
    ##  48.951   0.000  48.946

    test(92690)
    ##    user  system elapsed
    ##   0.036   0.000   0.038

And then worse again...

    test(139022)
    ##    user  system elapsed
    ##   0.068   0.003   0.071

    test(139023)
    ##    user  system elapsed
    ## 114.239   0.000 114.279

I see this on both Ubuntu 9.04 and OS X 10.6, using R 2.9.2 in both
cases.

Has this behavior already been identified? Using 'match' instead of
direct character indexing is a serviceable workaround ... is it in
fact the recommended approach in this case?

Thanks,
Jim

------------------------------
James Regetz, Ph.D.
Scientific Programmer/Analyst
National Center for Ecological Analysis & Synthesis
735 State St, Suite 300
Santa Barbara, CA 93101
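
For concreteness, the 'match' workaround mentioned above might look
roughly like the sketch below. It assumes every entry of the index
vector is present in names(vec), as in the test case. The variable
names by_name and by_match are made up for illustration and are not
from the original message.

```r
# Same setup as test() above, at a small size
vec <- seq_len(1000)
names(vec) <- as.character(vec)
ind <- rev(names(vec))

# Direct character indexing
by_name <- vec[ind]

# Workaround: look up integer positions with match(), then index by position
by_match <- vec[match(ind, names(vec))]

# Check that both approaches return the same named vector
identical(by_name, by_match)
```

Timing the two forms with system.time() at the lengths reported above
would be the natural way to compare them. No timings are given here
because none were measured for this sketch.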