Hervé Pagès
2013-May-08 20:53 UTC
[Rd] subsetting by name is very slow when subscript contains a lot of "invalid" names
Hi,

Not sure why, but subsetting by name is *very* slow when the character
vector used as subscript contains a lot of "invalid" names (names that
are not in names(x)):

  x <- c(A=10L, B=20L, C=30L)
  subscript <- c(LETTERS[1:3], sprintf("ID%05d", 1:150000))

  > system.time(y1 <- x[subscript])
     user  system elapsed
  111.991   0.000 112.230

Since subsetting by name is basically equivalent to

  i <- match(subscript, names(x))
  x[i]

it's quite surprising that the former is more than 10 thousand times
slower than the latter:

  > system.time({i <- match(subscript, names(x)); y2 <- x[i]})
     user  system elapsed
    0.008   0.000   0.007
  > identical(y2, y1)
  [1] TRUE

Thanks,
H.

PS: This issue was already reported in 2010, with a proposed fix by
Martin Morgan:

  https://stat.ethz.ch/pipermail/r-devel/2010-July/057945.html

  > sessionInfo()
  R version 3.0.0 (2013-04-03)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=C                 LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] parallel  stats     graphics  grDevices utils     datasets  methods
  [8] base

  other attached packages:
  [1] GenomicRanges_1.13.8 IRanges_1.19.3       BiocGenerics_0.7.2

  loaded via a namespace (and not attached):
  [1] stats4_3.0.0 tools_3.0.0

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
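
Until the slow path is fixed, the match()-based equivalent from the post can be wrapped in a small helper. The following is only a sketch: subset_by_name is a hypothetical name (not part of base R), and it assumes x has names and that the subscript is a character vector. The incomparables argument is an added precaution so that "" and NA subscripts never match, which is how character subsetting treats them.

```r
# Hypothetical workaround helper (not part of base R): subset a named
# vector through match() instead of passing the character subscript to `[`.
subset_by_name <- function(x, subscript) {
  stopifnot(is.character(subscript), !is.null(names(x)))
  # Subscripts that are "" or NA are mapped to NA rather than matched,
  # mirroring how character subsetting treats them.
  i <- match(subscript, names(x), incomparables = c("", NA_character_))
  # Integer subsetting; names not found give NA elements.
  x[i]
}

x <- c(A = 10L, B = 20L, C = 30L)
subscript <- c(LETTERS[1:3], sprintf("ID%05d", 1:150000))
y <- subset_by_name(x, subscript)
```

On the example from the post this should return the same values as x[subscript], with NA for every name that is not in names(x).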