Hi all, Profiling turned up a bit of a speedbump in the library function. I submitted a patch to the R bug tracker as bug 16168 and I've also included it below. The alternate code is simpler and easier to read/maintain, I believe. Any thoughts on other ways to write this? Index: src/library/base/R/library.R ==================================================================--- src/library/base/R/library.R (revision 67578) +++ src/library/base/R/library.R (working copy) @@ -688,18 +688,8 @@ out <- character() for(pkg in package) { - paths <- character() - for(lib in lib.loc) { - dirs <- list.files(lib, - pattern = paste0("^", pkg, "$"), - full.names = TRUE) - ## Note that we cannot use tools::file_test() here, as - ## cyclic namespace dependencies are not supported. Argh. - paths <- c(paths, - dirs[dir.exists(dirs) & - file.exists(file.path(dirs, - "DESCRIPTION"))]) - } + paths <- file.path(lib.loc, pkg) + paths <- paths[ file.exists(file.path(paths, "DESCRIPTION")) ] if(use_loaded && pkg %in% loadedNamespaces()) { dir <- if (pkg == "base") system.file() else getNamespaceInfo(pkg, "path") Pete ____________________ Peter M. Haverty, Ph.D. Genentech, Inc. phaverty at gene.com
I think you can simplify a little by replacing this: pkg %in% loadedNamespaces() with this: .getNamespace(pkg) Whereas getNamespace(pkg) will load the package if it's not already loaded, calling .getNamespace(pkg) (note the leading dot) won't load the package. I can't speak to whether there are any pitfalls in changing the library path searching, though. -Winston On Thu, Jan 22, 2015 at 12:25 PM, Peter Haverty <haverty.peter at gene.com> wrote:> Hi all, > > Profiling turned up a bit of a speedbump in the library function. I > submitted a patch to the R bug tracker as bug 16168 and I've also > included it below. The alternate code is simpler and easier to > read/maintain, I believe. Any thoughts on other ways to write this? > > Index: src/library/base/R/library.R > ==================================================================> --- src/library/base/R/library.R (revision 67578) > +++ src/library/base/R/library.R (working copy) > @@ -688,18 +688,8 @@ > out <- character() > > for(pkg in package) { > - paths <- character() > - for(lib in lib.loc) { > - dirs <- list.files(lib, > - pattern = paste0("^", pkg, "$"), > - full.names = TRUE) > - ## Note that we cannot use tools::file_test() here, as > - ## cyclic namespace dependencies are not supported. Argh. > - paths <- c(paths, > - dirs[dir.exists(dirs) & > - file.exists(file.path(dirs, > - "DESCRIPTION"))]) > - } > + paths <- file.path(lib.loc, pkg) > + paths <- paths[ file.exists(file.path(paths, "DESCRIPTION")) ] > if(use_loaded && pkg %in% loadedNamespaces()) { > dir <- if (pkg == "base") system.file() > else getNamespaceInfo(pkg, "path") > > Pete > > ____________________ > Peter M. Haverty, Ph.D. > Genentech, Inc. > phaverty at gene.com > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel
Thanks Winston, Yes, your version of that part is more direct. I guess it would need a ! is.null() too. I think we should use .getNamespace. It It also occurred to me that this %in% check (which happens in a few places) is kind of roundabout. It equates to "foo" %in% ls(.Internal(getNamespaceRegistry()), all.names = TRUE) We lack and R-level accessor for the namespace registry, but if we had one we could do getNamespaceRegistry()[["foo"]] , which is just a hash lookup. I'm getting a bit off topic here, but ... "foo" %in% vector is a common pattern and reads well, but any(vector == "foo") is less work and much faster. I wonder if there is room for a fast path there ... Pete ____________________ Peter M. Haverty, Ph.D. Genentech, Inc. phaverty at gene.com On Fri, Jan 23, 2015 at 8:15 AM, Winston Chang <winstonchang1 at gmail.com> wrote:> I think you can simplify a little by replacing this: > pkg %in% loadedNamespaces() > with this: > .getNamespace(pkg) > > Whereas getNamespace(pkg) will load the package if it's not already > loaded, calling .getNamespace(pkg) (note the leading dot) won't load > the package. > > I can't speak to whether there are any pitfalls in changing the > library path searching, though. > > -Winston > > > On Thu, Jan 22, 2015 at 12:25 PM, Peter Haverty <haverty.peter at gene.com> wrote: >> Hi all, >> >> Profiling turned up a bit of a speedbump in the library function. I >> submitted a patch to the R bug tracker as bug 16168 and I've also >> included it below. The alternate code is simpler and easier to >> read/maintain, I believe. Any thoughts on other ways to write this? >> >> Index: src/library/base/R/library.R >> ==================================================================>> --- src/library/base/R/library.R (revision 67578) >> +++ src/library/base/R/library.R (working copy) >> @@ -688,18 +688,8 @@ >> out <- character() >> >> for(pkg in package) { >> - paths <- character() >> - for(lib in lib.loc) { >> - dirs <- list.files(lib, >> - pattern = paste0("^", pkg, "$"), >> - full.names = TRUE) >> - ## Note that we cannot use tools::file_test() here, as >> - ## cyclic namespace dependencies are not supported. Argh. >> - paths <- c(paths, >> - dirs[dir.exists(dirs) & >> - file.exists(file.path(dirs, >> - "DESCRIPTION"))]) >> - } >> + paths <- file.path(lib.loc, pkg) >> + paths <- paths[ file.exists(file.path(paths, "DESCRIPTION")) ] >> if(use_loaded && pkg %in% loadedNamespaces()) { >> dir <- if (pkg == "base") system.file() >> else getNamespaceInfo(pkg, "path") >> >> Pete >> >> ____________________ >> Peter M. Haverty, Ph.D. >> Genentech, Inc. >> phaverty at gene.com >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>> Winston Chang <winstonchang1 at gmail.com> >>>>> on Fri, 23 Jan 2015 10:15:53 -0600 writes:> I think you can simplify a little by replacing this: > pkg %in% loadedNamespaces() > with this: > .getNamespace(pkg) almost: It would be !is.null(.getNamespace(pkg)) > Whereas getNamespace(pkg) will load the package if it's not already > loaded, calling .getNamespace(pkg) (note the leading dot) won't load > the package. indeed. And you, Winston, are right that this new code snippet would be an order of magnitude faster : ##----------------------------------------------------------------------------- f1 <- function(pkg) pkg %in% loadedNamespaces() f2 <- function(pkg) !is.null(.getNamespace(pkg)) require(microbenchmark) pkg <- "foo"; (mbM <- microbenchmark(r1 <- f1(pkg), r2 <- f2(pkg))); stopifnot(identical(r1,r2)); r1 ## Unit: microseconds ## expr min lq mean median uq max neval cld ## r1 <- f1(pkg) 38.516 40.9790 42.35037 41.7245 42.4060 82.922 100 b ## r2 <- f2(pkg) 1.331 1.8285 2.13874 2.0855 2.3365 7.252 100 a ## [1] FALSE pkg <- "stats"; (mbM <- microbenchmark(r1 <- f1(pkg), r2 <- f2(pkg))); stopifnot(identical(r1,r2)); r1 ## Unit: microseconds ## expr min lq mean median uq max neval cld ## r1 <- f1(pkg) 29.955 31.2575 32.27748 31.6035 32.1215 62.428 100 b ## r2 <- f2(pkg) 1.067 1.4315 1.71437 1.6335 1.8460 9.169 100 a ## [1] TRUE loadNamespace("Matrix") ## <environment: namespace:Matrix> pkg <- "Matrix"; (mbM <- microbenchmark(r1 <- f1(pkg), r2 <- f2(pkg))); stopifnot(identical(r1,r2)); r1 ## Unit: microseconds ## expr min lq mean median uq max neval cld ## r1 <- f1(pkg) 32.721 33.5205 35.17450 33.9505 34.6050 65.373 100 b ## r2 <- f2(pkg) 1.010 1.3750 1.93671 1.5615 1.7795 12.128 100 a ## [1] TRUE ##----------------------------------------------------------------------------- Hence, indeed, !is.null(.getNamespace(pkg)) seems equivalent to pkg %in% loadedNamespaces() --- when 'pkg' is of length 1 (!!!) but is 20 times faster.... and we have 11 occurrences of ' <...> %in% loadedNamespaces() ' in the "base packages" in the R (devel) sources, 3 in base, 2 in methods, 3 in stats, 2 in tools, 1 in utils.. On the other hand, pkg %in% loadedNamespaces() is extremely nicely readable code, whereas !is.null(.getNamespace(pkg)) is pretty much the contrary. .. and well readable code is so much easier to maintain etc, such that in many cases, code optimization with the cost of code obfuscation is *not* desirable. Of course we could yet again use a few lines of C and R code to provide a new R lowlevel function, say is.loadedNamespace() which would be even faster than !is.null(.getNamespace(pkg)) ... ... but do we have *any* evidence that this would noticably speedup any higher level function such as library() ? Thank you, again, Winston; you've opened an interesting topic! -- Martin Maechler, ETH Zurich