I am currently puzzled by a search path behavior. I have a library of a dozen routines getlabs(), getssn(), getecg(), ... that interface to local repositories and pull back patient information. All have the first 6 arguments in common, and immediately call a second routine to do initial processing of these 6. The functions "joe" and "fred" below capture the relevant portion of them.

My puzzle is this: the last test in the "test" file works fine if these routines are sourced and executed at the command line, but it fails if the routines are bundled up and loaded as a library. That test is motivated by a user who called his data set "t", and ended up with a match to base:::t instead of his data, resulting in a strange error message out of model.frame --- you can always count on the users! (There are a few hundred.)

I'm attempting to be careful with envir and enclos arguments -- how does base end up earlier in the search path? Perhaps this is clearly stated in the docs and just not clear to me? A working solution to the dilemma is of course more than welcome.

Terry Therneau


code:

joe <- function(id, data, subset, na.action, date1, date2, other.args) {
    Call <- match.call()
    if (!missing(data)) temp <- fred(Call)

    temp
}

fred <- function(Call) {
    # get a first copy of the id and date variables
    index <- match(c("id", "date1", "date2"), names(Call), nomatch=0)
    temp <- Call[c(1, index)]
    temp[[1]] <- as.name("list")

    pf <- parent.frame(2)  # the caller of the caller
    data <- eval(Call$data, envir=pf)

    ldata <- eval(temp, data, enclos= pf)
    date1 <- ldata$date1
    date2 <- ldata$date2

    # Users are allowed great flexibility with dates.  Both can be given
    #  as length 1 parameters, both can be in the data set, or one could
    #  be in each place. Call model.frame with a built up formula that
    #  includes the id and any dates of length greater than 1.  This allows
    #  subset and na.action to be applied in the usual way.
    index <- match(c("data", "subset", "na.action"), names(Call), nomatch=0)
    temp <- Call[c(1, index)]
    temp[[1]] <- as.name("model.frame")
    tform <- "~ id"
    if (length(date1) > 1 && is.name(Call$date1))
        tform <- paste(tform, "+", as.character(Call$date1))
    if (length(date2) > 1 && is.name(Call$date2))
        tform <- paste(tform, "+", as.character(Call$date2))

    tform <- as.formula(tform)
    environment(tform) <- pf
    temp$formula <- tform
    mf <- eval(temp, enclos=pf)

    # At this point the real routine has checks for legal dates, date1 <= date2, etc
    # It returns the tidied up id, date1, date2 vectors.
    list(ldata=ldata, mf=mf)
}

test:

library(puzzle)
tdata <- data.frame(id=1:10,
                    start=as.Date(paste0("1999/", 1:10, "/25")))
xdate <- as.Date(paste0(2001:2010, "/03/10"))

joe(id, tdata, date1= "2001/10/11", date2= xdate[2])
joe(id, tdata, date1=start, date2=xdate)

sqrt <- xdate
cos <- tdata

joe(id, cos, date1=start, date2=sqrt)

DESCRIPTION:

Title: A puzzle
Priority: optional
Package: puzzle
Version: 1.1-1
LazyLoad: Yes
LazyData: Yes
Authors@R: c(person(c("Terry", "M"), "Therneau",
                    email="therneau.terry at mayo.edu",
                    role=c("cre")))
Description: What gives with my tests?
License: GPL

NAMESPACE:

export("joe")

On 06/11/2015 7:36 AM, Therneau, Terry M., Ph.D. wrote:
> I am currently puzzled by a search path behavior. I have a library of a dozen routines
> getlabs(), getssn(), getecg(), ... that interface to local repositories and pull back
> patient information. All have the first 6 arguments in common, and immediately call a
> second routine to do initial processing of these 6. The functions "joe" and "fred" below
> capture the relevant portion of them.
> My puzzle is this: the last test in the "test" file works fine if these routines are
> sourced and executed at the command line, but it fails if the routines are bundled up and
> loaded as a library. That test is motivated by a user who called his data set "t", and
> ended up with a match to base:::t instead of his data, resulting in a strange error
> message out of model.frame --- you can always count on the users! (There are a few hundred.)
> I'm attempting to be careful with envir and enclos arguments -- how does base end up
> earlier in the search path? Perhaps this is clearly stated in the docs and just not
> clear to me? A working solution to the dilemma is of course more than welcome.

I haven't followed through all the details in fred(), but I can answer the last question. In package code, the search order is:

 - the package environment
 - the imports to the package (with base being an implicit import)
 - the global environment and the rest of the search list.

In code sourced to the global environment, only the third of these is searched. Since base is in the second one, it is found first in the package version.

Duncan Murdoch

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

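The search order described above can be checked directly by walking parent.env() outward from a function that lives in a package. The following is only a small sketch: stats::sd is an arbitrary choice (picked because stats ships with every R installation), and the variable names are made up for illustration.

```r
# Walk outward from a function defined in a package namespace.
# stats::sd is used only because stats is always installed.
ns      <- environment(stats::sd)   # the namespace the function was defined in
imports <- parent.env(ns)           # the package's imports
base_ns <- parent.env(imports)      # base, the implicit import
after   <- parent.env(base_ns)      # what comes after the base namespace

isNamespace(ns)                     # is ns a namespace environment?
isBaseNamespace(base_ns)            # is base_ns the base namespace?
identical(after, globalenv())       # is the global environment next in line?
```

Anything the user defines at the prompt lives in the global environment, so a name that also exists in base is reached in base first when the lookup starts from inside package code.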
Duncan,
  That's helpful. Two follow-up questions:

1. Where would I have found this information? I had looked at eval and model.frame.

2. What stops the following code from falling down the same rabbit hole? Shouldn't it find base::cos first?

    library(survival)
    cos <- lung
    coxph(Surv(time, status) ~ age, data=cos)

Terry T.

On 06/11/2015 8:20 AM, Therneau, Terry M., Ph.D. wrote:
> Duncan,
>   That's helpful. Two follow-up questions:
> 1. Where would I have found this information? I had looked at eval and model.frame.

I think the best description is Luke's article on namespaces, "Name space management for R". Luke Tierney, R News, 3(1):2-6, June 2003. There's a link to it from the "Technical papers" section of the HTML help index. There's also a short description of this in the R Language Definition manual in the "Search path" section 3.5.4.

> 2. What stops the following code from falling down the same rabbit hole? Shouldn't it
> find base::cos first?
>
>    library(survival)
>    cos <- lung
>    coxph(Surv(time, status) ~ age, data=cos)

If that code is in a function anywhere (package or not), cos will be a local variable created there in the evaluation environment created when you evaluate the function. If you execute it at the command line, you'll create a variable called "cos" in the global environment. Local variables come ahead of the 3 places I listed. (This is why Luke's article is good: it doesn't oversimplify.)

There's one other twist. Even with cos being a local variable, cos(theta) would find base::cos, because the evaluator knows it is looking for a function (since it's a function call) and will skip over the local dataframe named cos.

Duncan Murdoch

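The "one other twist" about function calls can be seen in a few lines. This is a minimal sketch; the function name f and the toy data frame are invented for the illustration.

```r
f <- function() {
    cos <- data.frame(x = 1:3)      # local non-function object named "cos"
    list(value = cos(0),            # call position: lookup skips the data frame
         is_df = is.data.frame(cos))  # plain symbol: the local data frame is found
}
res <- f()
res$value
res$is_df
```

The same name therefore resolves to two different objects depending on whether it appears as the function in a call or as an ordinary variable, which is why a data set named after a base function only causes trouble when it is looked up as data.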
This code, which I think I wrote but might have gotten from elsewhere a long time ago, shows the environments that are searched from a given function, in this case chart.RelativePerformance in the PerformanceAnalytics package. Try it on some of your functions in and out of packages to help determine the sequence of environments R searches along:

library( PerformanceAnalytics ) ## change as needed
x <- environment(chart.RelativePerformance) ## change as needed
str(x)
while (!identical(x, emptyenv())) {
    p <- parent.env(x)
    cat("---- child is above this line and parent is below ----\n")
    str(p)
    if (isBaseNamespace(p)) cat("Same as .BaseNamespaceEnv\n")
    if (identical(p, baseenv())) cat("Same as baseenv()\n")
    if (identical(p, emptyenv())) cat("Same as emptyenv()\n")
    if (identical(p, globalenv())) cat("Same as globalenv()\n")
    x <- p
}

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

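One possible reading of why fred() in the original post ends up at base::cos, offered as a sketch rather than a confirmed diagnosis: the help page for eval describes enclos as relevant only when envir is a list, pairlist or data frame. In eval(temp, enclos=pf) no envir is given, so it defaults to the frame eval was called from (fred's own frame), and lookup then proceeds through fred's enclosure, which is the package namespace when the code is loaded as a library. The names caller and helper below are hypothetical, and the package is only mimicked by giving helper an enclosure that reaches base before the global environment.

```r
caller <- function() {
    cos <- data.frame(a = 1:2)       # user's data, named like a base function
    helper(quote(cos))
}

helper <- function(expr) {
    pf <- parent.frame()             # the frame of caller()
    list(
        # envir is left at its default (helper's own frame); enclos is not
        # consulted because envir is an environment, not a list or data frame
        ignored_enclos = eval(expr, enclos = pf),
        # evaluation starts in the caller's frame, where the data lives
        explicit_envir = eval(expr, envir = pf)
    )
}

# Assumption for illustration: stand in for a namespace by making base the
# next stop after helper's enclosure, ahead of the global environment.
environment(helper) <- new.env(parent = baseenv())

res <- caller()
is.function(res$ignored_enclos)      # did the first lookup land on base::cos?
is.data.frame(res$explicit_envir)    # did the second find the user's data?
```

If this reading is right, a candidate change in fred() would be mf <- eval(temp, envir = pf); that has not been tested against the real routines and is only a suggestion.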