I am currently puzzled by a search path behavior. I have a library of a dozen
routines, getlabs(), getssn(), getecg(), ..., that interface to local
repositories and pull back patient information. All have the first 6 arguments
in common, and immediately call a second routine to do initial processing of
these 6. The functions "joe" and "fred" below capture the relevant portion of
them.

My puzzle is this: the last test in the "test" file works fine if these
routines are sourced and executed at the command line, but it fails if the
routines are bundled up and loaded as a library. That test is motivated by a
user who called his data set "t" and ended up with a match to base:::t instead
of his data, resulting in a strange error message out of model.frame --- you
can always count on the users! (There are a few hundred.)

I'm attempting to be careful with envir and enclos arguments -- how does base
end up earlier in the search path? Perhaps this is clearly stated in the docs
and just not clear to me? A working solution to the dilemma is of course more
than welcome.

Terry Therneau
code:
joe <- function(id, data, subset, na.action, date1, date2, other.args) {
    Call <- match.call()
    if (!missing(data)) temp <- fred(Call)
    temp
}

fred <- function(Call) {
    # get a first copy of the id and date variables
    index <- match(c("id", "date1", "date2"), names(Call), nomatch=0)
    temp <- Call[c(1, index)]
    temp[[1]] <- as.name("list")

    pf <- parent.frame(2)  # the caller of the caller
    data <- eval(Call$data, envir=pf)

    ldata <- eval(temp, data, enclos= pf)
    date1 <- ldata$date1
    date2 <- ldata$date2

    # Users are allowed great flexibility with dates. Both can be given
    # as length 1 parameters, both can be in the data set, or one could
    # be in each place. Call model.frame with a built up formula that
    # includes the id and any dates of length greater than 1. This allows
    # subset and na.action to be applied in the usual way.
    index <- match(c("data", "subset", "na.action"), names(Call), nomatch=0)
    temp <- Call[c(1, index)]
    temp[[1]] <- as.name("model.frame")
    tform <- "~ id"
    if (length(date1) > 1 && is.name(Call$date1))
        tform <- paste(tform, "+", as.character(Call$date1))
    if (length(date2) > 1 && is.name(Call$date2))
        tform <- paste(tform, "+", as.character(Call$date2))

    tform <- as.formula(tform)
    environment(tform) <- pf
    temp$formula <- tform
    mf <- eval(temp, enclos=pf)

    # At this point the real routine has checks for legal dates,
    # date1 <= date2, etc.
    # It returns the tidied up id, date1, date2 vectors.
    list(ldata=ldata, mf=mf)
}

test:
library(puzzle)
tdata <- data.frame(id=1:10,
                    start=as.Date(paste0("1999/", 1:10, "/25")))
xdate <- as.Date(paste0(2001:2010, "/03/10"))

joe(id, tdata, date1= "2001/10/11", date2= xdate[2])
joe(id, tdata, date1=start, date2=xdate)

sqrt <- xdate
cos <- tdata

joe(id, cos, date1=start, date2=sqrt)

DESCRIPTION:
Title: A puzzle
Priority: optional
Package: puzzle
Version: 1.1-1
LazyLoad: Yes
LazyData: Yes
Authors@R: c(person(c("Terry", "M"), "Therneau",
    email="therneau.terry at mayo.edu",
    role=c("cre")))
Description: What gives with my tests?
License: GPL
NAMESPACE:
export("joe")
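
One detail in fred() that may matter here, offered as a reading of the eval() documentation rather than anything confirmed in this thread: enclos is only consulted when envir is a list, pairlist or data frame. In `eval(temp, enclos=pf)` the envir argument is left at its default, which is the frame of fred() itself, so enclos is ignored and names are resolved through whatever encloses fred(). The sketch below imitates that situation; `bad`, `good`, `expr` and `user` are hypothetical names, and attaching the stats namespace as the enclosure is only a stand-in for "code that lives in a package".

```r
# 'bad', 'good', 'expr' and 'user' are hypothetical names for this sketch.
bad  <- function(expr, pf) eval(expr, enclos = pf)  # envir defaults to bad()'s own frame
good <- function(expr, pf) eval(expr, pf)           # evaluate in the supplied frame

# Assumption: giving both functions a namespace as enclosure mimics package code.
environment(bad)  <- asNamespace("stats")
environment(good) <- asNamespace("stats")

# Stand-in for the user's workspace, with a data set that masks base::cos.
user <- new.env(parent = globalenv())
assign("cos", data.frame(id = 1:3), envir = user)

expr <- quote(class(cos))
res_bad  <- bad(expr, user)    # "function": cos resolved through the namespace to base
res_good <- good(expr, user)   # "data.frame": cos resolved in the user's frame
print(res_bad)
print(res_good)
```

If this reading is right, replacing the last eval() in fred() with `eval(temp, pf)` would be one candidate fix, on the assumption that model.frame is still visible from the caller's frame.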
On 06/11/2015 7:36 AM, Therneau, Terry M., Ph.D. wrote:
> [...] how does base end up earlier in the search path? Perhaps this is
> clearly stated in the docs and just not clear to me? A working solution
> to the dilemma is of course more than welcome.

I haven't followed through all the details in fred(), but I can answer the
last question. In package code, the search order is:

 - the package environment
 - the imports to the package (with base being an implicit import)
 - the global environment and the rest of the search list.

In code sourced to the global environment, only the third of these is
searched. Since base is in the second one, it is found first in the package
version.

Duncan Murdoch

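
The difference in search order can be seen without building a package. This is a minimal sketch, assuming that setting a function's environment to an existing namespace (stats here, chosen arbitrarily) is a fair imitation of package code; `lookup`, `in_global` and `in_pkg` are made-up names.

```r
# Hypothetical names throughout; stats is only a convenient existing namespace.
lookup <- function() get("cos")            # ordinary variable lookup of the name "cos"

# Mask cos in the global environment, as the test script does.
assign("cos", data.frame(id = 1:3), envir = globalenv())

in_global <- lookup
environment(in_global) <- globalenv()      # behaves like sourced code

in_pkg <- lookup
environment(in_pkg) <- asNamespace("stats")  # behaves like package code

res_global <- class(in_global())   # "data.frame": global environment is searched first
res_pkg    <- class(in_pkg())      # "function": base is reached before the global environment
print(res_global)
print(res_pkg)

rm("cos", envir = globalenv())     # tidy up the mask
```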
Duncan,
That's helpful. Two follow-up questions:

1. Where would I have found this information? I had looked at eval and
   model.frame.
2. What stops the following code from falling down the same rabbit hole?
   Shouldn't it find base::cos first?

    library(survival)
    cos <- lung
    coxph(Surv(time, status) ~ age, data=cos)

Terry T.
On 06/11/2015 8:20 AM, Therneau, Terry M., Ph.D. wrote:
> Duncan,
> That's helpful. Two follow-up questions:
> 1. Where would I have found this information? I had looked at eval and
> model.frame.

I think the best description is Luke's article on namespaces: Luke Tierney,
"Name space management for R", R News, 3(1):2-6, June 2003. There's a link to
it from the "Technical papers" section of the HTML help index. There's also a
short description of this in the R Language Definition manual in the "Search
path" section 3.5.4.

> 2. What stops the following code from falling down the same rabbit hole?
> Shouldn't it find base::cos first?
>
> library(survival)
> cos <- lung
> coxph(Surv(time, status) ~ age, data=cos)

If that code is in a function anywhere (package or not), cos will be a local
variable, created in the evaluation environment that is set up when you
evaluate the function. If you execute it at the command line, you'll create a
variable called "cos" in the global environment. Local variables come ahead of
the 3 places I listed. (This is why Luke's article is good: it doesn't
oversimplify.)

There's one other twist. Even with cos being a local variable, cos(theta)
would find base::cos, because the evaluator knows it is looking for a function
(since it's a function call) and will skip over the local dataframe named cos.

Duncan Murdoch

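
Both points, that local variables come first and that a function call skips non-function bindings, can be checked in a few lines. A minimal sketch; `demo` and `res` are hypothetical names.

```r
demo <- function(theta) {
  cos <- data.frame(x = 1:2)    # local, non-function binding named cos
  list(value = cos(theta),      # function call: lookup skips the data frame, finds base::cos
       what  = class(cos))      # plain variable: finds the local data frame
}

res <- demo(0)
print(res$value)   # 1, i.e. base::cos(0)
print(res$what)    # "data.frame"
```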
This code, which I think I wrote but might have gotten from elsewhere a
long time ago, shows the environments that are searched from a given
function, in this case chart.RelativePerformance in the
PerformanceAnalytics package. Try it on some of your functions in
and out of packages to help determine the sequence of environments R
searches along:

library( PerformanceAnalytics ) ## change as needed
x <- environment(chart.RelativePerformance) ## change as needed
str(x)
while (!identical(x, emptyenv())) {
    p <- parent.env(x)
    cat("---- child is above this line and parent is below ----\n")
    str(p)
    if (isBaseNamespace(p)) cat("Same as .BaseNamespaceEnv\n")
    if (identical(p, baseenv())) cat("Same as baseenv()\n")
    if (identical(p, emptyenv())) cat("Same as emptyenv()\n")
    if (identical(p, globalenv())) cat("Same as globalenv()\n")
    x <- p
}

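
A more compact variant of the same walk, for readers without PerformanceAnalytics installed, collects the environment names into a character vector. This is only a sketch: `env_chain` is a hypothetical helper, and stats::lm is used purely as an example of a function that lives in a package namespace.

```r
# Hypothetical helper: names of the environments searched from a function,
# starting at its enclosing environment and ending at the empty environment.
env_chain <- function(f) {
  x <- environment(f)
  out <- character(0)
  repeat {
    out <- c(out, environmentName(x))   # "" for environments without a name
    if (identical(x, emptyenv())) break
    x <- parent.env(x)
  }
  out
}

chain <- env_chain(stats::lm)   # example only; substitute your own function
print(chain)
```

For a packaged function the namespace, its imports and base should appear ahead of R_GlobalEnv, which matches the search order described earlier in the thread.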
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com