William Dunlap
2016-Dec-06 15:03 UTC
[R] Write a function that allows access to columns of a passed dataframe.
I basically agree with Rui - using substitute will cause trouble. E.g., how
would the user iterate over the columns, calling your function for each?
    for(column in names(dataFrame)) func(dataFrame, column)
would fail because dataFrame$column does not exist. You need to provide
an extra argument to handle this case, something like the following:
    func <- function(df,
                     columnAsName,
                     columnAsString = deparse(substitute(columnAsName))[1]) {
        ...
    }
The default value of columnAsString should also deal with the case that
the user supplied something like log(Conc.) instead of Conc.
I think that using a formula for the lazily evaluated argument (columnAsName)
works well. The user then knows exactly how it gets evaluated.
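The two-argument idea described above could be sketched roughly as follows. This is only an illustration: the function name col_mean and the use of mean() are assumptions made for the example, not part of the original suggestion.

```r
# Hypothetical sketch: 'col_mean' and the choice of mean() are illustrative only.
col_mean <- function(df, columnAsName,
                     columnAsString = deparse(substitute(columnAsName))[1]) {
    # columnAsName is never evaluated; it is only inspected via substitute()
    # when the default value of columnAsString is needed.
    stopifnot(is.data.frame(df), columnAsString %in% names(df))
    mean(df[[columnAsString]])
}

mydf <- data.frame(id = 1:5, age = c(20, 34, 43, 32, 21))

# Interactive use: an unquoted column name.
col_mean(mydf, age)

# Programmatic use: pass the name as a string through the second argument.
for (column in names(mydf)) {
    print(col_mean(mydf, columnAsString = column))
}
```

The unquoted form stays convenient at the prompt, while the loop never relies on substitute() seeing a meaningful symbol.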
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
> Over my almost 50 years programming, I have come to believe that if one
> wants a program to be useful, one should write the program to do as much
> work as possible and demand as little as possible from the user of the
> program. In my opinion, one should not ask the person who uses my function
> to remember to put the name of the data frame column in quotation marks.
> The function should be written so that all that needs to be passed is the
> name of the column; the function should take care of the quotation marks.
> John
>
> > John David Sorkin M.D., Ph.D.
> > Professor of Medicine
> > Chief, Biostatistics and Informatics
> > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
> > Baltimore VA Medical Center
> > 10 North Greene Street
> > GRECC (BT/18/GR)
> > Baltimore, MD 21201-1524
> > (Phone) 410-605-7119
> > (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
>
> > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >
> > Hello,
> >
> > Just to say that I wouldn't write the function as John did. I would get
> > rid of all the deparse/substitute stuff and instinctively use a quoted
> > argument as a column name. Something like the following.
> >
> > myfun <- function(frame, var){
> >     [...]
> >     col <- frame[, var]  # or frame[[var]]
> >     [...]
> > }
> >
> > myfun(mydf, "age")  # much better, simpler, no promises.
> >
> > Rui Barradas
> >
> > On 05-12-2016 21:49, Bert Gunter wrote:
> >> Typo: "lazy evaluation" not "lay evaluation."
> >>
> >> -- Bert
> >>
> >>
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>
> >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>> Sorry, hit "Send" by mistake.
> >>>
> >>> Inline.
> >>>
> >>>
> >>>
> >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>>> Inline.
> >>>>
> >>>> -- Bert
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >>>>> Hello,
> >>>>>
> >>>>> Inline.
> >>>>>
> >>>>> On 05-12-2016 17:09, David Winsemius wrote:
> >>>>>>
> >>>>>>
> >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
> >>>>>>>
> >>>>>>> Rui,
> >>>>>>> I appreciate your suggestion, but eliminating the deparse statement does
> >>>>>>> not solve my problem. Do you have any other suggestions? See code below.
> >>>>>>> Thank you,
> >>>>>>> John
> >>>>>>>
> >>>>>>>
> >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
> >>>>>>> mydf
> >>>>>>> class(mydf)
> >>>>>>>
> >>>>>>> myfun <- function(frame,var){
> >>>>>>>   call <- match.call()
> >>>>>>>   print(call)
> >>>>>>>
> >>>>>>>   indx <- match(c("frame","var"),names(call),nomatch=0)
> >>>>>>>   print(indx)
> >>>>>>>   if(indx[1]==0) stop("Function called without sufficient arguments!")
> >>>>>>>
> >>>>>>>   cat("I can get the name of the dataframe as a text string!\n")
> >>>>>>>   #xx <- deparse(substitute(frame))
> >>>>>>>   print(xx)
> >>>>>>>
> >>>>>>>   cat("I can get the name of the column as a text string!\n")
> >>>>>>>   #yy <- deparse(substitute(var))
> >>>>>>>   print(yy)
> >>>>>>>
> >>>>>>>   # This does not work.
> >>>>>>>   print(frame[,var])
> >>>>>>>
> >>>>>>>   # This does not work.
> >>>>>>>   print(frame[,"var"])
> >>>>>>>
> >>>>>>>   # This does not work.
> >>>>>>>   col <- xx[,"yy"]
> >>>>>>>
> >>>>>>>   # Nor does this work.
> >>>>>>>   col <- xx[,yy]
> >>>>>>>   print(col)
> >>>>>>> }
> >>>>>>>
> >>>>>>> myfun(mydf,age)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> When you use that calling syntax, the system will supply the values of
> >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
> >>>>>> object, you get an error at the time of the call to `myfun`.)
> >>>>>
> >>>>>
> >>>>> Actually, no, which was very surprising to me but John's code worked (not
> >>>>> the function, the call). And with the change I've proposed, it worked
> >>>>> flawlessly. No errors. Why I don't know.
> >>>
> >>> See ?substitute and in particular the example highlighted there.
> >>>
> >>> The technical details are explained in the R Language Definition
> >>> manual. The key here is the use of promises for lay evaluations. In
> >>> fact, the expression in the call *is* available within the functions,
> >>> as is (a pointer to) the environment in which to evaluate the
> >>> expression. That is how substitute() works. Specifically, quoting from
> >>> the manual,
> >>>
> >>> *****
> >>> It is possible to access the actual (not default) expressions used as
> >>> arguments inside the function. The mechanism is implemented via
> >>> promises. When a function is being evaluated the actual expression
> >>> used as an argument is stored in the promise together with a pointer
> >>> to the environment the function was called from. When (if) the
> >>> argument is evaluated the stored expression is evaluated in the
> >>> environment that the function was called from. Since only a pointer to
> >>> the environment is used any changes made to that environment will be
> >>> in effect during this evaluation. The resulting value is then also
> >>> stored in a separate spot in the promise. Subsequent evaluations
> >>> retrieve this stored value (a second evaluation is not carried out).
> >>> Access to the unevaluated expression is also available using
> >>> substitute.
> >>> ********
> >>>
> >>> -- Bert
> >>>
> >>>
> >>>
> >>>
> >>>>>
> >>>>> Rui Barradas
> >>>>>
> >>>>> You need either to call it as:
> >>>>>>
> >>>>>>
> >>>>>> myfun( mydf , "age")
> >>>>>>
> >>>>>>
> >>>>>> # Or:
> >>>>>>
> >>>>>> age <- "age"
> >>>>>> myfun( mydf, age)
> >>>>>>
> >>>>>> Unless your value of the `age`-named variable was "age" in the calling
> >>>>>> environment (and you did not give us that value in either of your postings),
> >>>>>> you would fail.
> >>>>>>
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
Rui Barradas
2016-Dec-06 15:33 UTC
[R] Write a function that allows access to columns of a passed dataframe.
Perhaps the best way is the one used by library(), where both
library(package) and library("package") work. It uses
as.character/substitute, not deparse/substitute, as follows.

mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
mydf
class(mydf)
str(mydf)

myfun <- function(frame,var){
    yy <- as.character(substitute(var))
    frame[, yy]
}

myfun(mydf, age)
myfun(mydf, "age")
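A short check of where this pattern works and where it does not may be useful. The sketch below reuses myfun as posted; the variable name column and the tryCatch wrapper are assumptions added for illustration.

```r
mydf <- data.frame(id = 1:5, age = c(20, 34, 43, 32, 21))

myfun <- function(frame, var) {
    yy <- as.character(substitute(var))
    frame[, yy]
}

myfun(mydf, age)      # works: the symbol age becomes the string "age"
myfun(mydf, "age")    # works: a string is passed through unchanged

# A column name held in a variable is not looked up. The function sees the
# symbol 'column' itself and searches for a column literally named "column".
column <- "age"
result <- tryCatch(myfun(mydf, column),
                   error = function(e) conditionMessage(e))
print(result)         # an error message rather than the age values

# A call is split into its pieces, so an expression does not give one name.
as.character(quote(log(age)))
```

This is the iteration problem raised earlier in the thread: the unquoted interface is convenient at the prompt but cannot take its column name from a loop variable.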
Rui Barradas
William Dunlap
2016-Dec-06 15:57 UTC
[R] Write a function that allows access to columns of a passed dataframe.
Note that library has another argument, character.only=TRUE/FALSE, to
control whether the main argument should be regarded as a variable or a
literal. I think you need two arguments to handle this.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
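One way the character.only idea might be carried over to the column-access function is sketched below. The function name get_col and its checks are hypothetical; only the character.only argument name is borrowed from library().

```r
# Hypothetical sketch modelled on the character.only argument of library().
get_col <- function(frame, var, character.only = FALSE) {
    if (!character.only) {
        # Treat 'var' as a literal name: age and "age" both give "age".
        var <- as.character(substitute(var))
    }
    # Otherwise 'var' is evaluated normally and must yield a column name.
    stopifnot(is.character(var), length(var) == 1L, var %in% names(frame))
    frame[[var]]
}

mydf <- data.frame(id = 1:5, age = c(20, 34, 43, 32, 21))

get_col(mydf, age)                             # literal, unquoted
get_col(mydf, "age")                           # literal, quoted

column <- "age"
get_col(mydf, column, character.only = TRUE)   # the value of 'column' is used
```

The flag makes the caller state explicitly whether the argument is a name to be taken literally or a variable holding the name, which is the ambiguity the single-argument version cannot resolve.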
David Winsemius
2016-Dec-06 18:41 UTC
[R] Write a function that allows access to columns of a passed dataframe.
> On Dec 6, 2016, at 7:33 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote: > > Perhaps the best way is the one used by library(), where both library(package) and library("package") work. It uses as.charecter/substitute, not deparse/substitute, as follows. > > mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21)) > mydf > class(mydf) > str(mydf) > > myfun <- function(frame,var){ > yy <- as.character(substitute(var)) > frame[, yy] > } > > myfun(mydf, age) > myfun(mydf, "age") > > Rui Barradas > > Em 06-12-2016 15:03, William Dunlap escreveu: >> I basically agree with Rui - using substitute will cause trouble. E.g., how >> would the user iterate over the columns, calling your function for each? >> for(column in dataFrame) func(column) >> would fail because dataFrame$column does not exist. You need to provide >> an extra argument to handle this case. something like the following: >> func <- function(df, >> columnAsName,, >> columnAsString = deparse(substitute(columnAsName))[1]) >> ... >> } >> The default value of columnAsString should also deal with the case that >> the user supplied something like log(Conc.) instead of Conc. >> >> I think that using a formula for the lazily evaluated argument >> (columnAsName) >> works well. 
The user then knows exactly how it gets evaluated.This would be an implementation that would support a multi-column extraction using a formula object: mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21)) mydf class(mydf) str(mydf) myfun <- function(frame, vars){ yy <- terms(vars) frame[, attr(yy, "term.labels")] } myfun(mydf, ~age+sex)>> >> Bill Dunlap >> TIBCO Software >> wdunlap tibco.com <http://tibco.com> >> >> On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu >> <mailto:jsorkin at grecc.umaryland.edu>> wrote: >> >> Over my almost 50 years programming, I have come to believe that if >> one wants a program to be useful, one should write the program to do >> as much work as possible and demand as little as possible from the >> user of the program. In my opinion, one should not ask the person >> who uses my function to remember to put the name of the data frame >> column in quotation marks. The function should be written so that >> all that needs to be passed is the name of the column; the function >> should take care of the quotation marks. >> Jihny >> >> > John David Sorkin M.D., Ph.D. >> > Professor of Medicine >> > Chief, Biostatistics and Informatics >> > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine >> > Baltimore VA Medical Center >> > 10 North Greene Street >> > GRECC (BT/18/GR) >> > Baltimore, MD 21201-1524 >> > (Phone)410-605-7119 <tel:410-605-7119> >> > (Fax)410-605-7913 <tel:410-605-7913> (Please call phone number above >> prior to faxing) >> >> >> > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt >> <mailto:ruipbarradas at sapo.pt>> wrote: >> > >> > Hello, >> > >> > Just to say that I wouldn't write the function as John did. I >> would get >> > rid of all the deparse/substitute stuff and instinctively use a >> quoted >> > argument as a column name. Something like the following. >> > >> > myfun <- function(frame, var){ >> > [...] 
>> > col <- frame[, var] # or frame[[var]] >> > [...] >> > } >> > >> > myfun(mydf, "age") # much better, simpler, no promises. >> > >> > Rui Barradas >> > >> > Em 05-12-2016 21:49, Bert Gunter escreveu: >> >> Typo: "lazy evaluation" not "lay evaluation." >> >> >> >> -- Bert >> >> >> >> >> >> >> >> Bert Gunter >> >> >> >> "The trouble with having an open mind is that people keep coming >> along >> >> and sticking things into it." >> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> >> >> >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter >> <bgunter.4567 at gmail.com <mailto:bgunter.4567 at gmail.com>> wrote: >> >>> Sorry, hit "Send" by mistake. >> >>> >> >>> Inline. >> >>> >> >>> >> >>> >> >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter >> <bgunter.4567 at gmail.com <mailto:bgunter.4567 at gmail.com>> wrote: >> >>>> Inline. >> >>>> >> >>>> -- Bert >> >>>> >> >>>> >> >>>> Bert Gunter >> >>>> >> >>>> "The trouble with having an open mind is that people keep >> coming along >> >>>> and sticking things into it." >> >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >>>> >> >>>> >> >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas >> <ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>> wrote: >> >>>>> Hello, >> >>>>> >> >>>>> Inline. >> >>>>> >> >>>>> Em 05-12-2016 17:09, David Winsemius escreveu: >> >>>>>> >> >>>>>> >> >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin >> <jsorkin at grecc.umaryland.edu <mailto:jsorkin at grecc.umaryland.edu>> >> >>>>>>> wrote: >> >>>>>>> >> >>>>>>> Rui, >> >>>>>>> I appreciate your suggestion, but eliminating the deparse >> statement does >> >>>>>>> not solve my problem. Do you have any other suggestions? >> See code below. 
>> >>>>>>> Thank you,
>> >>>>>>> John
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>> >>>>>>> mydf
>> >>>>>>> class(mydf)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> myfun <- function(frame,var){
>> >>>>>>>   call <- match.call()
>> >>>>>>>   print(call)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>   indx <- match(c("frame","var"),names(call),nomatch=0)
>> >>>>>>>   print(indx)
>> >>>>>>>   if(indx[1]==0) stop("Function called without sufficient arguments!")
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>   cat("I can get the name of the dataframe as a text string!\n")
>> >>>>>>>   #xx <- deparse(substitute(frame))
>> >>>>>>>   print(xx)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>   cat("I can get the name of the column as a text string!\n")
>> >>>>>>>   #yy <- deparse(substitute(var))
>> >>>>>>>   print(yy)
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>   # This does not work.
>> >>>>>>>   print(frame[,var])
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>   # This does not work.
>> >>>>>>>   print(frame[,"var"])
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>   # This does not work.
>> >>>>>>>   col <- xx[,"yy"]
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>   # Nor does this work.
>> >>>>>>>   col <- xx[,yy]
>> >>>>>>>   print(col)
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> myfun(mydf,age)
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> When you use that calling syntax, the system will supply the values of
>> >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
>> >>>>>> object, you get an error at the time of the call to `myfun`.)
>> >>>>>
>> >>>>>
>> >>>>> Actually, no, which was very surprising to me but John's code worked (not
>> >>>>> the function, the call). And with the change I've proposed, it worked
>> >>>>> flawlessly. No errors. Why I don't know.
>> >>>
>> >>> See ?substitute and in particular the example highlighted there.
>> >>>
>> >>> The technical details are explained in the R Language Definition manual.
>> >>> The key here is the use of promises for lay evaluations. In
>> >>> fact, the expression in the call *is* available within the functions,
>> >>> as is (a pointer to) the environment in which to evaluate the
>> >>> expression. That is how substitute() works. Specifically, quoting from
>> >>> the manual,
>> >>>
>> >>> *****
>> >>> It is possible to access the actual (not default) expressions used as
>> >>> arguments inside the function. The mechanism is implemented via
>> >>> promises. When a function is being evaluated the actual expression
>> >>> used as an argument is stored in the promise together with a pointer
>> >>> to the environment the function was called from. When (if) the
>> >>> argument is evaluated the stored expression is evaluated in the
>> >>> environment that the function was called from. Since only a pointer to
>> >>> the environment is used any changes made to that environment will be
>> >>> in effect during this evaluation. The resulting value is then also
>> >>> stored in a separate spot in the promise. Subsequent evaluations
>> >>> retrieve this stored value (a second evaluation is not carried out).
>> >>> Access to the unevaluated expression is also available using
>> >>> substitute.
>> >>> ********
>> >>>
>> >>> -- Bert
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>>>
>> >>>>> Rui Barradas
>> >>>>>
>> >>>>>> You need either to call it as:
>> >>>>>>
>> >>>>>>
>> >>>>>> myfun( mydf , "age")
>> >>>>>>
>> >>>>>>
>> >>>>>> # Or:
>> >>>>>>
>> >>>>>> age <- "age"
>> >>>>>> myfun( mydf, age)
>> >>>>>>
>> >>>>>> Unless your value of the `age`-named variable was "age" in the calling
>> >>>>>> environment (and you did not give us that value in either of your postings),
>> >>>>>> you would fail.
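
To make the promise mechanism quoted above concrete, here is a minimal sketch of the two behaviours discussed in the thread. The names get_col and force_col are hypothetical helpers made up for this illustration; they are not from any of the posted code. The sketch assumes that no object called `age` exists in the calling environment.

```r
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# Hypothetical helper: read the argument expression without evaluating it.
get_col <- function(frame, var) {
  # substitute() returns the unevaluated expression stored in the promise;
  # deparse() turns that expression into a character string such as "age".
  name <- deparse(substitute(var))
  if (!name %in% names(frame)) {
    stop("no column named '", name, "' in the data frame")
  }
  frame[[name]]
}

# The promise for 'var' is never forced, so this works even though
# there is no object called 'age' in the calling environment.
get_col(mydf, age)

# Hypothetical helper that does force the promise: 'var' is evaluated in the
# calling environment, so an unquoted column name fails at that point.
force_col <- function(frame, var) frame[[var]]
res <- try(force_col(mydf, age), silent = TRUE)
inherits(res, "try-error")  # TRUE as long as no 'age' object exists
```

This is also why the bare call myfun(mydf, age) does not fail on its own: the error only appears once something inside the function evaluates `var`.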
>> >>>>>>
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> Confidentiality Statement:
>> This email message, including any attachments, is for the sole use
>> of the intended recipient(s) and may contain confidential and
>> privileged information. Any unauthorized use, disclosure or
>> distribution is prohibited. If you are not the intended recipient,
>> please contact the sender by reply email and destroy all copies of
>> the original message.

David Winsemius
Alameda, CA, USA
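
As a follow-up to the formula-based myfun near the top of this message: indexing by term labels only works when every term is a plain column name, so a formula such as ~log(age) would fail there. The sketch below is one possible variant, assuming that evaluating the terms with model.frame() is acceptable. The name myfun2 is hypothetical and not part of the thread.

```r
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# Hypothetical variant: let model.frame() evaluate each term of a
# one-sided formula inside the data frame.
myfun2 <- function(frame, vars) {
  stopifnot(inherits(vars, "formula"))
  model.frame(vars, data = frame)
}

out <- myfun2(mydf, ~ age + sex)     # plain column names
logged <- myfun2(mydf, ~ log(age))   # an expression instead of a bare name

# Iterating over columns is still possible by building each formula
# from a character string with reformulate().
for (nm in c("age", "sex")) {
  print(head(myfun2(mydf, reformulate(nm)), 2))
}
```

The trade-off is that model.frame() applies its default NA handling and attaches a "terms" attribute to the result, so this is not a drop-in replacement where the untouched columns are needed.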