William Dunlap
2016-Dec-06 15:03 UTC
[R] Write a function that allows access to columns of a passed dataframe.
I basically agree with Rui - using substitute will cause trouble. E.g., how
would the user iterate over the columns, calling your function for each?
    for(column in names(dataFrame)) func(dataFrame, column)
would fail because dataFrame$column does not exist. You need to provide
an extra argument to handle this case, something like the following:
    func <- function(df,
                     columnAsName,
                     columnAsString = deparse(substitute(columnAsName))[1]) {
        ...
    }
The default value of columnAsString should also deal with the case that
the user supplied something like log(Conc.) instead of Conc.

I think that using a formula for the lazily evaluated argument
(columnAsName) works well. The user then knows exactly how it gets evaluated.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:

> Over my almost 50 years programming, I have come to believe that if one
> wants a program to be useful, one should write the program to do as much
> work as possible and demand as little as possible from the user of the
> program. In my opinion, one should not ask the person who uses my function
> to remember to put the name of the data frame column in quotation marks.
> The function should be written so that all that needs to be passed is the
> name of the column; the function should take care of the quotation marks.
> Jihny
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>
> > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >
> > Hello,
> >
> > Just to say that I wouldn't write the function as John did. I would get
> > rid of all the deparse/substitute stuff and instinctively use a quoted
> > argument as a column name. Something like the following.
> >
> > myfun <- function(frame, var){
> >   [...]
> >   col <- frame[, var]  # or frame[[var]]
> >   [...]
> > }
> >
> > myfun(mydf, "age")  # much better, simpler, no promises.
> >
> > Rui Barradas
> >
> > On 05-12-2016 21:49, Bert Gunter wrote:
> >> Typo: "lazy evaluation" not "lay evaluation."
> >>
> >> -- Bert
> >>
> >> Bert Gunter
> >>
> >> "The trouble with having an open mind is that people keep coming along
> >> and sticking things into it."
> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>
> >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>> Sorry, hit "Send" by mistake.
> >>>
> >>> Inline.
> >>>
> >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> >>>> Inline.
> >>>>
> >>>> -- Bert
> >>>>
> >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> >>>>> Hello,
> >>>>>
> >>>>> Inline.
> >>>>>
> >>>>> On 05-12-2016 17:09, David Winsemius wrote:
> >>>>>>
> >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Rui,
> >>>>>>> I appreciate your suggestion, but eliminating the deparse statement does
> >>>>>>> not solve my problem. Do you have any other suggestions? See code below.
> >>>>>>> Thank you,
> >>>>>>> John
> >>>>>>>
> >>>>>>> mydf <-
> >>>>>>> data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
> >>>>>>> mydf
> >>>>>>> class(mydf)
> >>>>>>>
> >>>>>>> myfun <- function(frame,var){
> >>>>>>> call <- match.call()
> >>>>>>> print(call)
> >>>>>>>
> >>>>>>> indx <- match(c("frame","var"),names(call),nomatch=0)
> >>>>>>> print(indx)
> >>>>>>> if(indx[1]==0) stop("Function called without sufficient arguments!")
> >>>>>>>
> >>>>>>> cat("I can get the name of the dataframe as a text string!\n")
> >>>>>>> #xx <- deparse(substitute(frame))
> >>>>>>> print(xx)
> >>>>>>>
> >>>>>>> cat("I can get the name of the column as a text string!\n")
> >>>>>>> #yy <- deparse(substitute(var))
> >>>>>>> print(yy)
> >>>>>>>
> >>>>>>> # This does not work.
> >>>>>>> print(frame[,var])
> >>>>>>>
> >>>>>>> # This does not work.
> >>>>>>> print(frame[,"var"])
> >>>>>>>
> >>>>>>> # This does not work.
> >>>>>>> col <- xx[,"yy"]
> >>>>>>>
> >>>>>>> # Nor does this work.
> >>>>>>> col <- xx[,yy]
> >>>>>>> print(col)
> >>>>>>> }
> >>>>>>>
> >>>>>>> myfun(mydf,age)
> >>>>>>
> >>>>>> When you use that calling syntax, the system will supply the values of
> >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
> >>>>>> object, you get an error at the time of the call to `myfun`.)
> >>>>>
> >>>>> Actually, no, which was very surprising to me but John's code worked (not
> >>>>> the function, the call). And with the change I've proposed, it worked
> >>>>> flawlessly. No errors. Why I don't know.
> >>>
> >>> See ?substitute and in particular the example highlighted there.
> >>>
> >>> The technical details are explained in the R Language Definition
> >>> manual. The key here is the use of promises for lay evaluations. In
> >>> fact, the expression in the call *is* available within the functions,
> >>> as is (a pointer to) the environment in which to evaluate the
> >>> expression. That is how substitute() works. Specifically, quoting from
> >>> the manual,
> >>>
> >>> *****
> >>> It is possible to access the actual (not default) expressions used as
> >>> arguments inside the function. The mechanism is implemented via
> >>> promises. When a function is being evaluated the actual expression
> >>> used as an argument is stored in the promise together with a pointer
> >>> to the environment the function was called from. When (if) the
> >>> argument is evaluated the stored expression is evaluated in the
> >>> environment that the function was called from. Since only a pointer to
> >>> the environment is used any changes made to that environment will be
> >>> in effect during this evaluation. The resulting value is then also
> >>> stored in a separate spot in the promise. Subsequent evaluations
> >>> retrieve this stored value (a second evaluation is not carried out).
> >>> Access to the unevaluated expression is also available using
> >>> substitute.
> >>> ********
> >>>
> >>> -- Bert
> >>>
> >>>>>
> >>>>> Rui Barradas
> >>>>>
> >>>>>> You need either to call it as:
> >>>>>>
> >>>>>> myfun( mydf , "age")
> >>>>>>
> >>>>>> # Or:
> >>>>>>
> >>>>>> age <- "age"
> >>>>>> myfun( mydf, age)
> >>>>>>
> >>>>>> Unless your value of the `age`-named variable was "age" in the calling
> >>>>>> environment (and you did not give us that value in either of your postings),
> >>>>>> you would fail.
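
The two-argument design in Bill Dunlap's message at the top of this post can be sketched roughly as follows. This is only an illustration under assumptions: the function name colMean and the choice of taking a mean are hypothetical, and mydf is the sample data frame from earlier in the thread. The point it shows is that the string argument gives callers a way to loop over columns, which the unquoted form alone does not.

```r
mydf <- data.frame(id = 1:5,
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# colMean is a hypothetical name used only for illustration.
# columnAsName is never evaluated; it is only captured by substitute().
# columnAsString defaults to the deparsed expression, but a caller can
# supply it directly, which is what makes iteration possible.
colMean <- function(df, columnAsName,
                    columnAsString = deparse(substitute(columnAsName))[1]) {
  if (!columnAsString %in% names(df))
    stop("no column named '", columnAsString, "'")
  mean(df[[columnAsString]])
}

colMean(mydf, age)                      # unquoted column name
colMean(mydf, columnAsString = "age")   # explicit string

# Iterating: pass the string held by the loop variable. Writing
# colMean(mydf, nm) instead would look for a column called "nm".
numericCols <- names(mydf)[sapply(mydf, is.numeric)]
means <- sapply(numericCols, function(nm) colMean(mydf, columnAsString = nm))
```

Calling colMean(mydf, column) with a loop variable named column stops with an error, because the deparsed name "column" is not a column of mydf.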
Rui Barradas
2016-Dec-06 15:33 UTC
[R] Write a function that allows access to columns of a passed dataframe.
Perhaps the best way is the one used by library(), where both
library(package) and library("package") work. It uses
as.character/substitute, not deparse/substitute, as follows.

mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
mydf
class(mydf)
str(mydf)

myfun <- function(frame,var){
  yy <- as.character(substitute(var))
  frame[, yy]
}

myfun(mydf, age)
myfun(mydf, "age")

Rui Barradas
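
To see what the as.character/substitute idiom in Rui Barradas's message actually hands back, a small probe function may help. The helper name show_arg is hypothetical and exists only for this illustration. It suggests why the idiom accepts both age and "age", and also where it stops being convenient: a variable that holds the column name is not looked up.

```r
# show_arg is a hypothetical helper, used only to display what the
# as.character(substitute()) idiom produces for different inputs.
show_arg <- function(var) as.character(substitute(var))

show_arg(age)        # a symbol becomes the string "age"
show_arg("age")      # a string literal stays "age"
show_arg(log(age))   # a call becomes one string per element: "log" "age"

colname <- "age"
show_arg(colname)    # the variable's own name, "colname", not its value
```

The last call is the iteration problem raised earlier in the thread: inside a loop the function would receive the name of the loop variable rather than the column name stored in it.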
William Dunlap
2016-Dec-06 15:57 UTC
[R] Write a function that allows access to columns of a passed dataframe.
Note that library has another argument, character.only=TRUE/FALSE, to
control whether the main argument should be regarded as a variable or a
literal. I think you need two arguments to handle this.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
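
A rough sketch of the character.only approach that Bill Dunlap mentions, applied to the column-access function from this thread. The function name get_col is hypothetical, and the argument name character.only is borrowed from library() only to make the parallel visible; this is not taken from any poster's code.

```r
mydf <- data.frame(id = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# get_col is a hypothetical function for illustration.
# With character.only = FALSE (the default) var is taken literally,
# so get_col(mydf, age) means the column named "age".
# With character.only = TRUE var is evaluated normally and must
# yield a character string naming the column.
get_col <- function(frame, var, character.only = FALSE) {
  if (!character.only) var <- as.character(substitute(var))
  frame[[var]]
}

get_col(mydf, age)                              # literal column name

colname <- "sex"
get_col(mydf, colname, character.only = TRUE)   # variable holding the name

# The second form is what makes a loop over columns work
for (nm in names(mydf)) print(get_col(mydf, nm, character.only = TRUE))
```

The cost of this design is that the caller has to remember the flag whenever the column name comes from a variable.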
David Winsemius
2016-Dec-06 18:41 UTC
[R] Write a function that allows access to columns of a passed dataframe.
> On Dec 6, 2016, at 7:33 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote: > > Perhaps the best way is the one used by library(), where both library(package) and library("package") work. It uses as.charecter/substitute, not deparse/substitute, as follows. > > mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21)) > mydf > class(mydf) > str(mydf) > > myfun <- function(frame,var){ > yy <- as.character(substitute(var)) > frame[, yy] > } > > myfun(mydf, age) > myfun(mydf, "age") > > Rui Barradas > > Em 06-12-2016 15:03, William Dunlap escreveu: >> I basically agree with Rui - using substitute will cause trouble. E.g., how >> would the user iterate over the columns, calling your function for each? >> for(column in dataFrame) func(column) >> would fail because dataFrame$column does not exist. You need to provide >> an extra argument to handle this case. something like the following: >> func <- function(df, >> columnAsName,, >> columnAsString = deparse(substitute(columnAsName))[1]) >> ... >> } >> The default value of columnAsString should also deal with the case that >> the user supplied something like log(Conc.) instead of Conc. >> >> I think that using a formula for the lazily evaluated argument >> (columnAsName) >> works well. 
The user then knows exactly how it gets evaluated.This would be an implementation that would support a multi-column extraction using a formula object: mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21)) mydf class(mydf) str(mydf) myfun <- function(frame, vars){ yy <- terms(vars) frame[, attr(yy, "term.labels")] } myfun(mydf, ~age+sex)>> >> Bill Dunlap >> TIBCO Software >> wdunlap tibco.com <http://tibco.com> >> >> On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu >> <mailto:jsorkin at grecc.umaryland.edu>> wrote: >> >> Over my almost 50 years programming, I have come to believe that if >> one wants a program to be useful, one should write the program to do >> as much work as possible and demand as little as possible from the >> user of the program. In my opinion, one should not ask the person >> who uses my function to remember to put the name of the data frame >> column in quotation marks. The function should be written so that >> all that needs to be passed is the name of the column; the function >> should take care of the quotation marks. >> Jihny >> >> > John David Sorkin M.D., Ph.D. >> > Professor of Medicine >> > Chief, Biostatistics and Informatics >> > University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine >> > Baltimore VA Medical Center >> > 10 North Greene Street >> > GRECC (BT/18/GR) >> > Baltimore, MD 21201-1524 >> > (Phone)410-605-7119 <tel:410-605-7119> >> > (Fax)410-605-7913 <tel:410-605-7913> (Please call phone number above >> prior to faxing) >> >> >> > On Dec 6, 2016, at 3:17 AM, Rui Barradas <ruipbarradas at sapo.pt >> <mailto:ruipbarradas at sapo.pt>> wrote: >> > >> > Hello, >> > >> > Just to say that I wouldn't write the function as John did. I >> would get >> > rid of all the deparse/substitute stuff and instinctively use a >> quoted >> > argument as a column name. Something like the following. >> > >> > myfun <- function(frame, var){ >> > [...] 
>> > col <- frame[, var] # or frame[[var]] >> > [...] >> > } >> > >> > myfun(mydf, "age") # much better, simpler, no promises. >> > >> > Rui Barradas >> > >> > Em 05-12-2016 21:49, Bert Gunter escreveu: >> >> Typo: "lazy evaluation" not "lay evaluation." >> >> >> >> -- Bert >> >> >> >> >> >> >> >> Bert Gunter >> >> >> >> "The trouble with having an open mind is that people keep coming >> along >> >> and sticking things into it." >> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> >> >> >>> On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter >> <bgunter.4567 at gmail.com <mailto:bgunter.4567 at gmail.com>> wrote: >> >>> Sorry, hit "Send" by mistake. >> >>> >> >>> Inline. >> >>> >> >>> >> >>> >> >>>> On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter >> <bgunter.4567 at gmail.com <mailto:bgunter.4567 at gmail.com>> wrote: >> >>>> Inline. >> >>>> >> >>>> -- Bert >> >>>> >> >>>> >> >>>> Bert Gunter >> >>>> >> >>>> "The trouble with having an open mind is that people keep >> coming along >> >>>> and sticking things into it." >> >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >>>> >> >>>> >> >>>>> On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas >> <ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>> wrote: >> >>>>> Hello, >> >>>>> >> >>>>> Inline. >> >>>>> >> >>>>> Em 05-12-2016 17:09, David Winsemius escreveu: >> >>>>>> >> >>>>>> >> >>>>>>> On Dec 5, 2016, at 7:29 AM, John Sorkin >> <jsorkin at grecc.umaryland.edu <mailto:jsorkin at grecc.umaryland.edu>> >> >>>>>>> wrote: >> >>>>>>> >> >>>>>>> Rui, >> >>>>>>> I appreciate your suggestion, but eliminating the deparse >> statement does >> >>>>>>> not solve my problem. Do you have any other suggestions? >> See code below. 
>> >>>>>>> Thank you,
>> >>>>>>> John
>> >>>>>>>
>> >>>>>>> mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
>> >>>>>>> mydf
>> >>>>>>> class(mydf)
>> >>>>>>>
>> >>>>>>> myfun <- function(frame,var){
>> >>>>>>>   call <- match.call()
>> >>>>>>>   print(call)
>> >>>>>>>
>> >>>>>>>   indx <- match(c("frame","var"),names(call),nomatch=0)
>> >>>>>>>   print(indx)
>> >>>>>>>   if(indx[1]==0) stop("Function called without sufficient arguments!")
>> >>>>>>>
>> >>>>>>>   cat("I can get the name of the dataframe as a text string!\n")
>> >>>>>>>   #xx <- deparse(substitute(frame))
>> >>>>>>>   print(xx)
>> >>>>>>>
>> >>>>>>>   cat("I can get the name of the column as a text string!\n")
>> >>>>>>>   #yy <- deparse(substitute(var))
>> >>>>>>>   print(yy)
>> >>>>>>>
>> >>>>>>>   # This does not work.
>> >>>>>>>   print(frame[,var])
>> >>>>>>>
>> >>>>>>>   # This does not work.
>> >>>>>>>   print(frame[,"var"])
>> >>>>>>>
>> >>>>>>>   # This does not work.
>> >>>>>>>   col <- xx[,"yy"]
>> >>>>>>>
>> >>>>>>>   # Nor does this work.
>> >>>>>>>   col <- xx[,yy]
>> >>>>>>>   print(col)
>> >>>>>>> }
>> >>>>>>>
>> >>>>>>> myfun(mydf,age)
>> >>>>>>
>> >>>>>> When you use that calling syntax, the system will supply the values of
>> >>>>>> whatever the `age` variable contains. (And if there is no `age`-named
>> >>>>>> object, you get an error at the time of the call to `myfun`.)
>> >>>>>
>> >>>>> Actually, no, which was very surprising to me but John's code worked (not
>> >>>>> the function, the call). And with the change I've proposed, it worked
>> >>>>> flawlessly. No errors. Why I don't know.
>> >>>
>> >>> See ?substitute and in particular the example highlighted there.
>> >>>
>> >>> The technical details are explained in the R Language Definition
>> >>> manual.
>> >>> The key here is the use of promises for lay evaluations. In
>> >>> fact, the expression in the call *is* available within the functions,
>> >>> as is (a pointer to) the environment in which to evaluate the
>> >>> expression. That is how substitute() works. Specifically, quoting from
>> >>> the manual,
>> >>>
>> >>> *****
>> >>> It is possible to access the actual (not default) expressions used as
>> >>> arguments inside the function. The mechanism is implemented via
>> >>> promises. When a function is being evaluated the actual expression
>> >>> used as an argument is stored in the promise together with a pointer
>> >>> to the environment the function was called from. When (if) the
>> >>> argument is evaluated the stored expression is evaluated in the
>> >>> environment that the function was called from. Since only a pointer to
>> >>> the environment is used any changes made to that environment will be
>> >>> in effect during this evaluation. The resulting value is then also
>> >>> stored in a separate spot in the promise. Subsequent evaluations
>> >>> retrieve this stored value (a second evaluation is not carried out).
>> >>> Access to the unevaluated expression is also available using
>> >>> substitute.
>> >>> ********
>> >>>
>> >>> -- Bert
>> >>>
>> >>>>>
>> >>>>> Rui Barradas
>> >>>>>
>> >>>>> You need either to call it as:
>> >>>>>>
>> >>>>>> myfun( mydf , "age")
>> >>>>>>
>> >>>>>> # Or:
>> >>>>>>
>> >>>>>> age <- "age"
>> >>>>>> myfun( mydf, age)
>> >>>>>>
>> >>>>>> Unless your value of the `age`-named variable was "age" in the calling
>> >>>>>> environment (and you did not give us that value in either of your postings),
>> >>>>>> you would fail.
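
The promise mechanism quoted above can be seen in a small sketch. The helper names bare_col and quoted_col are made up for illustration and do not appear in this thread; mydf is the example data frame from the original post. The sketch only shows why the bare-name style works for a direct call and why it stops working inside a loop.

```r
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# Hypothetical helper, for illustration only: takes an unquoted column name.
bare_col <- function(frame, var) {
  expr <- substitute(var)   # expression stored in the promise; `var` is never forced
  frame[[deparse(expr)]]    # use the text of that expression as the column name
}

# Quoted-argument style, as Rui suggested: `var` is evaluated in the usual way.
quoted_col <- function(frame, var) {
  frame[[var]]
}

# Works even if no object called `age` exists, because the promise is not evaluated.
bare_col(mydf, age)

# In a loop, the bare-name version looks for a column literally named "nm",
# while the quoted version sees the current value of nm.
for (nm in c("sex", "age")) {
  print(is.null(bare_col(mydf, nm)))                    # TRUE: no column "nm"
  print(identical(quoted_col(mydf, nm), mydf[[nm]]))    # TRUE
}
```

This is the situation Bill describes: a function that relies only on substitute() needs some extra argument before it can be called programmatically.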
>> >>>>>
>> >>>>> ______________________________________________
>> >>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>>>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
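
As a possible extension of the formula-based myfun at the top of this message, here is a minimal sketch that also accepts transformed terms such as log(age), the case Bill raised with log(Conc.). It swaps the term.labels lookup for model.frame(), which evaluates each term of the formula inside the data frame. The name myfun2 is hypothetical, and this is only one way to do it.

```r
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

# Hypothetical variant of the formula-based myfun (name is illustrative only).
myfun2 <- function(frame, vars) {
  stopifnot(inherits(vars, "formula"))
  # model.frame() evaluates every term of the one-sided formula within `frame`
  # and returns a data frame with one column per term.
  model.frame(vars, data = frame)
}

myfun2(mydf, ~ age + sex)        # two plain columns
myfun2(mydf, ~ log(age) + sex)   # a transformed column plus a plain one
```

One caveat with this choice: model.frame() applies the session's na.action setting, so rows with missing values may be dropped unless a different na.action is passed.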