Ravi Varadhan
2010-Mar-05 23:22 UTC
[R] How to parse the arguments from a function call and evaluate them in a dataframe?
Hi,

I would like to write a function which has the following syntax:

    myfn <- function(formula, ftime, fstatus, data) {
      # step 1: obtain terms in `formula' from dataframe `data'
      # step 2: obtain ftime from `data'
      # step 3: obtain fstatus from `data'
      # step 4: do model estimation
      # step 5: return results
    }

The user would call this function as:

    myfn(formula=myform, ftime=myftime, fstatus=myfstatus, data=mydata)

Where `myform' is a formula object; and the terms in `myform', and the variables `myftime' and `myfstatus' should be obtained from the dataframe `mydata'.

I am getting tripped up in trying to figure out how to do the seemingly simple steps of 1, 2, and 3.

I looked at the code for `lm', `coxph', `nls' etc, but they are too complicated for my understanding. Is there a simple way to accomplish this?

Thanks very much,
Ravi.

Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine, Johns Hopkins University
Ph. (410) 502-2619
email: rvaradhan at jhmi.edu
Thomas Lumley
2010-Mar-06 00:09 UTC
[R] How to parse the arguments from a function call and evaluate them in a dataframe?
On Fri, 5 Mar 2010, Ravi Varadhan wrote:

> Hi,
>
> I would like to write a function which has the following syntax:
>
> myfn <- function(formula, ftime, fstatus, data) {
> # step 1: obtain terms in `formula' from dataframe `data'
> # step 2: obtain ftime from `data'
> # step 3: obtain fstatus from `data'
> # step 4: do model estimation
> # step 5: return results
> }
>
> The user would call this function as:
>
> myfn(formula=myform, ftime=myftime, fstatus=myfstatus, data=mydata)
>
> Where `myform' is a formula object; and the terms in `myform', and the variables `myftime' and `myfstatus' should be obtained from the dataframe `mydata'.
>
> I am getting tripped up in trying to figure out how to do the seemingly simple steps of 1, 2, and 3.

I don't think steps 2 and 3 are a good idea as written. I would strongly advocate that all variables to be looked up in the data frame should be supplied as formulas, so that it is clear they are not being evaluated according to the usual rules. Many existing functions work the way you suggest, but I still think it's unclear and makes it harder to use them in programs.

Having said that, you can use

    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "ftime", "fstatus"), names(mf), 0)
    mf <- mf[c(1, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1]] <- as.name("model.frame")
    mf <- eval.parent(mf)

to create a model frame that will contain the variables in the formula, and columns `(ftime)` and `(fstatus)` for the other arguments.

If you use formulas for ftime and fstatus you would have to call model.frame() multiple times, which is a bit more work. You would also need to use na.action=na.pass (the function itself, not a call to it) to let through any missing data, and then remove missing data after you have all three variables.

-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle
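
For anyone reading this thread later, here is a minimal sketch of both routes described above. It is only an illustration under stated assumptions, not code from either poster beyond the match.call() lines: the data frame contents and the name `myfn_formula` are made up, steps 4 and 5 (estimation and returning results) are left out, and the complete.cases() filtering is just one way to do the "remove missing data afterwards" step.

```r
# Made-up example data (hypothetical values); row 2 has a missing ftime.
mydata <- data.frame(
  y         = c(1.2, 2.3, 3.1, 4.8, 5.0),
  x         = c(0.5, 1.5, 2.5, 3.5, 4.5),
  myftime   = c(10, NA, 30, 40, 50),
  myfstatus = c(0, 1, 1, 0, 2)
)

# Route 1: ftime and fstatus given as bare names, looked up in `data`
# by rewriting the call into a call to model.frame().
myfn <- function(formula, ftime, fstatus, data) {
  mf <- match.call(expand.dots = FALSE)
  m <- match(c("formula", "data", "ftime", "fstatus"), names(mf), 0L)
  mf <- mf[c(1L, m)]                  # keep the function slot plus matched args
  mf$drop.unused.levels <- TRUE
  mf[[1L]] <- as.name("model.frame")  # turn the call into model.frame(...)
  mf <- eval.parent(mf)               # evaluate where myfn() was called
  # The extra arguments come back as columns named "(ftime)" and "(fstatus)".
  list(frame = mf, ftime = mf[["(ftime)"]], fstatus = mf[["(fstatus)"]])
}

# Route 2 (hypothetical name): ftime and fstatus given as one-sided formulas.
# model.frame() is called once per formula with na.action = na.pass, and
# incomplete rows are dropped only after all three frames exist.
myfn_formula <- function(formula, ftime, fstatus, data) {
  mf_main <- model.frame(formula, data = data, na.action = na.pass)
  mf_time <- model.frame(ftime,   data = data, na.action = na.pass)
  mf_stat <- model.frame(fstatus, data = data, na.action = na.pass)
  keep <- complete.cases(mf_main, mf_time, mf_stat)
  list(frame   = mf_main[keep, , drop = FALSE],
       ftime   = mf_time[[1L]][keep],
       fstatus = mf_stat[[1L]][keep])
}

myform <- y ~ x
res1 <- myfn(formula = myform, ftime = myftime, fstatus = myfstatus,
             data = mydata)
res2 <- myfn_formula(myform, ftime = ~ myftime, fstatus = ~ myfstatus,
                     data = mydata)
```

In route 1 the missing-data handling is whatever model.frame() picks up from options("na.action"), applied to the formula variables and the extra columns together. In route 2 the caller writes `~ myftime` rather than `myftime`, which makes it visible at the call site that the name is resolved inside the data frame.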