I am trying to build a function that can accept variables for a regression. It would work something like this:

---
# Y = my response variable (e.g. income)
# X = my key predictor variable (e.g. education)
# subY = a subsetting variable for Y (e.g. race)
# subY.val = the value of the subsetting variable that I want (e.g. "black")

foo <- function(Y, X, subY, subY.val, dataset){

  if(is.na(subY) == F) {
    Y <- paste(Y, "[", subY, "==", subY.val, "]")
  }
  FORMULA <- paste(Y ~ X)
  fit <- some.regression.tool(FORMULA, data=dataset)

  return(some.data.after.processing)
}
---

If I call this function with foo(income, education, race, "black", my.dataset), I do not get the result that I need because the FORMULA is "income[race==black] ~ education" when what I need is "income[race=="black"] ~ education". How do I get the quotes to stay on "black"? Or is there a better way?

Help appreciated.

--
Brant
On 02.12.2015 06:11, Brant Inman wrote:
> foo <- function(Y, X, subY, subY.val, dataset){
>
>   if(is.na(subY) == F) {

Not sure why all of this is needed, but you can insert here:

  if(is.character(subY.val) || is.factor(subY.val))
    subY.val <- shQuote(subY.val)

Best,
Uwe Ligges

>     Y <- paste(Y, "[", subY, "==", subY.val, "]")
>   }
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
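A minimal sketch of how the shQuote() suggestion above could fit into the formula-building step. The helper name build_formula and its NA defaults are hypothetical, not from the thread, and type = "sh" is passed explicitly here only so that the result uses single quotes on every platform (the default quoting style of shQuote() differs on Windows).

```r
# Hypothetical helper (not from the thread): builds the formula text only.
build_formula <- function(Y, X, subY = NA, subY.val = NA) {
  if (!is.na(subY)) {
    # Quote character or factor values so the quotes survive paste0()
    if (is.character(subY.val) || is.factor(subY.val))
      subY.val <- shQuote(as.character(subY.val), type = "sh")
    Y <- paste0(Y, "[", subY, " == ", subY.val, "]")
  }
  paste(Y, "~", X)
}

f_sub  <- build_formula("income", "education", "race", "black")
f_full <- build_formula("income", "education")
print(f_sub)   # "income[race == 'black'] ~ education"
print(f_full)  # "income ~ education"
```

Note that a formula which subsets only the response will usually make the response shorter than the predictor, and model-fitting functions such as lm() then stop with a "variable lengths differ" error. Subsetting the rows of the data instead avoids this.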
phgrosjean at sciviews.org
2015-Dec-02 12:10 UTC
[R] Passing variable names in quotes to a function
Your example and explanation are not complete, but I have the gut feeling that you could do all this both more efficiently *and* in a more R-ish way.

First of all, why would you pass Y and X separately, only to build the Y ~ X formula within the body of your function?

Secondly, it seems to me that subY and subY.val do something very similar to the subset argument in, say, lm().

Personally, I would write it like this:

foo <- function(formula, data, subset) {
  if (!missing(subset))
    data <- data[subset, ]
  fit <- some_regression_tool(formula, data = data)

  ## <more code>

  data_after_processing
}

with subset = subY == subY.val.

Best,

Philippe
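A runnable sketch of the function above, under a few assumptions that are not in the thread: lm() stands in for some_regression_tool(), the fitted model is returned directly instead of any post-processed data, and the data frame d is invented toy data. The subset is passed as a logical vector computed by the caller, so the string "black" never has to be pasted into a formula.

```r
# Toy data, invented for illustration only
d <- data.frame(
  income    = c(30, 42, 55, 61, 38, 47),
  education = c(10, 12, 16, 18, 11, 14),
  race      = c("black", "white", "black", "white", "black", "black")
)

# Same skeleton as above, with lm() as an assumed stand-in for the
# regression tool; rows are filtered before the model is fitted.
foo <- function(formula, data, subset) {
  if (!missing(subset))
    data <- data[subset, ]
  lm(formula, data = data)
}

fit_sub <- foo(income ~ education, d, subset = d$race == "black")
fit_all <- foo(income ~ education, d)
print(nobs(fit_sub))  # number of rows used in the subset fit
print(nobs(fit_all))  # number of rows used in the full fit
```

Because the whole data frame is filtered, the response and the predictor keep the same length, which is not the case when only Y is indexed inside the formula.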
Thank you for your response. Here is the problem that I find with your code (which I had tried). When you pass a value to the subset argument of the function, it does not keep the quotes around the subsetting variable's value. For example, if I want the function to do Y[Z=="skinny"], so that we use only those values of Y where Z is equal to skinny, I need to be able to retain the quotes around skinny. If you try passing "Z=="skinny"" to the function, it will remove the quotes and give you Z==skinny, which does not work in the subsetting code.
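One possible way around the problem described above, sketched under assumptions: the helper subset_rows and the toy data frame d are hypothetical, and this is not something proposed in the thread. The idea is that the condition is either captured as an unevaluated expression with substitute(), or, if it must arrive as text, written with the opposite kind of quote on the outside so the inner quotes are preserved and then parsed.

```r
# Toy data, invented for illustration only
d <- data.frame(
  Y = c(1.2, 3.4, 2.2, 5.1),
  Z = c("skinny", "heavy", "skinny", "heavy")
)

# Hypothetical helper: takes the condition as an unquoted expression
subset_rows <- function(data, cond) {
  e <- substitute(cond)                  # capture the expression unevaluated
  keep <- eval(e, data, parent.frame())  # evaluate it using the data's columns
  data[keep, , drop = FALSE]
}

res_expr <- subset_rows(d, Z == "skinny")

# If the condition has to be passed as text, single outer quotes keep the
# inner double quotes intact; parse() turns the text back into an expression.
cond_text <- 'Z == "skinny"'
keep <- eval(parse(text = cond_text), d)
res_text <- d$Y[keep]

print(res_expr)
print(res_text)
```

The expression form is generally the safer of the two, since evaluating parsed text will run whatever code the string contains.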