Can someone send me something I can read about passing parameters, so I can understand how lm manages to have a data frame passed to it and use columns from the data frame to set up a regression? I have looked at the code for lm and don't understand what I am reading. What I want to do is something like the following:

myfunction <- function(y,x,dataframe){
  fit0 <- lm(y~x,data=dataframe)
  print (summary(fit0))
}

# Run the function using dep and ind as dependent and independent variables.
mydata <- data.frame(dep=c(1,2,3,4,5),ind=c(1,2,4,5,7))
myfunction(dep,ind)
# Run the function using outcome and predictor as dependent and independent variables.
newdata <- data.frame(outcome=c(1,2,3,4,5),predictor=c(1,2,4,5,7))
myfunction(outcome,predictor)

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
Hi,

I'm not sure if this is what you are after, but instead of defining arguments for elements of the formula, why not simply pass your desired formula to your function?

Cheers,
Ben

myfunction <- function(frmla,dataframe){
  fit0 <- lm(frmla,data=dataframe)
  print (summary(fit0))
}

# Run the function with ind as the dependent and dep as the independent variable.
mydata <- data.frame(dep=c(1,2,3,4,5),ind=c(1,2,4,5,7))
myfunction(ind ~ dep, mydata)
# Call:
# lm(formula = frmla, data = dataframe)
#
# Residuals:
#    1    2    3    4    5
#  0.2 -0.3  0.2 -0.3  0.2
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -0.7000     0.3317  -2.111 0.125298
# dep           1.5000     0.1000  15.000 0.000643 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3162 on 3 degrees of freedom
# Multiple R-squared:  0.9868,  Adjusted R-squared:  0.9825
# F-statistic:   225 on 1 and 3 DF,  p-value: 0.0006431

# Run the function with predictor as the dependent and outcome as the independent variable.
newdata <- data.frame(outcome=c(1,2,3,4,5),predictor=c(1,2,4,5,7))
myfunction(predictor ~ outcome, newdata)
# Call:
# lm(formula = frmla, data = dataframe)
#
# Residuals:
#    1    2    3    4    5
#  0.2 -0.3  0.2 -0.3  0.2
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -0.7000     0.3317  -2.111 0.125298
# outcome       1.5000     0.1000  15.000 0.000643 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3162 on 3 degrees of freedom
# Multiple R-squared:  0.9868,  Adjusted R-squared:  0.9825
# F-statistic:   225 on 1 and 3 DF,  p-value: 0.0006431

> On May 8, 2019, at 9:22 PM, Sorkin, John <jsorkin at som.umaryland.edu> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
bigelow.org

Ecological Forecasting: eco.bigelow.org
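A variant of Ben's suggestion, sketched under the assumption that the column names arrive as character strings (say, from a loop over several outcomes): stats::reformulate() builds the formula from the names, and lm() then finds those columns in the data frame. The function name myfunction2 is hypothetical.

```r
# Hypothetical helper: column names are passed as strings, not bare names.
myfunction2 <- function(yname, xname, dataframe) {
  # reformulate("dep", response = "ind") builds the formula ind ~ dep
  frmla <- reformulate(xname, response = yname)
  lm(frmla, data = dataframe)
}

mydata <- data.frame(dep = c(1, 2, 3, 4, 5), ind = c(1, 2, 4, 5, 7))
fit <- myfunction2("ind", "dep", mydata)
print(summary(fit))
```

Because xname may be a vector, reformulate(c("a", "b"), response = "y") gives y ~ a + b, so the same helper covers several predictors.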
Hello,

There is a "standard" deparse/substitute trick that gets the names of the variables passed to a function. There are more sophisticated ways, but maybe this is what you are looking for.

myfunction <- function(y, x, dataframe){
  y <- deparse(substitute(y))
  x <- deparse(substitute(x))
  fmla <- as.formula(paste(y, '~', x))
  fit0 <- lm(fmla, data = dataframe)
  summary(fit0)
}

# Run the function using dep and ind as dependent and independent variables.
mydata <- data.frame(dep = c(1,2,3,4,5),ind=c(1,2,4,5,7))
myfunction(dep, ind, mydata)
# Run the function using outcome and predictor as dependent and independent variables.
newdata <- data.frame(outcome=c(1,2,3,4,5),predictor=c(1,2,4,5,7))
myfunction(outcome, predictor, newdata)

Note: your function has an argument 'dataframe' that you didn't use in either of your two calls.

Hope this helps,

Rui Barradas
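To see what the trick captures, here is a small sketch (the helpers show_arg and fit_by_name are hypothetical): substitute() returns the expression the caller typed, and deparse() turns it into a string, so the bare name dep never has to exist as an object. fit_by_name is Rui's function returning the fit instead of its summary, so it can be compared with a direct lm() call.

```r
# Hypothetical helper: returns the caller's argument expression as a string.
show_arg <- function(v) deparse(substitute(v))
show_arg(dep)          # "dep", even though no object named dep exists

# Rui's approach, returning the fit rather than its summary.
fit_by_name <- function(y, x, dataframe) {
  y <- deparse(substitute(y))   # capture the name before y is overwritten
  x <- deparse(substitute(x))
  fmla <- as.formula(paste(y, "~", x))
  lm(fmla, data = dataframe)
}

mydata <- data.frame(dep = c(1, 2, 3, 4, 5), ind = c(1, 2, 4, 5, 7))
fit <- fit_by_name(dep, ind, mydata)
# Same coefficients as writing the formula out by hand.
all.equal(coef(fit), coef(lm(dep ~ ind, data = mydata)))
```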
Hello John,

Others have commented on the first half of your question, but the second half of your question looks very much like R's built-in predict() functions:

> ?predict
> ?predict.lm

Best Regards,

Bill.

W. Michels, Ph.D.
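If the second half of the question is indeed about applying a fitted model to other data, a minimal sketch looks like this, assuming the model is fitted with a formula and a data argument: predict() finds the predictor columns in newdata by the names used in the formula. The values in new_points are made up for illustration.

```r
mydata <- data.frame(dep = c(1, 2, 3, 4, 5), ind = c(1, 2, 4, 5, 7))
fit <- lm(ind ~ dep, data = mydata)          # ind = -0.7 + 1.5 * dep

# Illustrative new predictor values; the column must be named dep,
# matching the name used in the formula.
new_points <- data.frame(dep = c(6, 7))
pred <- predict(fit, newdata = new_points)
pred                                         # 8.3 and 9.8
```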
I don't think previous responses have addressed the question, which appears to be: "How does R know to look in the 'data' object for the variable names in the formula?" And, of course, I could be wrong, in which case ignore all the following.

My answer to that question is: it's quite complicated. I think you have to know about calls, function closures, evaluation environments, and the details of model.frame.lm, and perhaps more. The following **might** be a start:

> dat <- data.frame(x = 1:10, y = rnorm(10))
>
> ## substitute() is used to return the unevaluated expression for the call
> mc <- match.call(lm, call = substitute(lm(y~x,data = dat)))
> class(mc)
[1] "call"
> as.list(mc)
[[1]]
lm

$formula
y ~ x

$data
dat

Cheers,
Bert Gunter
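Building on Bert's start, here is a sketch of the two pieces involved, assuming stats::model.frame() behaves as documented. First, variables in a formula are looked up in data and then in the environment attached to the formula. Second, lm() builds its model frame by turning its own matched call into a call to stats::model.frame(). The helpers broken and my_frame are hypothetical, and my_frame is only a simplified imitation of the first few lines of lm(), not its actual code.

```r
# 1. Lookup order: 'data' first, then environment(formula).
dat <- data.frame(x = 1:10, y = rnorm(10))
z <- 101:110                       # not in 'dat'; found in the global env
mf <- model.frame(y ~ x + z, data = dat)
names(mf)                          # "y" "x" "z"

# 2. Why the original myfunction() fails: inside it, 'y ~ x' refers to
#    columns literally named y and x. 'mydata' has none, so lookup falls
#    back to the function's own arguments y and x, which are promises for
#    'dep' and 'ind' in the caller's environment, where they do not exist.
broken <- function(y, x, dataframe) lm(y ~ x, data = dataframe)
mydata <- data.frame(dep = c(1, 2, 3, 4, 5), ind = c(1, 2, 4, 5, 7))
res <- tryCatch(broken(dep, ind, mydata),
                error = function(e) conditionMessage(e))
res                                # an error message, not a fit

# 3. The lm()-style pattern (simplified): capture the call, keep only the
#    arguments model.frame() understands, swap in stats::model.frame, and
#    evaluate that call in the caller's frame.
my_frame <- function(formula, data, subset, na.action) {
  mf <- match.call(expand.dots = FALSE)
  keep <- match(c("formula", "data", "subset", "na.action"),
                names(mf), 0L)
  mf <- mf[c(1L, keep)]            # index 0 drops arguments not supplied
  mf[[1L]] <- quote(stats::model.frame)
  eval(mf, parent.frame())
}
head(my_frame(y ~ x, data = dat), 3)
```

Passing the unevaluated call along is what lets an argument such as subset = x > 5 work: it only makes sense once model.frame() evaluates it with the columns of dat in scope.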