New to R; please excuse me if this is a dumb question. I tried to RTFM;
it didn't help.

I want to run a series of regressions over the columns in a data.frame,
systematically varying the response variable and the terms, and not
necessarily including all the non-response columns. In my case, the
columns are time series. I don't know if that makes a difference; it
does mean I have to call lag() to offset the non-response terms. I cannot
assume a specific number of columns in the data.frame; there might be 3,
there might be 20.

My central problem is that the formula given to lm() is different each
time. For example, say a data.frame had columns with the following
headings: height, weight, BP (blood pressure), and Cals (calorie intake
per time frame). In that case, I'd need something like the following:

    lm(height ~ weight + BP + Cals)
    lm(height ~ weight + BP)
    lm(height ~ weight + Cals)
    lm(height ~ BP + Cals)
    lm(weight ~ height + BP)
    lm(weight ~ height + Cals)
    etc.

In general, I'll have to read the header to get the argument labels.

Do I have to write several functions, each taking a different number of
arguments? I'd like to construct a string or list representing the
variables in the formula and apply lm() to it, so to speak. [I'm mainly
a Lisp programmer, where that part would be very simple. Anyone have a
Lisp API for R? :-}]

Thanks,
chris

Chris Elsaesser, PhD
Principal Scientist, Machine Learning
SPADAC Inc.
7921 Jones Branch Dr. Suite 600
McLean, VA 22102

703.371.7301 (m)
703.637.9421 (o)

Try this:

    lm(Sepal.Length ~ ., iris[1:3])

    # or

    cn <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
    lm(Sepal.Length ~ ., iris[cn])

On 5/17/07, Chris Elsaesser <chris.elsaesser at spadac.com> wrote:
> My central problem is that the formula given to lm() is different each
> time.
> [...]

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

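The `~ .` shorthand from the reply above can also be looped over, so that each selected column takes a turn as the response. This is only a minimal sketch using the built-in iris data; the names `cn` (taken from the reply) and `fits` are illustrative, and pasting the response name into a formula string is one of several ways to do it.

```r
# Columns to include; each one is used as the response in turn.
cn <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

fits <- lapply(cn, function(resp) {
  # "resp ~ ." means: regress resp on every other column of the data.
  f <- as.formula(paste(resp, "~ ."))
  lm(f, data = iris[cn])
})
names(fits) <- cn

# Inspect the model that uses Sepal.Width as the response.
coef(fits[["Sepal.Width"]])
```

Because `.` expands to whatever other columns are present, restricting the data frame with `iris[cn]` is what controls which predictors enter each model.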
> tmp <- data.frame(matrix(rnorm(40), 10, 4,
+                          dimnames = list(NULL, c("Y", "A", "B", "C"))))
> tmp
> tmp.form <- paste(names(tmp)[1],
+                   paste(names(tmp)[-1], collapse = " + "),
+                   sep = " ~ ")
> tmp.form
> lm(tmp.form, tmp)

The R language is powerful enough to do most of the Lisp-like things you
may want to do.

Rich

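Building on the paste() idea above, here is a rough sketch of how the whole family of formulas from the original question might be generated: every column as response, with every non-empty subset of the remaining columns as predictors. It uses reformulate() and combn() from the standard stats and utils packages; the helper name `all_formulas` and its `min_terms` argument are made up for illustration, and the data are random.

```r
set.seed(1)  # random illustrative data, same shape as the example above
tmp <- data.frame(matrix(rnorm(40), 10, 4,
                         dimnames = list(NULL, c("Y", "A", "B", "C"))))

# Hypothetical helper: list all formulas "resp ~ subset of other vars".
all_formulas <- function(vars, min_terms = 1) {
  out <- list()
  for (resp in vars) {
    others <- setdiff(vars, resp)
    if (length(others) < min_terms) next
    for (k in min_terms:length(others)) {
      # All predictor subsets of size k, each as a character vector.
      combos <- combn(others, k, simplify = FALSE)
      out <- c(out, lapply(combos, reformulate, response = resp))
    }
  }
  out
}

forms <- all_formulas(names(tmp))
fits  <- lapply(forms, function(f) lm(f, data = tmp))
names(fits) <- vapply(forms, function(f) paste(deparse(f), collapse = ""), "")
```

Note that the number of models grows quickly: with p columns there are p * (2^(p-1) - 1) formulas, so for 20 columns some restriction on subset size would be needed.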
One way to do it is to give lm() a data frame containing just the right
variables as its first argument each time. If lm() is given a data frame
as the first argument, it treats the first variable as the LHS and the
rest as the RHS of the formula. For example, you can do:

    lm(myData[c("height", "weight", "BP", "Cals")])

(The drawback to this is that the "formula" in the fitted model object
looks a bit strange...)

Andy

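The data-frame-first approach above also gives a natural place to handle the lagging mentioned in the original question: build the lagged predictor columns first, then hand the assembled data frame to lm(). As far as I know, stats::lag() only changes the time attributes of a ts object and lm() does not align on them, so this sketch shifts the values explicitly instead. The helpers `lag_vec` and `fit_with_response`, and the random `myData`, are hypothetical and only for illustration.

```r
# Hypothetical helper: shift a vector back by k steps, padding with NA.
lag_vec <- function(x, k = 1) c(rep(NA, k), head(x, -k))

set.seed(42)  # random stand-in for the real time-series columns
myData <- data.frame(height = rnorm(20), weight = rnorm(20),
                     BP = rnorm(20), Cals = rnorm(20))

# Hypothetical helper: response first, lagged predictors after it.
fit_with_response <- function(data, response, predictors, k = 1) {
  lagged <- lapply(data[predictors], lag_vec, k = k)
  model_data <- data.frame(data[response], lagged)
  lm(model_data)  # first column is treated as the response
}

fit <- fit_with_response(myData, "weight", c("height", "Cals"))
coef(fit)
```

Rows that contain the NA padding are dropped by lm()'s default NA handling, so a lag of 1 on 20 observations leaves 19 rows in the fit.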
I was solving a similar problem some time ago. Here is my script. I had a
data frame containing a response and several other variables, which were
assumed to be predictors. I was trying to choose the best linear
approximation. This approach now seems useless to me; please don't blame
me for that. However, the script might be useful to you.

<code>
library(forward)  # provides fwd.combn()

# dfr is a data.frame that contains everything.
# The response variable is named med5x.
# The following lines construct linear models for all possible formulas
# of the form
#   med5x~T+a+height
#   med5x~a+height+RH
# where T, a, RH, etc. are the names of possible predictors.

# dfr was a very large data frame containing a lot of variables;
# here we choose only a subset of them.
inputs <- names(dfr)[c(10:30, 1)]

# The linear models were assumed to have at least 11 terms.
for (nc in 11:length(inputs)) {
  # Generate a character vector of formulas with nc predictors each.
  formulas <- paste("med5x", sep = "~",
                    fwd.combn(inputs, nc,
                              fun = function(x) paste(x, collapse = "+")))
  # Fit every model and append its residual sum of squares to a file.
  for (f in formulas) {
    lms <- lm(as.formula(f), data = dfr)
    cat(file = "linear_models.txt", f, sum(residuals(lms)^2), "\n",
        sep = "\t", append = TRUE)
  }
}
</code>

Hmm, looking back, I see that this is a rather inefficient script. For
example, the inner loop could easily be replaced with one of the apply
functions.

--
View this message in context: http://www.nabble.com/using-lm%28%29-with-variable-formula-tf3772540.html#a10716815
Sent from the R help mailing list archive at Nabble.com.
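For what it is worth, the script above might be rewritten without the inner loop roughly as follows. This is a sketch under several assumptions: it uses combn() from the standard utils package rather than fwd.combn(), the built-in mtcars data in place of dfr, a hypothetical choice of predictor names, and it collects the residual sums of squares in a data frame instead of appending to a file.

```r
response <- "mpg"
inputs   <- c("wt", "hp", "disp", "qsec")  # illustrative predictor names

# Right-hand sides for every predictor subset of size 2 and up.
rhs <- unlist(lapply(2:length(inputs), function(nc) {
  combn(inputs, nc, FUN = paste, collapse = "+")
}))
formulas <- paste(response, rhs, sep = "~")

# Fit each model and record its residual sum of squares.
rss <- sapply(formulas, function(f) {
  sum(residuals(lm(as.formula(f), data = mtcars))^2)
})

results <- data.frame(formula = formulas, rss = rss, row.names = NULL)
results <- results[order(results$rss), ]
head(results)
```

Ranking by residual sum of squares alone will always favour the largest model, which may be part of why the approach felt unsatisfying; a criterion that penalises model size, such as AIC(), could be swapped in at the same place.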