Hi,

I have a query about the following.

CONTEXT: If we define

    eq <- y ~ x1 + x2

then is.language(eq) returns TRUE. This works perfectly with commands such as

    lm(eq, data=my.data)

THE PROBLEM: Now I have a big data set with about 2000 independent variables, so I tried to automate this. I can pick off the names of the variables, insert a plus between them, and get a string. Thus I have

    eq<- y ~ " x1 + x2 + ... +x2000"

or

    eq<-"y ~ " x1 + x2 + ... +x2000"

From either case, how can I typecast eq into a language object and get the lm command to work?

Does anyone have any suggestions?

Thanks,

Arnab.
Arnab mukherji wrote:
> SNIP
>
> from either case how can I typecast eq into a language object and get
> the lm command to work ?
>
> Does anyone have any suggestions?

Arnab,

Once you have your full formula string constructed, you can use:

    eq <- eval(parse(text = string))

This will put the model formula into eq.

For example, try:

    eq <- eval(parse(text = "y ~ x1 + x2"))

See ?eval and ?parse for more information.

Hope that helps.

Regards,

Marc Schwartz
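As a minimal sketch of the eval(parse(...)) suggestion above: the names x1 to x5 and the object names xnames and string are illustrative assumptions, standing in for the 2000 real variable names.

```r
# Hypothetical predictor names; in practice these would come from the data.
xnames <- paste("x", 1:5, sep = "")

# Build the whole formula as one character string, quotes around all of it.
string <- paste("y ~", paste(xnames, collapse = " + "))

# Parse the string into an expression, then evaluate it to get a formula.
eq <- eval(parse(text = string))

class(eq)        # "formula"
is.language(eq)  # TRUE
```

The resulting eq can then be passed to lm() in the same way as a formula typed by hand.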
In this case a simpler answer is that you want a formula, so use as.formula:

> eq <- "y ~ x1 + x2 +x2000"
> as.formula(eq)
y ~ x1 + x2 + x2000

once the quotation marks are in the right place.

It is much simpler to use lm(y ~ ., data=foo)!

A caveat: you may hit some limits on expression size if you try to put 2000 variables in a formula, and almost certainly you will hit computational limits if you try to do a regression on it, as you need n >> p = 2000, and the key computation is (I think) O(np^2).

On Fri, 7 Feb 2003, Marc Schwartz wrote:
> SNIP
>
> Once you have your full formula string constructed you can use:
>
> eq <- eval(parse(text = string))
>
> This will put the model formula into eq.
>
> SNIP
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> stat.math.ethz.ch/mailman/listinfo/r-help

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, stats.ox.ac.uk/~ripley
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
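To illustrate the lm(y ~ ., data=foo) shortcut mentioned above, here is a small sketch on simulated data. The data frame foo, its dimensions, and the column names are assumptions made up for the example, and the response is assumed to be a column named y.

```r
# Simulated data for illustration only: 100 rows, 5 predictors.
set.seed(1)
foo <- as.data.frame(matrix(rnorm(100 * 5), ncol = 5))
names(foo) <- paste("x", 1:5, sep = "")
foo$y <- rnorm(100)

# "." stands for every column of foo other than the response y,
# so no formula string has to be assembled at all.
fit <- lm(y ~ ., data = foo)

names(coef(fit))
```

This only works as intended when foo contains nothing but the response and the wanted predictors; any other column would be picked up by the "." as well.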
On 8 Feb 2003 at 2:13, Arnab mukherji wrote:
> SNIP
>
> from either case how can I typecast eq into a language object and get
> the lm command to work ?
>
> Does anyone have any suggestions?

What about the following:

> sum <- paste("+ x", 1:5, sep="", collapse="")
> sum
[1] "+ x1+ x2+ x3+ x4+ x5"
> sum <- substr(sum,2,20)
> sum
[1] " x1+ x2+ x3+ x4+ x5"
> formula <- as.formula(paste("y ~ ", sum, sep="", collapse=""))
> formula
y ~ x1 + x2 + x3 + x4 + x5
> class(formula)
[1] "formula"
> # simulating some random data
> lm(formula)

Call:
lm(formula = formula)

Coefficients:
(Intercept)           x1           x2           x3           x4           x5
    0.05959      0.07876      0.20845     -0.11539     -0.14293     -0.11421

Kjetil Halvorsen
ripley at stats.ox.ac.uk wrote:
> In this case a simpler answer is that you want a formula, so use
> as.formula:
>
> SNIP

Valid points, of course, on all counts.

Indeed, there is a good example at the end of ?as.formula that addresses the construction of a formula from a large number of variables with just the type of sequencing that Arnab is using. To wit:

    ## Create a formula for a model with a large number of variables:
    xnam <- paste("x", 1:25, sep="")
    (fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))

which results in:

    y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 +
        x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 + x20 + x21 +
        x22 + x23 + x24 + x25

I had been in the habit of using the more "generic" approach of eval(parse(text = charvector)) when constructing R code for evaluation in other situations.

Regards and thanks,

Marc Schwartz
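When the predictor names do not follow a regular x1, x2, ... pattern, the same paste() and as.formula() idiom can take the names straight from the data frame. The sketch below assumes a data frame my.data whose response column is called y; the data are simulated and the column names are made up for the example.

```r
# Simulated data frame for illustration; the columns are hypothetical.
set.seed(1)
my.data <- data.frame(y  = rnorm(50),
                      x1 = rnorm(50),
                      x2 = rnorm(50),
                      x3 = rnorm(50))

# Every column name except the response becomes a predictor.
xnam <- setdiff(names(my.data), "y")

# Join the names with "+" and convert the resulting string to a formula.
fmla <- as.formula(paste("y ~", paste(xnam, collapse = " + ")))

fit <- lm(fmla, data = my.data)
```

Building the formula explicitly, rather than relying on lm(y ~ ., data=my.data), makes it easy to drop or reorder predictors by editing xnam before the paste() step.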