Hi,

Fitting all possible models (GLM) with 10 predictors results in a large number of models (2^10 - 1 = 1023). I want to do that in order to get the importance of variables (having an unbalanced variable design) by summing up the AIC weights of the models that include a given variable, for each variable separately. Defining all possible models by hand is time-consuming and annoying.

Is there a command, or easy solution to let R define the set of all possible models itself? I defined models in the following way to process them with a batch job:

# e.g. model 1
preference <- formula(Y ~ Lwd + N + Sex + YY)
# e.g. model 2
preference_heterogeneity <- formula(Y ~ Ri + Lwd + N + Sex + YY)
etc.

I appreciate any hint.

Cheers
Lukas

°°°
Lukas Indermaur, PhD student
eawag / Swiss Federal Institute of Aquatic Science and Technology
ECO - Department of Aquatic Ecology
Überlandstrasse 133
CH-8600 Dübendorf
Switzerland

Phone: +41 (0) 71 220 38 25
Fax : +41 (0) 44 823 53 15
Email: lukas.indermaur at eawag.ch
www.lukasindermaur.ch
Indermaur Lukas wrote:
> Is there a command, or easy solution to let R define the set of all possible models itself?

If you choose the model with the best AIC from among 2^10 - 1 candidates, that model will be badly biased. Why look at so many? Pre-specification of models, or fitting full models with penalization, or using data reduction (masked to Y, that is, done without reference to the response) may work better.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine, Department of Biostatistics
Vanderbilt University

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
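As one possible reading of "data reduction (masked to Y)", here is a minimal sketch that summarises the predictors by principal components without ever looking at the response, and then fits a single pre-specified GLM on the first few components. The data are simulated for illustration only, and the choice of three components (k) is an arbitrary assumption; this is not necessarily the procedure Frank had in mind.

```r
# Simulated data purely for illustration (not from the thread)
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("x", 1:10)))
y <- rbinom(n, 1, plogis(0.8 * X[, 1] - 0.5 * X[, 2]))

# Data reduction masked to Y: the response plays no part in this step
pc <- prcomp(X, center = TRUE, scale. = TRUE)
k <- 3                                  # number of components, fixed in advance
dat <- data.frame(y = y, pc$x[, 1:k])   # columns: y, PC1, PC2, PC3

# One pre-specified model on the reduced predictors
fit <- glm(y ~ ., data = dat, family = binomial)
summary(fit)
```

Because the components are chosen before the response is used, the single fitted model does not carry the selection bias that comes from picking the best of 1023 candidates.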
Hi Lukas,

You may find my meifly package helpful. It provides functions to generate ensembles of models (e.g. fitall) and then extract all the coefficients, residuals, etc. (coef, summary, residual, etc.). The main point of the package is to visualise all these models, and I second Frank's comment that merely selecting the best model will be perilous.

Unfortunately, the package is not on CRAN yet, but if you are interested, contact me off list with your OS and I can email you the package and the accompanying paper.

Regards,
Hadley

On 2/27/07, Indermaur Lukas <Lukas.Indermaur at eawag.ch> wrote:
> Is there a command, or easy solution to let R define the set of all possible models itself?
You may want to look at the package 'leaps'. I don't think it does GLMs, but possibly you could modify it to. Otherwise here is one quick approach (though there are probably better ones):

apply(expand.grid(c(TRUE, FALSE), c(TRUE, FALSE), c(TRUE, FALSE)), 1,
      function(x) as.formula(paste(c('y~1', c('x1', 'x2', 'x3')[x]), collapse = '+')))

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Indermaur Lukas
> Sent: Tuesday, February 27, 2007 12:46 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] fitting of all possible models
>
> Is there a command, or easy solution to let R define the set of all possible models itself?
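The expand.grid idea above can be generalised to any vector of predictor names. The following is a minimal sketch under that assumption; the helper name all_formulas is made up for illustration, and the variable names are the ones from the original question.

```r
# Build one formula per subset of the predictors.
# all_formulas is a hypothetical helper name, not an existing R function.
all_formulas <- function(response, predictors) {
  k <- length(predictors)
  # Each row of `grid` is one inclusion pattern (TRUE = predictor is in the model)
  grid <- expand.grid(rep(list(c(FALSE, TRUE)), k))
  lapply(seq_len(nrow(grid)), function(i) {
    keep <- predictors[unlist(grid[i, ])]
    # The all-FALSE row gives the intercept-only model
    reformulate(if (length(keep)) keep else "1", response = response)
  })
}

forms <- all_formulas("Y", c("Ri", "Lwd", "N", "Sex", "YY"))
length(forms)   # 2^5 = 32, including the intercept-only model
```

Each element of forms can then be passed to glm() together with a data argument. With 10 predictors the list has 2^10 = 1024 entries, which is one more than the 1023 in the question because the intercept-only model is included.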
Dear Lukas,

Although I'm intrigued by the purpose of what you are trying to do, as mentioned by some of the other people on this list, I liked the challenge of writing such a function. I came up with the following during some train travel this morning:

# tum ("the ultimate matrix"): one row per subset of x predictors.
# Column i holds i + 1 (the coefficient index, skipping the intercept)
# when predictor i is in the subset, and NA otherwise.
tum <- function(x) {
  m <- matrix(data = NA, nrow = 2^x, ncol = x)
  for (i in 1:x) {
    m[, i] <- c(rep(NA, 2^i / 2), rep(i + 1, 2^i / 2))
  }
  return(m)
}

###

# Refit `model` on every non-empty subset of its predictors and return the
# mean squared residual of each fit (row 1, the empty subset, stays NA).
# Assumes each coefficient name is also a variable name, i.e. numeric
# predictors without factors or interactions.
all.models <- function(model) {
  preds <- names(model$coefficients)
  npred <- length(preds) - 1
  matr.model <- tum(npred)
  output <- matrix(data = NA, nrow = 2^npred, ncol = 1)
  for (t in 2:2^npred) {
    form <- as.formula(paste(". ~", paste(preds[na.omit(matr.model[t, ])], collapse = "+")))
    model2 <- update(model, form)
    output[t, ] <- mean(resid(model2)^2)
  }
  return(output)
}

##

As you can see, I used a helper function (tum, "the ultimate matrix") for the actual function. Also, I wrote it using lm instead of glm, but I suppose that you can easily alter that. For now the function just returns a simple measure of fit (the mean squared residual), but that is not essential, I think. The main point is: it works! Using this on my G4 Mac, with an lm of 10 predictors and 18 cases, it returns the output quite fast (<1 minute).

I hope you can put this to use. It needs some easy adapting to your specific needs, but I don't expect that to be a problem. If you need help with that, please contact me.

I'd appreciate hearing from you if this function is helpful in any way.

Sincerely,
Rense Nieuwenhuis

On Feb 27, 2007, at 8:46, Indermaur Lukas wrote:
> Is there a command, or easy solution to let R define the set of all possible models itself?
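The adaptation to glm and AIC weights mentioned in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the data are simulated, the function name all_subsets_aic is made up, only main effects are considered, and subsets are built from the model's term labels rather than its coefficient names so that factors are handled as single terms.

```r
# Simulated data purely for illustration (not from the thread)
set.seed(42)
n <- 150
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                  g = factor(sample(c("a", "b", "c"), n, replace = TRUE)))
dat$y <- rpois(n, exp(0.3 + 0.5 * dat$x1 - 0.4 * dat$x2))

full <- glm(y ~ x1 + x2 + x3 + g, data = dat, family = poisson)

# Fit every main-effects subset of a fitted glm; one row per model.
# all_subsets_aic is a hypothetical helper name.
all_subsets_aic <- function(model) {
  terms_all <- attr(terms(model), "term.labels")  # terms, not coefficient names
  k <- length(terms_all)
  incl <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), k)))
  colnames(incl) <- terms_all
  aic <- apply(incl, 1, function(keep) {
    rhs <- if (any(keep)) paste(terms_all[keep], collapse = " + ") else "1"
    AIC(update(model, as.formula(paste(". ~", rhs))))
  })
  delta <- aic - min(aic)
  w <- exp(-delta / 2) / sum(exp(-delta / 2))     # Akaike weights
  data.frame(incl, AIC = aic, weight = w, check.names = FALSE)
}

res <- all_subsets_aic(full)

# Summed weight of all models containing each term
importance <- sapply(attr(terms(full), "term.labels"),
                     function(v) sum(res$weight[res[[v]]]))
round(sort(importance, decreasing = TRUE), 3)
```

The weights sum to 1 over the whole model set, so each importance value lies between 0 and 1. The caveat raised earlier in the thread still applies: these sums describe the candidate set, and picking the single best-AIC model from it remains biased.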