thr3ads.net - R help - [R] fitting of all possible models [Feb 2007]

Indermaur Lukas

2007-Feb-27 07:46 UTC

[R] fitting of all possible models

Hi,
Fitting all possible models (GLM) with 10 predictors means fitting
2^10 - 1 = 1023 models. I want to do that in order to get the importance of
each variable (with an unbalanced variable design) by summing up the AIC weights
of all models that include that variable, for every variable separately. It is
time-consuming and tedious to define all possible models by hand.
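
For reference, the AIC weights and the summed importance described above are usually written as below. This is the common Akaike-weight formulation, not something stated in the post; the symbols (M for the number of fitted models, w_i for the weight of model i, I(x_k) for the importance of predictor x_k) are notation assumed here for illustration.

```latex
\Delta_i = \mathrm{AIC}_i - \min_{m} \mathrm{AIC}_m, \qquad
w_i = \frac{\exp(-\Delta_i / 2)}{\sum_{m=1}^{M} \exp(-\Delta_m / 2)}, \qquad
I(x_k) = \sum_{i \,:\, x_k \in \text{model } i} w_i
```

The weights sum to 1 over the M models, so each I(x_k) lies between 0 and 1.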
 
Is there a command, or easy solution to let R define the set of all possible
models itself? I defined models in the following way to process them with a
batch job:
 
# e.g. model 1
preference<- formula(Y~Lwd + N + Sex + YY)
# e.g. model 2
preference_heterogeneity<- formula(Y~Ri + Lwd + N + Sex + YY)  
etc.
etc.
 
 
I appreciate any hint
Cheers
Lukas
 
 
 
 
 
°°°
Lukas Indermaur, PhD student
eawag / Swiss Federal Institute of Aquatic Science and Technology
ECO - Department of Aquatic Ecology
Überlandstrasse 133
CH-8600 Dübendorf
Switzerland
 
Phone: +41 (0) 71 220 38 25
Fax    : +41 (0) 44 823 53 15 
Email: lukas.indermaur at eawag.ch
www.lukasindermaur.ch

Frank E Harrell Jr

2007-Feb-27 13:13 UTC

[R] fitting of all possible models

If you choose the model with the best AIC from among the 2^10 - 1
candidates, that model will be badly biased.  Why look at so many?
Pre-specifying the models, fitting full models with penalization, or
using data reduction (masked to Y) may work better.

Frank
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

hadley wickham

2007-Feb-27 13:22 UTC

[R] fitting of all possible models

Hi Lukas,

You may find my meifly package helpful.  It provides functions to
generate ensembles of models (e.g. fitall) and then extract all the
coefficients, residuals, etc. (coef, summary, residual, etc.).  The main
point of the package is to visualise all these models, and I second
Frank's comment that merely selecting the best model will be perilous.

Unfortunately, the package is not on CRAN yet, but if you are
interested, contact me off list with your OS and I can email you the
package, and accompanying paper.

Regards,

Hadley


Greg Snow

2007-Feb-27 20:58 UTC

[R] fitting of all possible models

You may want to look at the package 'leaps'.  I don't think it handles
GLMs, but possibly you could modify it to.

Otherwise here is one quick approach (though there are probably better ones):

apply(expand.grid(c(TRUE, FALSE), c(TRUE, FALSE), c(TRUE, FALSE)), 1,
      function(x) as.formula(paste(c('y~1', c('x1', 'x2', 'x3')[x]),
                                   collapse = '+')))
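
The same idea can be extended to any number of predictors. The following is only a sketch under assumptions: `all_formulas` is a hypothetical helper name, not part of R or of any package mentioned in this thread, and the predictor names are the ones from Lukas's example.

```r
# Hypothetical helper (not from the thread): build one formula per subset of
# 'predictors', including the intercept-only model.
all_formulas <- function(response, predictors) {
  # One TRUE/FALSE column per predictor; each row of the grid is one subset.
  grid <- expand.grid(rep(list(c(TRUE, FALSE)), length(predictors)))
  lapply(seq_len(nrow(grid)), function(i) {
    keep <- unlist(grid[i, ])
    rhs <- c("1", predictors[keep])  # "1" keeps the empty subset a valid formula
    as.formula(paste(response, "~", paste(rhs, collapse = " + ")))
  })
}

forms <- all_formulas("Y", c("Ri", "Lwd", "N", "Sex", "YY"))
length(forms)  # 2^5 = 32 formulas, one of them intercept-only
```

Each element of `forms` can then be passed to glm() together with the data, for example with lapply(). Dropping the intercept-only formula leaves the 2^p - 1 models counted in the original question.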

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 

Rense Nieuwenhuis

2007-Mar-15 10:10 UTC

[R] fitting of all possible models

Dear Lukas,

Although I'm intrigued by the purpose of what you are trying to do,
as others on this list have mentioned, I liked the challenge of
writing such a function.

I came up with the following while travelling by train this morning:

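# Helper: build a 2^x by x index matrix with one row per subset of x predictors.
# Column i holds either NA (predictor i left out) or i + 1, the position of
# predictor i in names(model$coefficients), where position 1 is the intercept.
tum <- function(x)
{
	m <- matrix(data = NA, nrow = 2^x, ncol = x)

	for (i in 1:x)
	{
		# A block of 2^i values, recycled down the whole column.
		m[, i] <- c(rep(NA, 2^i / 2), rep(i + 1, 2^i / 2))
	}

	return(m)
}

###

# Refit 'model' for every non-empty subset of its predictors and return the
# mean squared residual of each fit. Row 1 (the empty subset) stays NA.
# Predictors are taken from the coefficient names, so this assumes numeric
# predictors only (no factors or interactions).
all.models <- function(model)
{
	preds <- names(model$coefficients)
	npred <- length(preds) - 1
	matr.model <- tum(npred)
	output <- matrix(data = NA, nrow = 2^npred, ncol = 1)

	for (t in 2:2^npred)
	{
		form <- as.formula(paste(". ~", paste(preds[na.omit(matr.model[t, ])], collapse = "+")))
		model2 <- update(model, form)
		output[t, ] <- mean(resid(model2)^2)
	}

	return(output)
}

##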

As you can see, I used a helper function (tum, "the ultimate matrix")
for the actual function. Also, I wrote it using lm instead of glm, but
I suppose that you can easily alter that. In addition, the function now
returns just a simple measure of fit (the mean squared residual). But
that is not all that essential, I think.

The main point is: it works! Using this on my G4 Mac, with an lm of 10
predictors and 18 cases, it returns the output quite fast (<1 minute).
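
One possible adaptation to glm and to the summed AIC weights asked about in the original question is sketched below. It is an illustration under assumptions, not the function above: the data are simulated, the variable names (x1, x2, x3, y) are made up, a binomial family is assumed, and the subsets are built with combn() rather than with tum().

```r
set.seed(1)
# Simulated data purely for illustration; names and effect sizes are made up.
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.5 * dat$x2))

predictors <- c("x1", "x2", "x3")

# All subsets of the predictors, starting with the empty (intercept-only) one.
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(predictors),
                           function(k) combn(predictors, k, simplify = FALSE)),
                    recursive = FALSE))

# Fit one binomial GLM per subset.
fit_one <- function(vars) {
  rhs <- if (length(vars)) paste(vars, collapse = " + ") else "1"
  glm(as.formula(paste("y ~", rhs)), family = binomial, data = dat)
}
fits <- lapply(subsets, fit_one)

# Akaike weights: exp(-delta / 2), normalised to sum to 1 over all models.
aic   <- vapply(fits, AIC, numeric(1))
delta <- aic - min(aic)
w     <- exp(-delta / 2) / sum(exp(-delta / 2))

# Importance of a predictor: summed weight of the models that contain it.
importance <- vapply(predictors, function(p) {
  sum(w[vapply(subsets, function(s) p %in% s, logical(1))])
}, numeric(1))
importance
```

Every predictor appears in exactly half of the candidate models here, so the summed weights are comparable across predictors. The caveats raised earlier in the thread about searching over so many models still apply.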

I hope you can put this to use. It needs some easy adapting to your  
specific needs, but I don't expect that to be a problem. If you need  
help with that, please contact me.

I'd appreciate hearing from you if this function is helpful in any way.

Sincerely,

Rense Nieuwenhuis

