thr3ads.net - R help - [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? [Jan 2012]

Jhope

2012-Jan-25 04:41 UTC

[R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Hi R-listers,

I have developed 47 GLM models with different combinations of interactions
from 1 variable to 5 variables. I have manually made each model separately
and put them into individual tables (organized by the number of variables)
showing the AIC score. I want to compare all of these models. 

1) What is the best way to compare various models with unique combinations
and different number of variables? 
2) Ideally, I am trying to develop the simplest model. Even though
adding another variable would lower the AIC, how do I decide whether it is
worth including another variable in the model? How do I know when to stop? 

Definitions of Variables:
HTL - distance to high tide line (continuous)
Veg - distance to vegetation 
Aeventexhumed - Event of exhumation
Sector - number measurements along the beach
Rayos - major sections of beach (grouped sectors)
TotalEggs - nest egg density

Example of how all models were created: 
Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
data=data.to.analyze, family=binomial)
Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial,
data.to.analyze)
Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
data.to.analyze, family = binomial)
Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)

Please advise, thanks! 
J


--
View this message in context:
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4326407.html
Sent from the R help mailing list archive at Nabble.com.

Milan Bouchet-Valat

2012-Jan-25 09:32 UTC

On Tuesday 24 January 2012 at 20:41 -0800, Jhope wrote:
> Hi R-listers,
> 
> I have developed 47 GLM models with different combinations of interactions
> from 1 variable to 5 variables. I have manually made each model separately
> and put them into individual tables (organized by the number of variables)
> showing the AIC score. I want to compare all of these models. 
> 
> 1) What is the best way to compare various models with unique combinations
> and different number of variables?

See ?step or ?stepAIC (from package MASS) if you want an automated way
of doing this.
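
A minimal sketch of the step() route mentioned above, run on simulated data. The data frame d, its sample size, and the coefficients used to generate it are made up for illustration; only the variable names follow the thread.

```r
# Simulated stand-in for data.to.analyze; every value here is made up.
set.seed(2)
n <- 80
d <- data.frame(HTL = runif(n, 0, 30),
                Veg = runif(n, 0, 20),
                TotalEggs = rpois(n, 100) + 1)
d$Shells <- rbinom(n, d$TotalEggs, plogis(1 - 0.05 * d$HTL))

# Start from a model with both main effects and their interaction.
full <- glm(cbind(Shells, TotalEggs - Shells) ~ HTL * Veg,
            family = binomial, data = d)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
sel <- step(full, direction = "backward", trace = 0)
formula(sel)   # terms retained
sel$anova      # record of the steps taken, with the AIC at each one
```

step() only removes a term when doing so lowers the AIC, so the selected model never has a higher AIC than the starting one.
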
> 2) I am trying to develop the most simplest model ideally. Even though
> adding another variable would lower the AIC, how do I interpret it is worth
> it to include another variable in the model? How do I know when to stop?

This is a general statistical question, not specific to R. As a general
rule, if adding a variable lowers the AIC by a significant margin, then
it's worth including it. You should only stop when a variable increases
the AIC. But this is assuming you consider it a good indicator and you
know what you're doing. There's plenty of literature on this subject.
> Definitions of Variables:
> HTL - distance to high tide line (continuous)
> Veg - distance to vegetation 
> Aeventexhumed - Event of exhumation
> Sector - number measurements along the beach
> Rayos - major sections of beach (grouped sectors)
> TotalEggs - nest egg density
> 
> Example of how all models were created: 
> Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> data=data.to.analyze, family=binomial)
> Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg,
> family = binomial, data.to.analyze)
> Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
> data.to.analyze, family = binomial)
> Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
> HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)

To extract the AICs of all these models, it's easier to put them in a
list and get their AICs like this:
m <- list()
m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
data=data.to.analyze, family=binomial)
m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family = binomial,
data.to.analyze)

sapply(m, extractAIC)
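
The list-based approach above can be extended into a small comparison table. The sketch below is only an illustration under stated assumptions: the data are simulated, the three formulas are arbitrary picks, and AIC() is used in place of extractAIC() because extractAIC() returns both the equivalent degrees of freedom and the AIC, so sapply(m, extractAIC) yields a two-row matrix rather than a vector.

```r
# Simulated stand-in for data.to.analyze; every value here is made up.
set.seed(1)
n <- 60
d <- data.frame(HTL = runif(n, 0, 30),
                Veg = runif(n, 0, 20),
                TotalEggs = rpois(n, 100) + 1)
d$Shells <- rbinom(n, d$TotalEggs, plogis(0.5 - 0.03 * d$HTL))

# Candidate formulas kept in a named list, one entry per model.
f <- list(HTL     = cbind(Shells, TotalEggs - Shells) ~ HTL,
          Veg     = cbind(Shells, TotalEggs - Shells) ~ Veg,
          HTL.Veg = cbind(Shells, TotalEggs - Shells) ~ HTL:Veg)
m <- lapply(f, glm, family = binomial, data = d)

# One row per model: parameter count, AIC, and difference from the best AIC.
aic <- sapply(m, AIC)
tab <- data.frame(df   = sapply(m, function(x) attr(logLik(x), "df")),
                  AIC  = aic,
                  dAIC = aic - min(aic))
tab[order(tab$AIC), ]
```

Keeping the formulas in a list means a new candidate model is one extra line, and the table is rebuilt without copying values by hand.
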


Cheers

Frank Harrell

2012-Jan-25 13:43 UTC

If you are trying to destroy all aspects of statistical inference this is a
good way to go.  This is also a good way to ignore the subject matter in
driving model selection.
Frank

Jhope wrote:
> Hi R-listers,
> 
> I have developed 47 GLM models with different combinations of interactions
> from 1 variable to 5 variables. I have manually made each model separately
> and put them into individual tables (organized by the number of variables)
> showing the AIC score. I want to compare all of these models. 
> 
> 1) What is the best way to compare various models with unique combinations
> and different number of variables? 
> 2) I am trying to develop the most simplest model ideally. Even though
> adding another variable would lower the AIC, how do I interpret it is
> worth it to include another variable in the model? How do I know when to
> stop? 
> 
> Definitions of Variables:
> HTL - distance to high tide line (continuous)
> Veg - distance to vegetation 
> Aeventexhumed - Event of exhumation
> Sector - number measurements along the beach
> Rayos - major sections of beach (grouped sectors)
> TotalEggs - nest egg density
> 
> Example of how all models were created: 
> Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed,
> data=data.to.analyze, family=binomial)
> Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg,
> family = binomial, data.to.analyze)
> Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs,
> data.to.analyze, family = binomial)
> Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~
> HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)
> 
> Please advise, thanks! 
> J
> 

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University

R. Michael Weylandt

2012-Jan-26 17:05 UTC

[R] How do I compare a multiple staged response with multivariables to a Development Index?

This might get more traction on the R-SIG-Ecology lists.

And best of luck to you; I quite like turtles.

Michael

On Thu, Jan 26, 2012 at 4:37 AM, Jhope <jeanwaijang at gmail.com> wrote:
> Hi R-listeners,
>
> I should add that I would like also to compare my field data to an index
> model. The index was created by using the following script:
>
> devel.index <- function(values, weights=c(1, 2, 3, 4, 5, 6)) {
>   foo <- values*weights
>   return(apply(foo, 1, sum) / apply(values, 1, sum))
> }
>
> Background:
> Surveyed turtle egg embryos have been categorized into 6 stages of
> development in the field. The stages in the field data are named ST0, ST1,
> ST2, ST3, ST4, Shells. from the data = data.to.analyze.
>
> Q?
> 1. What is the best way to analyze the field data on embryonic development
> of 6 stages?
> 2. Doing this while considering, testing the variables: Veg, HTL,
> Aeventexhumed, Sector, Rayos, TotalEggs?
> 3. And then compare the results to a development index.
>
> The goal is to determine hatching success in various areas of the beach.
> And try to create a development index of these microenvironments.
> Seasonality would play a key role. Is this possible?
>
> Many thanks!
> Saludos, Jean
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
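
One thing worth checking in the devel.index() function quoted above: if values holds one row per nest and one column per stage (an assumption, since the thread does not show the data layout), then values*weights recycles the weights down each column rather than across the stages. The sketch below is a hypothetical variant, named devel.index2 here, that applies weight j to stage column j through a matrix product. The counts are made up.

```r
# Hypothetical variant of devel.index(); assumes `values` has one row per
# nest and one column per stage (ST0, ST1, ..., Shells).
devel.index2 <- function(values, weights = seq_len(ncol(values))) {
  values <- as.matrix(values)
  stopifnot(length(weights) == ncol(values))
  # The matrix product multiplies column j by weights[j] and sums each row.
  drop(values %*% weights) / rowSums(values)
}

# Made-up counts for two nests, six stages each.
counts <- rbind(nest1 = c(0, 0, 0, 0, 0, 10),   # all eggs in the last stage
                nest2 = c(10, 0, 0, 0, 0, 0))   # all eggs in the first stage
devel.index2(counts)
```

With these counts the index is the weight of the only occupied stage, which makes it easy to confirm by eye that the weights line up with the columns.
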

Bert Gunter

2012-Jan-26 21:56 UTC

Simple question. 8 million pages in the statistical literature of
answers. What, indeed, is the secret to life?

Post on a statistical help list (e.g. stats.stackexchange.com). This
has almost nothing to do with R. Be prepared for an onslaught of often
conflicting "wisdom."

-- Bert

On Thu, Jan 26, 2012 at 1:25 PM, Jhope <jeanwaijang at gmail.com> wrote:
> I ask the question about when to stop adding another variable even though it
> lowers the AIC because each time I add a variable the AIC is lower. How do I
> know when the model is a good fit? When to stop adding variables, keeping
> the model simple?
>
> Thanks, J


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Rubén Roa

2012-Jan-27 08:03 UTC

-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com]
Sent: Thursday, 26 January 2012 21:20
To: Rubén Roa
Cc: Ben Bolker; Frank Harrell
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and
unique combinations?

On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <rroa at azti.es> wrote:
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at
> stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see for _all pairs_ which model
> outcompetes which in terms of predictive ability by -2AIC or more. Just
> sorting them by the AIC does not give you that if no model is better than
> the next best by less than 2AIC.
> Third, I was not implying that AIC differences play the role of
> significance tests. I agree with you that model selection is better not
> understood as a proxy or as a relative of significance tests procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If
> one is bent on selecting just _the model_ then I check numerical
> optimization diagnostics such as size of gradients, KKT criteria, and other
> issues such as standard errors of parameter estimates and the correlation
> matrix of parameter estimates.

-- And the mathematical basis for this claim is ....  ??

--
Ruben: In my area of work (building/testing/applying mechanistic nonlinear
models of natural and economic phenomena) the issue of numerical optimization is
a very serious one. It is often the case that a really good-looking model does
not converge properly (that's why ADMB is so popular among us). So if the
AIC is inconclusive, but one AIC-tied model yields reasonable-looking standard
errors and low pairwise correlations between parameter estimates, while the
other wasn't even able to produce a positive definite Hessian matrix (though it
was able to maximize the log-likelihood), I think I have good reasons to select
the properly converged model. I assume here that the lack of convergence is a
symptom of model inadequacy to the data, one that the AIC was not able to detect.
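
For a glm fit in R, some of the diagnostics mentioned above can be read off the fitted object. This is a sketch on simulated data, not the ADMB workflow described in the message; the data frame d and the coefficients that generate it are made up.

```r
# Simulated stand-in data; every value here is made up.
set.seed(3)
n <- 50
d <- data.frame(HTL = runif(n, 0, 30),
                TotalEggs = rpois(n, 100) + 1)
d$Shells <- rbinom(n, d$TotalEggs, plogis(0.8 - 0.04 * d$HTL))

fit <- glm(cbind(Shells, TotalEggs - Shells) ~ HTL,
           family = binomial, data = d)

fit$converged               # TRUE if the IWLS iterations converged
se <- sqrt(diag(vcov(fit))) # standard errors of the parameter estimates
se
cov2cor(vcov(fit))          # correlation matrix of the parameter estimates
```
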
---
Ruben: For some reason I don't find model averaging appealing. I guess deep
in my heart I expect more from my model than just the best predictive ability.

-- This is a religious, not a scientific statement, and has no place in any
scientific discussion.

--
Ruben: Seriously, there is a wide and open place in scientific discussion for
mechanistic model-building. I expect the models I build to be more than able
predictors. I want them to capture some aspect of nature, to teach me something
about nature, so I refrain from model averaging, which is an open admission that
you don't care too much about what's really going on.

-- The belief that certain data analysis practices -- standard or not -- somehow
can be trusted to yield reliable scientific results in the face of clear
theoretical (mathematical) and practical results to the contrary, while
widespread, impedes and often thwarts the progress of science. Evidence-based
medicine and clinical trials came about for a reason. I would encourage you to
reexamine the basis of your scientific practice and the role that "magical
thinking" plays in it.

Best,
Bert

--
Ruben: All right Bert. I often re-examine the basis of my scientific praxis but
less often than I did before, I have to confess. I like to think it is because I
am converging on the right praxis so there are fewer issues to re-examine. But
this problem of model selection is a tough one. Being a likelihoodist in
inference naturally leads you to AIC-based model selection, with Jim Lindsey
being influential too. Wanting your models to say something about nature is
another strong conditioning factor. Put these two together and it becomes hard
to do model-averaging. And it has nothing to do with religion (yuck!).
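
The all-pairs AIC comparison described near the top of this message can be sketched with outer(). The AIC values below are made up, and the 2-unit threshold is the one named in the message, not a rule taken from R itself.

```r
# Made-up AIC values for three hypothetical models.
aic <- c(A = 210.3, B = 211.1, C = 215.8)

# delta[i, j] = AIC of row model minus AIC of column model.
delta <- outer(aic, aic, "-")

# TRUE where the row model has an AIC at least 2 units below the column model.
better <- delta <= -2
better
```

Here A and B both beat C by more than 2 units, while A and B are not separated from each other, which a plain sort by AIC would not show directly.
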

Greg Snow

2012-Jan-27 18:48 UTC

What variables to consider adding and when to stop adding them depends greatly
upon what question(s) you are trying to answer and the science behind your data.

Are you trying to create a model to predict your outcome for future predictors?
How precise do the predictions need to be?

Are you trying to understand how certain predictors relate to the response? How
they relate after conditioning on other predictors?

Will humans be using your equation directly? Or will it be in a black box that
the computer generates predictions from but people never need to look at the
details?

What is the cost (money, time, difficulty, etc.) of collecting the different
predictors?

Answers to the above questions will be much more valuable in choosing the
"best" model than AIC or other values (though you should still look at
the results from analyses for information to combine with the other
information).  R and its programmers (no matter how great and wonderful they
are) cannot answer these for you.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jhope
> Sent: Thursday, January 26, 2012 2:26 PM
> To: r-help at r-project.org
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5
> interactions and unique combinations?
> 
> I ask the question about when to stop adding another variable even
> though it lowers the AIC because each time I add a variable the AIC is
> lower. How do I know when the model is a good fit? When to stop adding
> variables, keeping the model simple?
> 
> Thanks, J
