Jesse.Whittington@pc.gc.ca
2006-Jan-30 16:22 UTC
[R] Logistic regression model selection with overdispersed/autocorrelated data
I am creating habitat selection models for caribou and other species with data collected from GPS collars. In my current situation the radio-collars recorded the locations of 30 caribou every 6 hours. I am then comparing resources used at caribou locations to random locations using logistic regression (standard habitat analysis).

The data are therefore highly autocorrelated, and this inflates Type I error in two ways: standard errors around the beta-coefficients are too small, and models are over-parameterized during model selection. Robust standard errors are easily calculated by block-bootstrapping the data using animal as a cluster with the Design library; however, I haven't found a satisfactory solution for model selection.

A couple of options are:
1. Using QAIC, where the deviance is divided by a variance inflation factor (Burnham & Anderson). However, this VIF can vary greatly depending on the data set and the set of covariates used in the global model.
2. Manual forward stepwise regression using both changes in deviance and robust p-values for the beta-coefficients.

I have been looking for a solution to this problem for a couple of years and would appreciate any advice.

Jesse
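For readers unfamiliar with option 1, the arithmetic can be sketched as follows. This is a minimal illustration in plain Python (chosen only so it runs without any R packages), not code from the thread. The log-likelihoods, parameter counts, and the inflation factor c_hat = 4 are hypothetical numbers, and some presentations add one to K for the estimated c_hat, which does not change the ranking.

```python
def aic(log_lik, k):
    """Ordinary AIC: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_lik + 2 * k


def qaic(log_lik, k, c_hat):
    """Quasi-AIC: the -2 * log-likelihood term is divided by c_hat.

    c_hat is the variance inflation factor, usually estimated from the
    global model. It is passed in here as a plain number (assumption).
    """
    return -2.0 * log_lik / c_hat + 2 * k


# Hypothetical fits: a small model and a larger one with 5 extra covariates.
small = {"log_lik": -1000.0, "k": 3}
large = {"log_lik": -990.0, "k": 8}
c_hat = 4.0  # hypothetical inflation factor

# AIC prefers the larger model; QAIC with c_hat = 4 prefers the smaller one.
print("AIC :", aic(**small), aic(**large))
print("QAIC:", qaic(c_hat=c_hat, **small), qaic(c_hat=c_hat, **large))
```

Because the ranking can flip with c_hat, the sensitivity of c_hat to the data set and to the global model, as noted in option 1, carries straight through to the selected model.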
Frank E Harrell Jr
2006-Jan-30 21:37 UTC
[R] Logistic regression model selection with overdispersed/autocorrelated data
If you must do non-subject-matter-driven model selection, look at the fastbw function in Design, which will use the cluster bootstrap variance matrix.

Frank

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
Renaud Lancelot
2006-Jan-31 08:02 UTC
[R] Logistic regression model selection with overdispersed/autocorrelated data
If you're not interested in fitting caribou-specific responses, you can use beta-binomial logistic models. Several packages are available for this purpose on CRAN, among them aod. Because these models are fitted using maximum-likelihood methods, you can use AIC (or other information criteria) to compare different models.

Best,

Renaud

--
Renaud LANCELOT
Département Elevage et Médecine Vétérinaire (EMVT) du CIRAD
CIRAD, Animal Production and Veterinary Medicine Department
Deputy director for scientific affairs
Campus international de Baillarguet, 34398 Montpellier Cedex 5, France
Jesse.Whittington@pc.gc.ca
2006-Jan-31 16:09 UTC
[R] Logistic regression model selection with overdispersed/autocorrelated data
Frank E Harrell Jr wrote:
> If you must do non-subject-matter-driven model selection, look at the
> fastbw function in Design, which will use the cluster bootstrap variance
> matrix.

Thanks for the tip. I didn't know that the fastbw function could account for the clustered variance. For others, the code to run such a model from the Design library would be:

    library(Design)
    model.1 <- lrm(y ~ x1 + x2 + x3 + x4, data = data, x = TRUE, y = TRUE)  # fit the model, keeping x and y for bootcov
    model.2 <- bootcov(model.1, cluster = data$animal, B = 10000)           # cluster-bootstrap (robust) variance matrix
    fastbw(model.2)                                                         # backward step-wise selection

Later we will examine individual caribou responses to trails (subject-specific model selection). For this we plan to use mixed effects models (lmer). Is this what you would also recommend?

I look forward to reading the new edition of your book when it is published.

Jesse
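The resampling idea behind a cluster bootstrap can be sketched as follows. This is a minimal illustration in plain Python, not the Design implementation: bootcov refits the model on each resample, whereas this sketch only bootstraps a simple proportion. The function names and the toy data (6 animals with 50 identical fixes each, an extreme case of within-animal correlation) are hypothetical.

```python
import math
import random
import statistics


def cluster_bootstrap_se(rows, statistic, n_boot=500, seed=1):
    """Bootstrap SE of `statistic`, resampling whole animals with replacement."""
    rng = random.Random(seed)
    clusters = {}
    for animal, used in rows:
        clusters.setdefault(animal, []).append((animal, used))
    ids = sorted(clusters)
    estimates = []
    for _ in range(n_boot):
        # Draw as many animals as were observed, with replacement,
        # and keep every fix belonging to each drawn animal.
        drawn = [rng.choice(ids) for _ in ids]
        resample = [row for animal in drawn for row in clusters[animal]]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


def proportion_used(rows):
    """Share of rows coded 1 (used) rather than 0 (random)."""
    return sum(used for _, used in rows) / len(rows)


# Hypothetical toy data: animals 0-2 always coded 1, animals 3-5 always coded 0.
rows = [(animal, 1 if animal < 3 else 0) for animal in range(6) for _ in range(50)]

p = proportion_used(rows)
naive_se = math.sqrt(p * (1 - p) / len(rows))  # treats all 300 fixes as independent
cluster_se = cluster_bootstrap_se(rows, proportion_used)

print("naive SE  :", round(naive_se, 4))
print("cluster SE:", round(cluster_se, 4))
```

With fixes this strongly correlated within animal, the effective sample size is closer to the number of animals than to the number of fixes, so the cluster-level standard error comes out much larger than the naive one.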
Jesse.Whittington@pc.gc.ca
2006-Feb-08 15:49 UTC
[R] Logistic regression model selection with overdispersed/autocorrelated data
Thanks for pointing out the aod package and the beta-binomial logistic models, Renaud. While I see how betabinom could be applied to some of our other analyses, I don't see how it can be used in our habitat selection analysis, where individual locations are coded as 0 or 1 rather than as proportions.

GEE models (geeglm from geepack) could be used for our analyses. However, even if a model is fit by maximum likelihood, that does not solve our model selection problem. Beta-coefficients from gee, glm, GLMMs, and lrm are nearly identical. The only thing that varies is the variance-covariance matrix and the resulting standard errors. Consequently, the deviances should be similar, because the predicted values (p) are calculated from the beta-coefficients.

For an individual data point, the loglikelihood = y * log(p) + (1 - y) * log(1 - p), and the deviance = -2 * sum(loglikelihoods). Every autocorrelated observation contributes a full term to this sum even though it carries little independent information. Consequently, the difference in deviance between two models is amplified by autocorrelated data, and this causes models to be over-parameterized when using AIC or likelihood ratio tests.

I am curious how others select models with autocorrelated data.

Thanks for your help,

Jesse
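The deviance argument above can be checked with a small numerical sketch. This is plain Python rather than R, with hypothetical toy data: 20 locations and one binary covariate x, then the same locations each repeated 10 times as a crude stand-in for strong autocorrelation. For a single binary covariate the maximum-likelihood fitted probabilities are just the observed proportions in each group, so no optimizer is needed.

```python
import math


def bernoulli_deviance(ys, ps):
    """Deviance = -2 * sum of y*log(p) + (1 - y)*log(1 - p)."""
    return -2.0 * sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                      for y, p in zip(ys, ps))


def fitted_by_group(ys, xs):
    """ML fitted probabilities for one binary covariate: group proportions."""
    fitted = {}
    for level in set(xs):
        group = [y for y, x in zip(ys, xs) if x == level]
        fitted[level] = sum(group) / len(group)
    return fitted


def deviance_drop(ys, xs):
    """Deviance of the intercept-only model minus deviance of the model with x."""
    p_null = sum(ys) / len(ys)
    null_dev = bernoulli_deviance(ys, [p_null] * len(ys))
    fitted = fitted_by_group(ys, xs)
    full_dev = bernoulli_deviance(ys, [fitted[x] for x in xs])
    return null_dev - full_dev


# Hypothetical data: 4 of 10 used where x = 0, 6 of 10 used where x = 1.
ys = [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4
xs = [0] * 10 + [1] * 10

# Same locations, each recorded 10 times (no new information added).
ys_rep = [y for y in ys for _ in range(10)]
xs_rep = [x for x in xs for _ in range(10)]

drop_once = deviance_drop(ys, xs)
drop_repeated = deviance_drop(ys_rep, xs_rep)

print("fitted p (original):", fitted_by_group(ys, xs))
print("fitted p (repeated):", fitted_by_group(ys_rep, xs_rep))
print("deviance drop:", round(drop_once, 3), "vs", round(drop_repeated, 3))
```

The fitted probabilities are identical in both data sets, but the drop in deviance scales with the number of repeats. Here it moves from below to above 3.84, the 5% chi-square critical value on 1 degree of freedom, so a likelihood ratio test would retain x only in the repeated data.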