thr3ads.net - R help - [R] Logistic regression model selection with overdispersed/autocorrelated data [Jan 2006]


Jesse.Whittington@pc.gc.ca

2006-Jan-30 16:22 UTC

[R] Logistic regression model selection with overdispersed/autocorrelated data

I am creating habitat selection models for caribou and other species with
data collected from GPS collars.  In my current situation the radio-collars
recorded the locations of 30 caribou every 6 hours.  I am then comparing
resources used at caribou locations to random locations using logistic
regression (standard habitat analysis).

The data are therefore highly autocorrelated, and this inflates Type I error
in two ways: standard errors around the beta-coefficients are too small, and
models become over-parameterized during model selection.  Robust standard
errors are easily calculated by block-bootstrapping the data using animal as
a cluster with the Design library; however, I haven't found a satisfactory
solution for model selection.

A couple options are:
1.  Using QAIC where the deviance is divided by a variance inflation factor
(Burnham & Anderson).  However, this VIF can vary greatly depending on the
data set and the set of covariates used in the global model.
2.  Manual forward stepwise regression using both changes in deviance and
robust p-values for the beta-coefficients.
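
For readers unfamiliar with option 1, here is a minimal sketch of a QAIC calculation in base R. It is an illustration under stated assumptions, not the poster's analysis: the data are simulated, the names (dat, x1, x2, n, y, c_hat, qaic) are hypothetical, and the records are grouped counts rather than 0/1 locations, because the Pearson statistic is not a useful dispersion estimate for ungrouped binary records.

```r
# QAIC sketch on simulated grouped data; all names here are hypothetical.
set.seed(42)
n_groups <- 120
dat <- data.frame(
  x1 = rnorm(n_groups),
  x2 = rnorm(n_groups),          # pure noise covariate
  n  = rep(20, n_groups)         # locations per group
)

# Overdispersion: group-level probabilities vary around the logistic mean.
p_mean <- plogis(-0.3 + 0.7 * dat$x1)
phi    <- 5                      # smaller phi = more extra-binomial variation
p_grp  <- rbeta(n_groups, p_mean * phi, (1 - p_mean) * phi)
dat$y  <- rbinom(n_groups, dat$n, p_grp)

global  <- glm(cbind(y, n - y) ~ x1 + x2, family = binomial, data = dat)
reduced <- glm(cbind(y, n - y) ~ x1,      family = binomial, data = dat)

# Variance inflation factor (c-hat) from the global model:
# Pearson chi-square divided by the residual degrees of freedom.
c_hat <- sum(residuals(global, type = "pearson")^2) / df.residual(global)

# QAIC = -2 * logLik / c_hat + 2 * K, where K counts c_hat as one parameter.
qaic <- function(fit, c_hat) {
  k <- length(coef(fit)) + 1
  -2 * as.numeric(logLik(fit)) / c_hat + 2 * k
}

print(c_hat)
print(c(global = qaic(global, c_hat), reduced = qaic(reduced, c_hat)))
print(c(global = AIC(global),         reduced = AIC(reduced)))
```

Because c_hat is taken from the global model, changing the covariates in that model changes every QAIC value in the candidate set, which is the sensitivity described in option 1.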

I have been looking for a solution to this problem for a couple years and
would appreciate any advice.

Jesse

Frank E Harrell Jr

2006-Jan-30 21:37 UTC

If you must do non-subject-matter-driven model selection, look at the 
fastbw function in Design, which will use the cluster bootstrap variance 
matrix.

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Renaud Lancelot

2006-Jan-31 08:02 UTC


If you're not interested in fitting caribou-specific responses, you
can use beta-binomial logistic models. There are several packages
available for this purpose on CRAN, among them aod. Because these
models are fitted using maximum-likelihood methods, you can use AIC
(or other information criteria) to compare different models.
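
A minimal sketch of the idea, assuming grouped counts (successes y out of n trials per group). It deliberately uses base R's optim rather than aod, whose interface is not shown in this thread, so every name below (bb_negll, fit_bb, dat, rho) is hypothetical and the data are simulated.

```r
# Beta-binomial logistic regression by direct maximum likelihood (sketch).
set.seed(7)
n_groups <- 150
dat <- data.frame(x1 = rnorm(n_groups), x2 = rnorm(n_groups), n = 20)
mu_true  <- plogis(-0.3 + 0.7 * dat$x1)
rho_true <- 0.15                          # intra-group correlation
dat$y <- rbinom(n_groups, dat$n,
                rbeta(n_groups,
                      mu_true * (1 - rho_true) / rho_true,
                      (1 - mu_true) * (1 - rho_true) / rho_true))

# Negative log-likelihood; par = regression coefficients, then logit(rho).
bb_negll <- function(par, X, y, n) {
  beta <- par[seq_len(ncol(X))]
  rho  <- plogis(par[ncol(X) + 1])
  mu   <- plogis(drop(X %*% beta))
  a <- mu * (1 - rho) / rho
  b <- (1 - mu) * (1 - rho) / rho
  -sum(lchoose(n, y) + lbeta(y + a, n - y + b) - lbeta(a, b))
}

fit_bb <- function(formula, data) {
  X <- model.matrix(formula, data)
  start <- c(rep(0, ncol(X)), qlogis(0.1))
  opt <- optim(start, bb_negll, X = X, y = data$y, n = data$n,
               control = list(maxit = 5000))   # Nelder-Mead by default
  par <- setNames(opt$par, c(colnames(X), "logit_rho"))
  list(par = par, logLik = -opt$value,
       aic = 2 * opt$value + 2 * length(par),
       convergence = opt$convergence)
}

full    <- fit_bb(~ x1 + x2, dat)
reduced <- fit_bb(~ x1, dat)
print(c(full = full$aic, reduced = reduced$aic))
```

Since the extra-binomial variation is part of the likelihood here, the AIC values already account for it and no separate variance inflation factor is needed.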

Best,

Renaud


--
Renaud LANCELOT
Département Elevage et Médecine Vétérinaire (EMVT) du CIRAD
Directeur adjoint chargé des affaires scientifiques

CIRAD, Animal Production and Veterinary Medicine Department
Deputy director for scientific affairs

Campus international de Baillarguet
TA 30 / B (Bât. B, Bur. 214)
34398 Montpellier Cedex 5 - France
Tél   +33 (0)4 67 59 37 17
Secr. +33 (0)4 67 59 39 04
Fax   +33 (0)4 67 59 37 95

Jesse.Whittington@pc.gc.ca

2006-Jan-31 16:09 UTC

Frank E Harrell Jr wrote:
> If you must do non-subject-matter-driven model selection, look at the
> fastbw function in Design, which will use the cluster bootstrap variance
> matrix.
>
> Frank


Thanks for the tip.  I didn't know that the fastbw function could account
for the clustered variance.  For others, the code to run such a model from
the Design library would be:

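library(Design)
# create model; x=TRUE and y=TRUE store the design matrix and response for bootcov
model.1 <- lrm(y ~ x1+x2+x3+x4, data=data, x=TRUE, y=TRUE)
# calculate robust variance matrix by resampling whole animals
model.2 <- bootcov(model.1, cluster=data$animal, B=10000)
# backward step-wise selection
fastbw(model.2)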
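
To make the cluster bootstrap more concrete, here is a base-R sketch of resampling whole animals and refitting. It is not how bootcov is implemented; glm stands in for lrm, the data are simulated, and the names (cluster_boot_vcov, dat, animal, x1) are hypothetical.

```r
# Cluster (animal-level) bootstrap of a logistic regression: a sketch.
set.seed(123)
n_animals <- 30
n_per     <- 50
animal <- rep(seq_len(n_animals), each = n_per)
# Covariate with a strong animal-level component, so locations from the
# same animal resemble each other.
x1 <- rnorm(n_animals)[animal] + rnorm(n_animals * n_per, sd = 0.3)
u  <- rnorm(n_animals, sd = 1)[animal]      # unobserved animal effect
y  <- rbinom(length(animal), 1, plogis(-0.2 + 0.5 * x1 + u))
dat <- data.frame(animal, x1, y)

fit <- glm(y ~ x1, family = binomial, data = dat)

cluster_boot_vcov <- function(formula, data, cluster, B = 200) {
  rows_by_cluster <- split(seq_len(nrow(data)), data[[cluster]])
  ids <- names(rows_by_cluster)
  coefs <- t(replicate(B, {
    # Draw animals with replacement and keep all of their rows together.
    drawn <- sample(ids, length(ids), replace = TRUE)
    boot  <- data[unlist(rows_by_cluster[drawn], use.names = FALSE), ]
    coef(glm(formula, family = binomial, data = boot))
  }))
  var(coefs)   # bootstrap variance-covariance matrix of the coefficients
}

v_boot   <- cluster_boot_vcov(y ~ x1, dat, "animal", B = 200)
se_naive <- sqrt(diag(vcov(fit)))
se_boot  <- sqrt(diag(v_boot))
print(rbind(naive = se_naive, cluster_bootstrap = se_boot))
```

The point estimates come from the ordinary fit; only the variance matrix changes, which is why a selection procedure based on that matrix can respect the clustering.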

Later we will examine individual caribou responses to trails
(subject-specific model selection).  For this we plan to use mixed effects
models (lmer).  Is this what you would also recommend?

I look forward to reading the new edition of your book when it is
published.

Jesse

Jesse.Whittington@pc.gc.ca

2006-Feb-08 15:49 UTC


Thanks for pointing out the aod package and the beta-binomial logistic
models, Renaud.

While I see how betabinom could be applied to some of our other analyses,
I don't see how it can be used in our habitat selection analysis where
individual locations are coded as 0 or 1 rather than proportions.  GEE
models (geeglm from geepack) could be used for our analyses.  However, these
models are fit with estimating equations rather than full maximum
likelihood, so they do not solve our model selection problem either.

Beta-coefficients from gee, glm, GLMMs, and lrm are nearly identical.  The
only thing that varies is the variance-covariance matrix and the resulting
standard errors.  Consequently, the deviances should be similar because
predicted values (p) are calculated from the beta-coefficients.  For an
individual data point, the log-likelihood = y * log(p) + (1 - y) * log(1 - p),
and the deviance = -2 * sum(log-likelihoods).  Because every autocorrelated
location is counted as if it were independent, the difference in deviance
between two models is amplified, which causes models to be overparameterized
when using AIC or likelihood ratio tests.
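
The argument above can be checked numerically with a small base-R sketch. It assumes simulated data and hypothetical names (dat, dev_by_hand, k), and it uses exact row replication as a deliberately extreme stand-in for autocorrelation.

```r
# Deviance by hand, and how redundant records inflate deviance differences.
set.seed(2006)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)                              # noise covariate
y  <- rbinom(n, 1, plogis(-0.4 + 0.6 * x1))
dat <- data.frame(y, x1, x2)

# Deviance from the formula in the post:
# -2 * sum(y * log(p) + (1 - y) * log(1 - p)).
dev_by_hand <- function(fit, data) {
  p <- fitted(fit)
  -2 * sum(data$y * log(p) + (1 - data$y) * log(1 - p))
}

m1 <- glm(y ~ x1,      family = binomial, data = dat)
m2 <- glm(y ~ x1 + x2, family = binomial, data = dat)
print(all.equal(dev_by_hand(m1, dat), deviance(m1)))

# Caricature of autocorrelation: every location is recorded k times.
k <- 4
dat_rep <- dat[rep(seq_len(n), each = k), ]
m1_rep <- glm(y ~ x1,      family = binomial, data = dat_rep)
m2_rep <- glm(y ~ x1 + x2, family = binomial, data = dat_rep)

# Coefficients are unchanged, but deviances (and their differences) scale
# with the number of records.
print(rbind(original = coef(m2), replicated = coef(m2_rep)))
drop_orig <- deviance(m1) - deviance(m2)
drop_rep  <- deviance(m1_rep) - deviance(m2_rep)
print(c(original = drop_orig, replicated = drop_rep))
```

The replicated data carry no new information, yet the drop in deviance for adding x2 grows with k, while the AIC penalty of 2 per parameter stays fixed.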

I am curious how others select models with autocorrelated data.

Thanks for your help,

Jesse






