thr3ads.net - R help - [R] glmmPQL and spatial correlation [Oct 2012]

If this information is useful, please help other people find it:
Share via:

Luis Huckstadt

2012-Oct-10 17:03 UTC

[R] glmmPQL and spatial correlation

Hi all,

I'm running into some computer issues when trying to run a binomial model
for spatially correlated data using glmmPQL and was wondering if anyone
could help me out.
My whole dataset consists of about 300,000 points for which I have a suite
of environmental variables (I'm trying to come up with a habitat model for
a species of seal, using real (presence) and simulated dives (absence) as
my response variable).
Since my dataset is so large, I split it into thirds and ran the model
without spatial correlation. However, when checking my results, I did get a
spatial correlation in the residuals, which I'm trying to incorporate using
a variogram (spherical). The problem is that  when I run it  for my 1/3 of
my data (about 70k points), the calculatations go on forever. I was running
the first model for over a week and was still not getting past the first
iteration.

This is the model I used

M1.f.Spa <- glmmPQL(presence ~ sst + tmax100 + tbot + sss + ssu + tmax100d
+ sbot, random = ~1|fPTT, family = binomial, correlation
corSpher(c(91323.53,0.4279603), form =~ x+y, nugget = TRUE), data sample.df1)

This week, I tried a different approach and split the 70k subset into
smaller datasets of 10,000 points, and now the model runs much faster (just
a couple of hours tops), Yet the output of these models changes with
regards to the significance of one variable, which makes me think that I
need a larger dataset than 10,000 points.

Does someone have a suggestion on how to improve/run this code with the
spatial correlation for a larger dataset than 10,000 which wouldn't take
weeks to run?

I'm working with an Intel Core i7, 12 Gb RAM (plus a couple of 100 Gbs in
virtual memory), in Windows  7 64-bit.

Thanks for your help,
----------------------------------------
Luis A. Huckstadt, Ph.D.
Department of Ecology and Evolutionary Biology
University of California Santa Cruz
Long Marine Lab
100 Shaffer Road
Santa Cruz, CA  95060

	[[alternative HTML version deleted]]

Ben Bolker

2012-Oct-11 02:33 UTC

head link

[R] glmmPQL and spatial correlation

Luis Huckstadt <lahuckst <at> ucsc.edu> writes:
> I'm running into some computer issues when trying to run a binomial
model
> for spatially correlated data using glmmPQL and was wondering if anyone
> could help me out.
> My whole dataset consists of about 300,000 points for which I have a suite
> of environmental variables (I'm trying to come up with a habitat model
for
> a species of seal, using real (presence) and simulated dives (absence) as
> my response variable).
> Since my dataset is so large, I split it into thirds and ran the model
> without spatial correlation. However, when checking my results, I did get a
> spatial correlation in the residuals, which I'm trying to incorporate
using
> a variogram (spherical). The problem is that  when I run it  for my 1/3 of
> my data (about 70k points), the calculatations go on forever. I was running
> the first model for over a week and was still not getting past the first
> iteration.
> 
> This is the model I used
> 
> M1.f.Spa <- glmmPQL(presence ~ sst + tmax100 + tbot + sss + ssu +
tmax100d
> + sbot, random = ~1|fPTT, family = binomial, correlation >
corSpher(c(91323.53,0.4279603), form =~ x+y, nugget = TRUE), data >
sample.df1)
> 
> This week, I tried a different approach and split the 70k subset into
> smaller datasets of 10,000 points, and now the model runs much faster (just
> a couple of hours tops), Yet the output of these models changes with
> regards to the significance of one variable, which makes me think that I
> need a larger dataset than 10,000 points.
  That's a bit surprising.  When you say the significance of one variable
changes -- does that mean it changes from (e.g.) 0.045 to 0.055, which
crosses the magic line at p=0.05 but is not otherwise
meaningful?> 
> Does someone have a suggestion on how to improve/run this code with the
> spatial correlation for a larger dataset than 10,000 which wouldn't
take
> weeks to run?
> 
> I'm working with an Intel Core i7, 12 Gb RAM (plus a couple of 100 Gbs
in
> virtual memory), in Windows  7 64-bit.
  This is not going to be easy in any case.  I would look into the
INLA package and AD Model Builder ... (I would also be mildly nervous
about using PQL for binary data -- it might be OK, but this is known
to be more or less the worst case scenario for PQL ...)

  Ben Bolker

Maybe Matching Threads

Search for more seemingly similar threads

R help - Oct 2012 - glmmPQL and spatial correlation

[R] glmmPQL and spatial correlation

[R] glmmPQL and spatial correlation

Maybe Matching Threads