thr3ads.net - R help - [R] GAMs and GAMMS with correlated acoustic data [Nov 2008]


David M Warner

2008-Nov-15 17:19 UTC

[R] GAMs and GAMMS with correlated acoustic data

Greetings
This is a long email. 

I'm struggling with a data set comprising 2,278 hydroacoustic estimates of 
fish biomass density made along line transects in two lakes (Lakes 
Michigan and Huron, three years in each lake). The data represent 
lakewide surveys in each year, and each data point is the estimate 
for a horizontal interval 1 km in length.

I'm interested in comparing biomass density and bathymetric distribution 
(bottom depth) in the two lakes and there is graphical evidence of a 
non-linear relationship between biomass density and bottom depth.  Hence 
my interest in GAMs.

Predictors of primary interest are lake (factor) and bottom depth 
(continuous).

The fish data are autocorrelated at varying ranges, depending on species 
and year. I've tested this using correlog() from package ncf.

The bottom depth data are also highly autocorrelated.

Because of the autocorrelation in the data, autocorrelation in the GAM residuals 
(up to 20 lags in some cases), patterns in residual plots from the GAMs, 
and very narrow confidence intervals for the predictions, I feel that the GAM 
results are biased and have attempted to use GAMM.
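A minimal sketch of the residual check described above, on simulated data that stand in (hypothetically) for the survey: 10 transects of 50 one-km intervals, with AR(1) errors of coefficient 0.7 along each transect. It fits a plain GAM with mgcv and computes the residual autocorrelation within each transect, so that no lag straddles two different transects.

```r
library(mgcv)

set.seed(1)
# Hypothetical stand-in for the survey: 10 transects x 50 one-km intervals
sim <- do.call(rbind, lapply(1:10, function(i) {
  depth <- 80 + cumsum(rnorm(50, 0, 3))               # autocorrelated bottom depth
  e <- as.numeric(arima.sim(list(ar = 0.7), n = 50))  # AR(1) errors along the transect
  data.frame(tranf = factor(i), interval = 1:50, depth = depth,
             y = sin(depth / 20) + e)
}))

fit_gam <- gam(y ~ s(depth), data = sim)

# Autocorrelation of the GAM residuals at lags 1-5, computed within each
# transect (rows are ordered by transect, then interval), then averaged
res_acf <- sapply(split(residuals(fit_gam), sim$tranf),
                  function(r) acf(r, lag.max = 5, plot = FALSE)$acf[-1])
round(rowMeans(res_acf), 2)
```

Because the GAM assumes independent errors, the lag-1 value stays well above zero here, which mirrors the pattern reported in the post.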

Data and procedure examples:
#> fish[1:10, ]
   Transect yaoalebiom yaosmeltbiom yaobloaterbiom year     depth lake        x       y interval
1      nn_1  12.019655 34.910370110       2.647370 2005  97.07525    2 526601.8 4850206        1
2      nn_1  12.164686 35.331548810       3.982028 2005  98.37024    2 526742.2 4849339        2
3      nn_1  11.176009 32.460052230       1.646604 2005  99.98218    2 526886.9 4848348        3
4      nn_1   0.000000  0.036457091       5.306225 2005  81.44616    2 526993.4 4850849        4
5      nn_1  40.808118 10.988825410       3.222485 2005 101.45707    2 526997.5 4847359        5
6      nn_1   6.273421 18.176753520      18.832348 2005  98.69197    2 527084.1 4846366        6
7      nn_1   6.225799 16.050983390      66.941892 2005  94.14283    2 527214.7 4845372        7
8      nn_1   7.322910 19.001196850      47.273341 2005  91.21771    2 527331.6 4844636        8
9      nn_1   0.000000  0.067646462      20.912908 2005  87.76123    2 527495.9 4843390        9
10     nn_1   0.000000  0.006012106      26.611785 2005  87.59767    2 527606.6 4842426       10
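The model calls below use a data frame fish3 with factor columns lakef and tranf that do not appear in the printout. One hypothetical way to build them, assuming fish3 is derived from fish, is sketched here. nlme's corAR1() requires the position covariate (interval) to take unique values within each group, so the sketch assumes transects are repeated across years and groups by transect-year rather than by transect alone.

```r
# Hypothetical reconstruction of the first three rows printed above
fish <- data.frame(
  Transect       = "nn_1",
  yaobloaterbiom = c(2.647370, 3.982028, 1.646604),
  year           = 2005,
  depth          = c(97.07525, 98.37024, 99.98218),
  lake           = 2,
  x              = c(526601.8, 526742.2, 526886.9),
  y              = c(4850206, 4849339, 4848348),
  interval       = 1:3
)

fish3 <- fish
fish3$lakef <- factor(fish3$lake)   # lake as a factor for the parametric term and 'by ='
# Assumption: one AR(1) group per transect-year, so 'interval' is unique within each group
fish3$tranf <- interaction(fish3$Transect, fish3$year, drop = TRUE)
levels(fish3$tranf)
```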

#GAM example (mgcv); lakef is lake coded as a factor
library(mgcv)
bloat.gam8 <- gam(log10(yaobloaterbiom + 0.00325) ~ lakef + s(depth, by = lakef),
                  data = fish3)

#GAMM example: AR(1) errors ordered by 'interval' within each level of tranf (transect factor)
bloat.gamm1 <- gamm(log10(yaobloaterbiom + 0.00325) ~ lakef + s(depth, by = lakef),
                    correlation = corAR1(form = ~ interval | tranf), data = fish3)

However, GAMMs fitted with a wide variety of correlation structures 
(corExp, corSpher, corLin, corAR1, corARMA) still produce autocorrelated 
residuals (over a lag range similar to the GAM's), patterns in residual plots, and 
confidence intervals for predictions that are only slightly larger than for 
the GAMs. This suggests to me that GAMM is not performing much better than 
GAM (or that I've not specified the models correctly).
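One way to check whether a correlation structure is doing any work is to compare, on the same data, GAMMs with and without it, and to compare prediction standard errors from gam() and gamm(). A sketch on simulated data (hypothetical, true AR(1) coefficient 0.7): both gamm() fits share the same fixed effects and use the default REML fit, so AIC() of their $lme components can be compared directly, and the estimated Phi is read from the lme correlation structure.

```r
library(mgcv)
library(nlme)

set.seed(1)
# Hypothetical stand-in: 10 transects x 50 intervals, AR(1) errors within transect
sim <- do.call(rbind, lapply(1:10, function(i) {
  depth <- 80 + cumsum(rnorm(50, 0, 3))
  data.frame(tranf = factor(i), interval = 1:50, depth = depth,
             y = sin(depth / 20) + as.numeric(arima.sim(list(ar = 0.7), n = 50)))
}))

fit_gam <- gam(y ~ s(depth), data = sim)
fit_ind <- gamm(y ~ s(depth), data = sim)                         # independent errors
fit_ar1 <- gamm(y ~ s(depth), data = sim,
                correlation = corAR1(form = ~ interval | tranf))  # AR(1) within transect

# Same fixed effects, both REML: AIC of the $lme parts is comparable
aic <- c(independent = AIC(fit_ind$lme), ar1 = AIC(fit_ar1$lme))
phi <- coef(fit_ar1$lme$modelStruct$corStruct, unconstrained = FALSE)

# Mean ratio of prediction standard errors (gamm with AR(1) vs plain gam) on a depth grid
nd <- data.frame(depth = seq(min(sim$depth), max(sim$depth), length.out = 50))
se_ratio <- mean(predict(fit_ar1$gam, nd, se.fit = TRUE)$se.fit /
                 predict(fit_gam, nd, se.fit = TRUE)$se.fit)
aic; phi; se_ratio
```

The same comparison can be repeated with other corStruct classes (for example corExp) in place of corAR1.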

Is my assessment of the GAMM performance reasonable?  None of the models 
(GAM or GAMM) explain much of the deviance (~20%).

I'm interested in an information-theoretic approach to selecting the best 
model from a set of candidate models (AICc, dAICc, AICc weights), but 
cannot fit some of the candidate models with GAMM because they lack a random 
term. I'm not sure how to use the GAMM output to compare the models I can 
run with this procedure.
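A minimal sketch of the AICc table itself (AICc = AIC + 2k(k+1)/(n-k-1), dAICc relative to the best model, weights proportional to exp(-dAICc/2)). The helper name aicc_table is hypothetical. It accepts anything with a logLik() method, so for gamm() fits one would pass fit$lme. With mgcv's default REML fitting, likelihoods are only comparable between models with identical fixed effects, so refitting with gamm(..., method = "ML") is a common choice when fixed effects differ. A purely parametric model with the same correlation structure could be fitted with nlme::gls(method = "ML"), but whether its likelihood is comparable in a given case is worth checking. The k taken from logLik(fit$lme) counts smoothing parameters as variance components rather than effective degrees of freedom, so treat the result as approximate.

```r
# AICc, delta-AICc and Akaike weights for a named list of fitted models that
# have a logLik() method (lm, gam, or the $lme part of a gamm fit).
aicc_table <- function(models, n) {
  ll     <- lapply(models, logLik)
  k      <- sapply(ll, attr, "df")                  # number of estimated parameters
  aicc   <- sapply(ll, function(l) -2 * as.numeric(l)) + 2 * k +
            2 * k * (k + 1) / (n - k - 1)
  dAICc  <- aicc - min(aicc)
  weight <- exp(-dAICc / 2) / sum(exp(-dAICc / 2))
  out <- data.frame(k = k, AICc = aicc, dAICc = dAICc, weight = weight)
  out[order(out$AICc), ]
}

# Usage on hypothetical data (n = 99, two lakes, a depth effect)
set.seed(2)
d <- data.frame(depth = runif(99, 20, 150), lakef = factor(rep(1:2, length.out = 99)))
d$y <- 0.01 * d$depth + rnorm(99)
tab <- aicc_table(list(null  = lm(y ~ 1, data = d),
                       depth = lm(y ~ depth, data = d),
                       both  = lm(y ~ depth + lakef, data = d)), n = nrow(d))
round(tab, 3)
```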

Finally, as a last resort, I've subsampled the original data set so that I 
have 1 record per transect per lake per year for a total N=99.
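The post does not say how the single record per group was chosen. A minimal sketch, assuming one row is drawn at random from each transect x lake x year combination, on a small hypothetical data frame with the same grouping columns as fish:

```r
set.seed(3)
# Hypothetical stand-in: two transects in one lake, each surveyed in two years
fish <- data.frame(Transect = rep(c("nn_1", "nn_2"), each = 6),
                   lake     = 2,
                   year     = rep(c(2005, 2006), times = 6),
                   interval = 1:12,
                   depth    = runif(12, 20, 150))

grp <- interaction(fish$Transect, fish$lake, fish$year, drop = TRUE)

# Draw one row index at random within each transect x lake x year group
# (indexing via sample.int avoids sample()'s special case for length-1 input)
idx <- unlist(lapply(split(seq_len(nrow(fish)), grp),
                     function(i) i[sample.int(length(i), 1)]))
fish_sub <- fish[idx, ]
fish_sub
```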

I get different "best models" from GAM (original data), GAMM (original data 
with a correlation structure), and GAM (subsampled data). Selecting 
different models leads to fairly different conclusions about the 
similarities and differences between the lakes.

I'm not sure where to go with this as a result. 

Any thoughts/comments would be appreciated. 
Dave


 



David Warner
Research Fishery Biologist
USGS Great Lakes Science Center
1451 Green Road
Ann Arbor MI 48105
734.214.9392

Simon Wood

2008-Nov-17 09:46 UTC


David, 

Are you using the normalized residuals from the $lme part of the gamm object 
(i.e. something like residuals(foo$lme,type="normalized"))? Without 
standardization the raw residuals will look pretty much as bad for the gamm 
as they did for the gam (actually they might even look a little worse).

best,
simon 

-- 
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603  www.maths.bath.ac.uk/~sw283
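Simon's point can be checked on simulated data (a hypothetical stand-in with AR(1) errors of coefficient 0.7 along each of 10 transects). The sketch fits the GAMM with corAR1() and compares the mean within-transect lag-1 autocorrelation of the raw residuals with that of the normalized residuals from the $lme component. Only the normalized residuals are pre-whitened by the fitted correlation structure, so only they should look roughly uncorrelated when the structure is adequate.

```r
library(mgcv)
library(nlme)

set.seed(4)
# Hypothetical stand-in: 10 transects x 50 intervals, AR(1) errors within transect
sim <- do.call(rbind, lapply(1:10, function(i) {
  depth <- 80 + cumsum(rnorm(50, 0, 3))
  data.frame(tranf = factor(i), interval = 1:50, depth = depth,
             y = sin(depth / 20) + as.numeric(arima.sim(list(ar = 0.7), n = 50)))
}))

fit <- gamm(y ~ s(depth), data = sim,
            correlation = corAR1(form = ~ interval | tranf))

# Mean lag-1 autocorrelation within transects (rows sorted by transect, then interval)
lag1 <- function(r) mean(sapply(split(r, sim$tranf),
                                function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2]))

lag1_res <- c(raw        = lag1(residuals(fit$lme, type = "response")),
              normalized = lag1(residuals(fit$lme, type = "normalized")))
lag1_res
```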
