Worthington, Thomas A
2013-Aug-28 20:26 UTC
[R] Assessing temporal correlation in GAM with irregular time steps
I have constructed a GAM using the package mgcv to test whether the lengths of an emerging insect (Length) varies with day of the year (DOY) and between two sites (SiteCode). The data are collected at irregular time steps ranging from 2 days to 20 days between samples. The GAM takes the form M3 <- gam(Length ~s(DOY, by = SiteCode) + SiteCode) As the data are a time series I would like to test for temporal autocorrelation. I have read that it is not possible to use the autocorrelation function (ACF) due to the irregular spacing and that producing a variogram in relation to DOY would be an option. Is this a correct method to test for temporal autocorrelation? And could someone suggest the code to produce the variogram as I'm getting an error related to the 'distance' argument Best wishes Tom [[alternative HTML version deleted]]
Gavin Simpson
2013-Sep-03 19:16 UTC
[R] Assessing temporal correlation in GAM with irregular time steps
It is possible, but you can't use the discrete time or classical stochastic trend models (or evaluate using the ACF). Also, why do you care to do this with regard to DoY? The assumption of the model relates to the residuals, so you should check those for residual autocorrelation. As you are using `mgcv::gam` you could also use `mgcv::gamm` which can then leverage the correlation structures from the nlme package, which has spatial correlation structures (and you can think of time as a 1-d spatial direction). The package also has a `corCAR1()` correlation structure which is the continuous-time analogue of the AR(1). Fitting via `gamm()` will also allow you to use the `Variogram()` function from the nlme package to assess the model residuals for residual autocorrelation. For example you could compare the two fits m0 <- gamm(Length ~ s(DOY, by = SiteCode) + SiteCode, data = foo, method = "REML") m1 <- gamm(Length ~ s(DOY, by = SiteCode) + SiteCode, data = foo, method = "REML", correlation = corCAR1( ~ bar | SiteCode)) where `foo` is the object that contains the variables mentioned in the call, and `bar` is the variable (in `foo)` that indicates the ordering of the samples. Notice that I nest the CAR(1) within the two respective Sites, but do note IIRC that this fits the same residual correlation structure to both sites' residuals (i.e. there is 1 CAR(1) process, not two separate ones). require(nlme) anova(m0$lme, m1$lme) will perform a likelihood ratio test on the two models. If you have residual autocorrelation, do note that the smooth for DoY may be chosen to be more complex than is appropriate (it might be fitting the autocorrleated noise), so you may want to fix the degrees of freedom for the smoother at some a priori chosen value and use this same value when fitting both m0 and m1, or at the very least set an upper limit on the complexity of the DoY smooth, say via s(DoY, by SiteCode, k = 5). Finally, as a length <= 0 insect makes no sense, the assumption of Gaussian (Normal) errors may be in trouble with your data; apart from their strictly positive nature, the mean-variance relationship of the data may not follow that of the assumptions for the errors. You can move to a GLM (GAM) to account for this but things get very tricky with the correlation structures (you can use gamm() still but fitting then goes via glmmPQL() in the MASS package a thence to lme()). If you just want to fit a variogram to something, there are a large number of spatial packages available for R, several of which can fit variograms to data, though you will need to study their respective help files for how to use them. As for the input data, often the time/date of sampling encoded as a numeric will be sufficient input, but you will need to check individual functions for what they require. I would check out the Spatial Task View on CRAN. HTH G On 28 August 2013 14:26, Worthington, Thomas A <thomas.worthington at okstate.edu> wrote:> I have constructed a GAM using the package mgcv to test whether the lengths of an emerging insect (Length) varies with day of the year (DOY) and between two sites (SiteCode). The data are collected at irregular time steps ranging from 2 days to 20 days between samples. The GAM takes the form > > M3 <- gam(Length ~s(DOY, by = SiteCode) + SiteCode) > > As the data are a time series I would like to test for temporal autocorrelation. I have read that it is not possible to use the autocorrelation function (ACF) due to the irregular spacing and that producing a variogram in relation to DOY would be an option. > > Is this a correct method to test for temporal autocorrelation? > > And could someone suggest the code to produce the variogram as I'm getting an error related to the 'distance' argument > > Best wishes > Tom > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Gavin Simpson, PhD