thr3ads.net - R help - [R] Autocorrelation in linear models [Mar 2011]

Arni Magnusson

2011-Mar-16 23:48 UTC

[R] Autocorrelation in linear models

I have been reading about autocorrelation in linear models over the last 
couple of days, and I have to say the more I read, the more confused I 
get. Beyond confusion lies enlightenment, so I'm tempted to ask R-Help for 
guidance.

Most authors are mainly worried about autocorrelation in the residuals, 
but some authors are also worried about autocorrelation within Y and 
within X vectors before any model is fitted. Would you test for 
autocorrelation both in the data and in the residuals?

If we limit our worries to the residuals, it looks like we have a variety 
of tests for lag=1:

   # fm is the fitted model, e.g. fm <- lm(y ~ x1 + x2)
   n <- length(residuals(fm))
   stats::cor.test(residuals(fm)[-n], residuals(fm)[-1])
   stats::Box.test(residuals(fm))
   lmtest::dwtest(fm, alternative="two.sided")
   lmtest::bgtest(fm, type="F")
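Several of these tests are built on the same lag-1 residual correlation. The minimal sketch below uses simulated data (seed, coefficients and random-walk predictors are hypothetical, not the poster's model) and only base R, leaving out the lmtest calls so it runs without extra packages. It computes the Durbin-Watson statistic by hand and checks two exact identities: the default Box.test (Box-Pierce, lag 1) statistic equals n * r1^2, and, because residuals from a model with an intercept sum to zero, DW = 2(1 - r1) - (e_1^2 + e_n^2) / sum(e^2), where r1 is the lag-1 autocorrelation from stats::acf. cor.test instead computes an ordinary Pearson correlation of the 19 overlapping lagged pairs, with their own means and variances, and its p-value treats those pairs as independent observations.

```r
## Simulated stand-in for a small annual regression (hypothetical data)
set.seed(1)
n  <- 20
x1 <- cumsum(rnorm(n))
x2 <- cumsum(rnorm(n))
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
fm <- lm(y ~ x1 + x2)
e  <- residuals(fm)

## Lag-1 sample autocorrelation as defined by stats::acf
r1 <- acf(e, lag.max = 1, plot = FALSE)$acf[2]

## Durbin-Watson statistic computed directly from its definition
dw <- sum(diff(e)^2) / sum(e^2)

## Box-Pierce test (Box.test default, lag = 1): statistic is n * r1^2
bt <- Box.test(e)

## Pearson correlation of the lagged pairs: close to r1, but not identical
ct <- cor.test(e[-n], e[-1])

c(r1 = r1, pearson = unname(ct$estimate), dw = dw,
  box_stat = unname(bt$statistic))
```

The DW identity is why DW is roughly 2(1 - r1): values near 2 indicate little lag-1 autocorrelation and values above 2 point towards negative autocorrelation. The practical differences between the tests lie largely in the reference distribution used for the p-value and in whether the fitted regressors are taken into account.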

In my model, a simple lm(y~x1+x2) with n=20 annual measurements, I have 
significant _positive_ autocorrelation within Y and within both X vectors, 
but _negative_ autocorrelation in the residuals. The residual 
autocorrelation is not quite significant, with the p-values

   0.070  (cor.test)
   0.064  (Box.test)
   0.125  (dwtest)
   0.077  (bgtest)

from the tests above. I seem to remember some authors saying that the 
Durbin-Watson test has less power than some alternative tests, as 
reflected here. The difference in p-values is substantial, so choosing 
which test to use could in many cases make a big difference for the 
subsequent analysis and conclusions. Most of them (cor.test, Box.test, 
bgtest) can also test lags>1. Which test would you recommend? I imagine 
the basic cor.test is somehow inappropriate for this; the other tests 
wouldn't have been invented otherwise, right?

car::dwt(fm) gives p-values that fluctuate by a factor of 2 from run to
run, unless I run a very long simulation, which yields a p-value similar
to lmtest::dwtest's, at least in my case.
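car::dwt estimates its p-value by simulation (bootstrapping), so every run draws fresh replicates and the p-value carries Monte Carlo error of roughly sqrt(p(1 - p) / reps). The sketch below illustrates the general idea with a hypothetical helper, mc_dw_pvalue, that resamples residuals and refits; it is an assumption-labelled illustration, not the code inside car, and its two-sided rule is just one simple convention. Data and seed are hypothetical.

```r
## Hypothetical helper (NOT car's implementation): resampling p-value
## for the Durbin-Watson statistic of a fitted lm.
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

mc_dw_pvalue <- function(fm, reps = 1000) {
  e    <- residuals(fm)
  X    <- model.matrix(fm)
  yhat <- fitted(fm)
  obs  <- dw_stat(e)
  ## Under H0 the errors are i.i.d., so resampling residuals with
  ## replacement and refitting approximates draws of DW under the null.
  sim <- replicate(reps, {
    ystar <- yhat + sample(e, replace = TRUE)
    dw_stat(lm.fit(X, ystar)$residuals)
  })
  ## Two-sided p-value (one simple convention among several)
  min(1, 2 * min(mean(sim <= obs), mean(sim >= obs)))
}

set.seed(42)
n  <- 20
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n)
fm <- lm(y ~ x1 + x2)

## Two runs with few replicates will generally give different p-values
p_a <- mc_dw_pvalue(fm, reps = 200)
p_b <- mc_dw_pvalue(fm, reps = 200)

## Monte Carlo standard error of a simulated p-value shrinks like 1/sqrt(reps)
mcse <- function(p, reps) sqrt(p * (1 - p) / reps)
c(p_a = p_a, p_b = p_b,
  mcse_200 = mcse(p_a, 200), mcse_20000 = mcse(p_a, 20000))
```

For example, a true p-value of 0.1 estimated from 1000 replicates has a Monte Carlo standard error of about 0.009, so run-to-run fluctuations shrink only slowly as the number of replicates grows.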

Finally, one question regarding remedies. If there was significant 
_positive_ autocorrelation in the residuals, some authors suggest 
remedying this by deflating the df (fewer effective df in the data) and 
redoing the t-tests of the regression coefficients, rejecting fewer null 
hypotheses. Does that mean if the residuals are _negatively_ correlated 
then I should inflate the df (more effective df in the data) and reject 
more null hypotheses?
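The arithmetic behind such a df adjustment can be written with a standard large-sample approximation (added here for illustration; neither poster wrote it down). For an AR(1) series with lag-1 autocorrelation rho, the variance of the mean is inflated by (1 + rho)/(1 - rho), which defines an effective sample size. For an OLS slope with an AR(1) predictor (rho_x) and AR(1) errors (rho_e), the same approximation uses the product rho_x * rho_e. The numbers in the last two lines are hypothetical.

```latex
\begin{aligned}
n_{\mathrm{eff}} &\approx n\,\frac{1-\rho}{1+\rho}
  && \text{(mean of an AR(1) series)} \\
n_{\mathrm{eff}} &\approx n\,\frac{1-\rho_x\rho_e}{1+\rho_x\rho_e}
  && \text{(OLS slope, AR(1) predictor and AR(1) errors)} \\
\rho = 0.5:\quad n_{\mathrm{eff}} &\approx 20 \times \tfrac{0.5}{1.5} \approx 6.7
  && \text{(deflate)} \\
\rho = -0.5:\quad n_{\mathrm{eff}} &\approx 20 \times \tfrac{1.5}{0.5} = 60
  && \text{(inflate)}
\end{aligned}
```

With positively autocorrelated predictors and negatively autocorrelated residuals, rho_x * rho_e is negative, so taken at face value the approximation does give n_eff > n. With n = 20, though, both autocorrelations are estimated very imprecisely.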

That's four question marks. I'd greatly appreciate guidance on any of 
them.

Thanks in advance,

Arni

Ben Bolker

2011-Mar-17 13:18 UTC


[R] Autocorrelation in linear models

Arni Magnusson <arnima <at> hafro.is> writes:
> 
> I have been reading about autocorrelation in linear models over the last 
> couple of days, and I have to say the more I read, the more confused I 
> get. Beyond confusion lies enlightenment, so I'm tempted to ask R-Help for
> guidance.
> 
> Most authors are mainly worried about autocorrelation in the residuals, 
> but some authors are also worried about autocorrelation within Y and 
> within X vectors before any model is fitted. Would you test for 
> autocorrelation both in the data and in the residuals?
   My immediate reaction is that autocorrelation in the raw data
(marginal autocorrelation) is not relevant. (There are exceptions,
of course -- in many ecological systems the marginal autocorrelation
tells us something about the processes driving the system, so we
may want to quantify/estimate it -- but I wouldn't generally think
that *testing* it (e.g. trying to reject a null hypothesis of
ACF=0) makes sense.)
> 
> If we limit our worries to the residuals, it looks like we have a variety 
> of tests for lag=1:
> 
>    stats::cor.test(residuals(fm)[-n], residuals(fm)[-1])
>    stats::Box.test(residuals(fm))
>    lmtest::dwtest(fm, alternative="two.sided")
>    lmtest::bgtest(fm, type="F")
  Note that (I think) all of these tests are based on
lag-1 autocorrelation only (I see you mention this
later).  Have you looked at nlme::ACF?  It is possible
to get non-significant autocorrelation at lag 1 but
significant autocorrelation at higher lags.
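A quick base-R way to look beyond lag 1 is to print the residual ACF over several lags and run a portmanteau test that pools them. Minimal sketch on simulated data; the errors are a hypothetical AR process with a lag-2 term and no lag-1 term, chosen so that lag-1 checks can miss it. For gls fits, nlme::ACF plays the role stats::acf plays here, and lmtest::bgtest(fm, order = 5) is the regression-aware analogue of the pooled test.

```r
set.seed(2)
n  <- 20
x1 <- rnorm(n); x2 <- rnorm(n)
## Hypothetical errors: autocorrelation at lag 2 but no lag-1 AR term
err <- as.numeric(arima.sim(list(ar = c(0, 0.6)), n = n))
y   <- 1 + x1 + x2 + err
fm  <- lm(y ~ x1 + x2)
e   <- residuals(fm)

## Sample autocorrelations at lags 0..5 (lag 0 is always 1)
ac <- acf(e, lag.max = 5, plot = FALSE)
round(drop(ac$acf), 2)

## Ljung-Box test pooling lags 1..5, next to the lag-1-only version
lb5 <- Box.test(e, lag = 5, type = "Ljung-Box")
lb1 <- Box.test(e, lag = 1, type = "Ljung-Box")
c(p_lag1 = lb1$p.value, p_lags1to5 = lb5$p.value)
```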
> 
> In my model, a simple lm(y~x1+x2) with n=20 annual measurements, I have 
> significant _positive_ autocorrelation within Y and within both X vectors, 
> but _negative_ autocorrelation in the residuals. 
  That's plausible. Again, I think the residual autocorrelation
is what you should worry about.
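The "that's plausible" case is easy to reproduce: trending predictors make y strongly autocorrelated even when the regression errors are negatively autocorrelated. Minimal sketch with simulated data; the random-walk predictors and AR(1) errors with phi = -0.5 are hypothetical choices, and with only 20 points the residual estimate is noisy from seed to seed.

```r
set.seed(3)
n    <- 20
lag1 <- function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2]

x1  <- cumsum(rnorm(n))   # random walks: typically strong positive lag-1 ACF
x2  <- cumsum(rnorm(n))
err <- as.numeric(arima.sim(list(ar = -0.5), n = n))  # negative AR(1) errors
y   <- 1 + x1 + x2 + err
fm  <- lm(y ~ x1 + x2)

## Marginal (raw-data) versus residual lag-1 autocorrelation
r <- c(y = lag1(y), x1 = lag1(x1), x2 = lag1(x2),
       resid = lag1(residuals(fm)))
round(r, 2)
```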
> The residual 
> autocorrelation is not quite significant, with the p-values
> 
>    0.070
>    0.064
>    0.125
>    0.077
> 
> from the tests above. I seem to remember some authors saying that the 
> Durbin-Watson test has less power than some alternative tests, as 
> reflected here. The difference in p-values is substantial,
  ?? I wouldn't necessarily say so -- I would guess you could get this
range of p-values from a single test statistic if you had
multiple simulated data sets from the same underlying model
and parameters ...  Have you tried running such simulations?
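The suggested simulation can be sketched directly: generate many data sets from one fixed model and look at the spread of p-values from a single test. All settings here (n = 20, AR(1) errors with phi = -0.3, coefficients, nsim) are hypothetical, and only Box.test is used to stay within base R.

```r
set.seed(4)
n    <- 20
nsim <- 500
x1 <- cumsum(rnorm(n)); x2 <- cumsum(rnorm(n))   # design held fixed

## One replicate: new AR(1) errors, refit, lag-1 Box-Pierce p-value
one_p <- function() {
  err <- as.numeric(arima.sim(list(ar = -0.3), n = n))
  y   <- 1 + 0.5 * x1 - 0.3 * x2 + err
  Box.test(residuals(lm(y ~ x1 + x2)))$p.value
}

p <- replicate(nsim, one_p())
## Spread of p-values from the SAME test under the SAME model
quantile(p, c(0.1, 0.25, 0.5, 0.75, 0.9))
```

If the central quantiles span a range at least as wide as 0.064 to 0.125, differences of that size between tests on one data set say little about which test is better; a fair comparison looks at rejection rates across many simulated data sets.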
> so choosing 
> which test to use could in many cases make a big difference for the 
> subsequent analysis and conclusions. Most of them (cor.test, Box.test, 
> bgtest) can also test lags>1. Which test would you recommend? I imagine 
> the basic cor.test is somehow inappropriate for this; the other tests 
> wouldn't have been invented otherwise, right?
  I don't know the details (it's been a while since I did time
series analysis, and it wasn't in this particular vein.)
> car::dwt(fm) gives p-values that fluctuate by a factor of 2 from run to
> run, unless I run a very long simulation, which yields a p-value similar
> to lmtest::dwtest's, at least in my case.
> 
> Finally, one question regarding remedies. If there was significant 
> _positive_ autocorrelation in the residuals, some authors suggest 
> remedying this by deflating the df (fewer effective df in the data) and 
> redoing the t-tests of the regression coefficients, rejecting fewer null 
> hypotheses. Does that mean if the residuals are _negatively_ correlated 
> then I should inflate the df (more effective df in the data) and reject 
> more null hypotheses?
   My personal taste is that these df adjustments are a bit cheesy.
Most of the time I would prefer to fit a model that incorporates
autocorrelation (e.g. nlme::gls(y~x1+x2, correlation=corAR1())), or pick
another time-series correlation structure from ?corClasses.
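A minimal version of the gls suggestion on simulated annual data (years, coefficients and the AR coefficient are hypothetical). nlme ships with R as a recommended package. Fitting the same mean model with and without the AR(1) term, both by REML, lets anova() compare them, and the estimated Phi is the residual lag-1 correlation the model uses when computing the coefficient t-tests.

```r
library(nlme)

set.seed(5)
n   <- 20
dat <- data.frame(year = 1991:2010,            # hypothetical years
                  x1 = cumsum(rnorm(n)), x2 = cumsum(rnorm(n)))
dat$y <- 1 + 0.5 * dat$x1 - 0.3 * dat$x2 +
         as.numeric(arima.sim(list(ar = -0.4), n = n))

## Same mean model: independent errors vs AR(1) errors ordered by year
fm0 <- gls(y ~ x1 + x2, data = dat)
fm1 <- gls(y ~ x1 + x2, data = dat, correlation = corAR1(form = ~ year))

anova(fm0, fm1)        # likelihood-ratio comparison (both REML, same fixed effects)
summary(fm1)$tTable    # coefficient t-tests that account for the AR(1) errors
phi <- coef(fm1$modelStruct$corStruct, unconstrained = FALSE)
phi
```

Other structures listed in ?corClasses, such as corARMA, go into the same correlation argument.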

  More generally, this whole approach falls into the category of
"test for presence of XX; if XX is not statistically significant
then ignore it", which is worrisome (if your test for XX is very
powerful then you will be concerned about dealing with XX even when
its effect on your results would be trivial; if your test for XX
is weak or you have very little data then you won't detect
XX even when it is present).  I would say that if you're really
concerned about autocorrelation you should just automatically use
a modeling approach (see above) that incorporates it.
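The power concern can be made concrete with a small simulation: how often does a lag-1 test reject at the 5% level when the errors really are AR(1)? Minimal sketch; reject_rate is a hypothetical helper, and phi = -0.3, the sample sizes and nsim are illustrative settings only.

```r
set.seed(6)

## Hypothetical helper: fraction of simulated data sets in which the
## lag-1 Box-Pierce test on lm residuals rejects at level alpha
reject_rate <- function(n, phi, nsim = 500, alpha = 0.05) {
  x1 <- rnorm(n); x2 <- rnorm(n)
  mean(replicate(nsim, {
    err <- as.numeric(arima.sim(list(ar = phi), n = n))
    y   <- 1 + x1 + x2 + err
    Box.test(residuals(lm(y ~ x1 + x2)))$p.value < alpha
  }))
}

## Same true autocorrelation, typically quite different detection rates
c(n20 = reject_rate(20, phi = -0.3), n200 = reject_rate(200, phi = -0.3))
```

Because the answer to "is it significant?" depends so strongly on n, fitting the autocorrelated-error model by default avoids conditioning the analysis on a low-powered pretest.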
> 
> That's four question marks. I'd greatly appreciate guidance on any of
> them.
> 
> Thanks in advance,
> 
  cheers
    Ben
