thr3ads.net - R devel - [Rd] Fix for bug in arima function [May 2015]

If this information is useful, please help other people find it:
Share via:

Patrick Perry

2015-May-20 20:41 UTC

[Rd] Fix for bug in arima function

I noticed that the 3.2.1 release cycle is about to start.  Is there any
chance that this fix will make it into the next version of R?

This bug is fairly serious: getting the wrong variance estimate leads to
the wrong log-likelihood and the wrong AIC, BIC etc, which can and does
lead to suboptimal model selection.  If it's not fixed, this issue will
affect every student taking our time series course in Fall 2015 (and
probably lots of other students in other time series courses).  When I
taught time series in Spring 2015, I had to teach students how to work
around the bug, which wasted class time and shook student confidence in R.
It'd be great if we didn?t have to deal with this issue next semester.

Again, the fix is trivial:

--- a/src/library/stats/R/arima.R
+++ b/src/library/stats/R/arima.R
@@ -211,8 +211,10 @@ arima <- function(x, order = c(0L, 0L, 0L),
         if(fit$rank == 0L) {
             ## Degenerate model. Proceed anyway so as not to break old code
             fit <- lm(x ~ xreg - 1, na.action = na.omit)
+            n.used <- sum(!is.na(resid(fit))) - length(Delta)
+        } else {
+            n.used <- sum(!is.na(resid(fit)))
         }
-        n.used <- sum(!is.na(resid(fit))) - length(Delta)
         init0 <- c(init0, coef(fit))
         ses <- summary(fit)$coefficients[, 2L]
         parscale <- c(parscale, 10 * ses)


The patch that introduced the bug (
https://github.com/wch/r-source/commit/32f633885a903bc422537dc426644f743cc645e0
) was designed to change the initialization for the optimization routine.
The proposed fix leaves the deliberate part of the patch unchanged (it
preserves the value of "init0").

--

Patrick Perry
Assistant Professor
Stern School of Business
New York University





On Tue, Apr 21, 2015 at 7:34 AM, peter dalgaard <pdalgd at gmail.com>
wrote:
> The bug repository is like an elephant: It doesn't forget, but the
> gestation period is long.
>
> In the present case, it is clear that something is not right, but someone
> needs to have sufficient recall and insight to check that your proposed fix
> is not unfixing a deliberate change. We should get to it eventually. (For
> some value of "we" not including "me"...)
>
> -pd
>
> On 20 Apr 2015, at 18:34 , Patrick Perry <pperry at stern.nyu.edu>
wrote:
>
> > There is currently a bug in the arima function. Namely, for arima
models
> with differencing or seasonal differencing, the innovation variance
> estimator uses the wrong denominator whenever xreg is non-null. This is the
> case, for example, when fitting an ARIMA(p,1,q) model with a drift term
> (common in financial applications). I reported the bug (and a fix) at
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16278 , but my
> report may have fallen through the cracks due to the timing around the
> 3.2.0 release.
> >
> > The bug was introduced in the patch displayed here:
> >
> >
>
https://github.com/wch/r-source/commit/32f633885a903bc422537dc426644f743cc645e0
> >
> > The fix is very simple:
> >
> >
>
https://github.com/patperry/r-source/commit/c1701c05ad91d5631eef196c2007ad9897b01f85
> >
> > I?ve posted a script that demonstrates the bug at
> >
> > https://gist.github.com/patperry/90a388b056e09cf6a51b
> >
> > Please let me know if there?s anything I can do to help get this fix
> incorporated.
> >
> >
> > --
> > Patrick Perry
> > Assistant Professor
> > Stern School of Business
> > New York University
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>
>
>
>
>
>
>
>
>
>
	[[alternative HTML version deleted]]

Martin Maechler

2015-May-21 08:35 UTC

head link

[Rd] Fix for bug in arima function

> I noticed that the 3.2.1 release cycle is about to start.  Is there any
> chance that this fix will make it into the next version of R?
> 
> This bug is fairly serious: getting the wrong variance estimate leads to
> the wrong log-likelihood and the wrong AIC, BIC etc, which can and does
> lead to suboptimal model selection.  If it's not fixed, this issue will
> affect every student taking our time series course in Fall 2015 (and
> probably lots of other students in other time series courses).  When I
> taught time series in Spring 2015, I had to teach students how to work
> around the bug, which wasted class time and shook student confidence in R.
> It'd be great if we didn?t have to deal with this issue next semester.
> 
> Again, the fix is trivial:
> 
> --- a/src/library/stats/R/arima.R
> +++ b/src/library/stats/R/arima.R
> @@ -211,8 +211,10 @@ arima <- function(x, order = c(0L, 0L, 0L),
>          if(fit$rank == 0L) {
>              ## Degenerate model. Proceed anyway so as not to break old
code
>              fit <- lm(x ~ xreg - 1, na.action = na.omit)
> +            n.used <- sum(!is.na(resid(fit))) - length(Delta)
> +        } else {
> +            n.used <- sum(!is.na(resid(fit)))
>          }
> -        n.used <- sum(!is.na(resid(fit))) - length(Delta)
>          init0 <- c(init0, coef(fit))
>          ses <- summary(fit)$coefficients[, 2L]
>          parscale <- c(parscale, 10 * ses)
> 
Yes, such a change *is* small in the source code.
But we have to be sure about  its desirability.

In another post about this you mention "REML", and I think we
really are discussing if variance estimates should use a
denominator of  'n'  or 'n - p' in this case.

> The patch that introduced the bug (
>
https://github.com/wch/r-source/commit/32f633885a903bc422537dc426644f743cc645e0
> ) was designed to change the initialization for the optimization routine.
> The proposed fix leaves the deliberate part of the patch unchanged (it
> preserves the value of "init0").
I can confirm this... a change introduced in R 3.0.2.

I'm about to commit changes ... after also adding a proper
regression test.

Martin Maechler, ETH Zurich

peter dalgaard

2015-May-21 09:03 UTC

head link

[Rd] Fix for bug in arima function

On 21 May 2015, at 10:35 , Martin Maechler <maechler at
lynne.stat.math.ethz.ch> wrote:
>> 
>> I noticed that the 3.2.1 release cycle is about to start.  Is there any
>> chance that this fix will make it into the next version of R?
>> 
>> This bug is fairly serious: getting the wrong variance estimate leads
to
>> the wrong log-likelihood and the wrong AIC, BIC etc, which can and does
>> lead to suboptimal model selection.  If it's not fixed, this issue
will
>> affect every student taking our time series course in Fall 2015 (and
>> probably lots of other students in other time series courses).  When I
>> taught time series in Spring 2015, I had to teach students how to work
>> around the bug, which wasted class time and shook student confidence in
R.
>> It'd be great if we didn?t have to deal with this issue next
semester.
>> 
>> Again, the fix is trivial:
>> 
>> --- a/src/library/stats/R/arima.R
>> +++ b/src/library/stats/R/arima.R
>> @@ -211,8 +211,10 @@ arima <- function(x, order = c(0L, 0L, 0L),
>>         if(fit$rank == 0L) {
>>             ## Degenerate model. Proceed anyway so as not to break old
code
>>             fit <- lm(x ~ xreg - 1, na.action = na.omit)
>> +            n.used <- sum(!is.na(resid(fit))) - length(Delta)
>> +        } else {
>> +            n.used <- sum(!is.na(resid(fit)))
>>         }
>> -        n.used <- sum(!is.na(resid(fit))) - length(Delta)
>>         init0 <- c(init0, coef(fit))
>>         ses <- summary(fit)$coefficients[, 2L]
>>         parscale <- c(parscale, 10 * ses)
>> 
> 
> Yes, such a change *is* small in the source code.
> But we have to be sure about  its desirability.
> 
> In another post about this you mention "REML", and I think we
> really are discussing if variance estimates should use a
> denominator of  'n'  or 'n - p' in this case.
> 
> 
>> The patch that introduced the bug (
>>
https://github.com/wch/r-source/commit/32f633885a903bc422537dc426644f743cc645e0
>> ) was designed to change the initialization for the optimization
routine.
> 
>> The proposed fix leaves the deliberate part of the patch unchanged (it
>> preserves the value of "init0").
> 
> I can confirm this... a change introduced in R 3.0.2.
> 
> I'm about to commit changes ... after also adding a proper
> regression test.
> 
Be careful here! I was just about to say that the diagnosis is dubious, and that
the patch could very well be wrong!!

AFAICT, the issue is that n.used got changed from being based on lm(x~...) to
lm(dx~...) where dx is the differenced series. Now that surely loses one
observation in arima(.,1,.), most likely unintentionally, but it is not at all
clear that the fix is not to subtract length(Delta) -- that code has been there
long before the changes in 3.0.2.

I'd expect that a safer fix would be to add back the orders of the the two
differencing operations.
> Martin Maechler, ETH Zurich
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

Maybe Matching Threads

Search for more possibly parallel threads

R devel - May 2015 - Fix for bug in arima function

[Rd] Fix for bug in arima function

[Rd] Fix for bug in arima function

[Rd] Fix for bug in arima function

Maybe Matching Threads