Mauro Andreolini
2009-Jan-29  18:51 UTC
[Rd] Arima_Like() and NaN - a (possible) problem, a patch, and RFC
Hi,
I recently started working with R (v. 2.7.2), and I have been using
R's internal ARIMA_Like() function (from the "stats" package) to
estimate some ARIMA models. In particular, I call ARIMA_Like() from a
function "fn()" that I pass to optim(); the main goal is to
find optimal ARIMA prediction models for some time series.
The ARIMA_Like() function returns a three-element vector; under some
conditions (that I have not yet been able to identify), the second
element of this vector is 'NaN'. Since fn() uses this value to compute
its return value, fn() returns 'NaN' as well, and optim() stops with an
error:
Error in optim(init[mask], armafn, method = "BFGS", hessian = TRUE,
control = optim.control,  :
  non-finite finite-difference value [2] 
I looked into the code (file src/arima.c of the stats package) and
noticed that this second element is a sum of logarithmic terms, computed
through the following snippet of code:
gain = M[0];
for (j = 0; j < d; j++) gain += delta[j] * M[r + j];
if(gain < 1e4) {
    nu++;
    ssq += resid * resid / gain;
    sumlog += log(gain);
}
Here, sumlog is the second element of the resulting vector. However, the
"if(gain < 1e4) {" check only bounds gain from above; it does not guard
against negative values. Whenever gain is negative, log(gain) is NaN, so
the statement "sumlog += log(gain);" turns sumlog into NaN, and it stays
NaN for the rest of the loop.
I changed the check as follows:
if (gain > 0 && gain < 1e4) {
This avoids computing logarithms of non-positive values. I recompiled
and reinstalled R, and the sumlog value is no longer 'NaN'. As a result,
optim() no longer reports the non-finite finite-difference value.
Here is my question: does this modification make any sense? Have I
missed something big? To me, it looks reasonable to avoid computing
log(x) when x < 0, but perhaps returning 'NaN' serves a purpose.
Could someone please clarify this? I searched the mailing list archives
and could not find anything even close to this topic, which may be
an indication that I am doing something really wrong, but I would like
to understand why.
Best regards
Mauro Andreolini