Mauro Andreolini
2009-Jan-29 18:51 UTC
[Rd] Arima_Like() and NaN - a (possible) problem, a patch, and RFC
Hi, recently I have started working with R (v. 2.7.2), and I have been using R's internal ARIMA_Like() function (from the "stats" package) to estimate some ARIMA models. In particular, I use ARIMA_Like() in a function "fn()" that I feed to the optim() method; the main goal is to find optimal ARIMA prediction models for some time series. The ARIMA_Like() function returns a three elements vector; under some conditions (that I could not yet spot), the second element of this vector is a 'NaN'. Since fn() is using this value to compute its return value, it suddenly returns 'NaN' and optim() warns me about it: Error in optim(init[mask], armafn, method = "BFGS", hessian = TRUE, control = optim.control, : non-finite finite-difference value [2] I looked into the code (file src/arima.c of the stats package) and noticed that this second element is a sum of logarithmic terms, computed through the following snippet of code: gain = M[0]; for (j = 0; j < d; j++) gain += delta[j] * M[r + j]; if(gain < 1e4) { nu++; ssq += resid * resid / gain; sumlog += log(gain); } Here, sumlog is the second element of the resulting vector. However, the "if(gain < 1e4) {" check does not explicitly check against negative values of the gain variable. Indeed, whenever the gain variable assumes a negative value, the statement "sumlog += log(gain);" evalutes to NaN. I changed the check as follows: if (gain > 0 && gain < 1e4) { This avoids computation of logarithms on negative values. I recompiled and reinstalled R, and the sumlog value is no more 'NaN'. As a result, optim() never warns about the non-finite finite-difference value. Here is my question: does this modification make any sense? Have I missed something big? To me, it looks reasonable to avoid computing log(x) when x < 0, but maybe returning 'NaN' may have its purposes. Could someone please clarify this? I searched the mailing list archives and I could not spot anything even close to this argument, which may be an indication that I am doing something really wrong, but I would like to understand why. Best regards Mauro Andreolini
Mauro Andreolini
2009-Feb-02 11:29 UTC
[Rd] Arima_Like() and NaN - a (possible) problem, a patch, and RFC
Hi, recently I have started working with R (v. 2.7.2), and I have been using R's internal ARIMA_Like() function (from the "stats" package) to estimate some ARIMA models. In particular, I use ARIMA_Like() in a function "fn()" that I feed to the optim() method; the main goal is to find optimal ARIMA prediction models for some time series. The ARIMA_Like() function returns a three elements vector; under some conditions (that I could not yet spot), the second element of this vector is a 'NaN'. Since fn() is using this value to compute its return value, it suddenly returns 'NaN' and optim() warns me about it: Error in optim(init[mask], armafn, method = "BFGS", hessian = TRUE, control = optim.control, : non-finite finite-difference value [2] I looked into the code (file src/arima.c of the stats package) and noticed that this second element is a sum of logarithmic terms, computed through the following snippet of code: gain = M[0]; for (j = 0; j < d; j++) gain += delta[j] * M[r + j]; if(gain < 1e4) { nu++; ssq += resid * resid / gain; sumlog += log(gain); } Here, sumlog is the second element of the resulting vector. However, the "if(gain < 1e4) {" check does not explicitly check against negative values of the gain variable. Indeed, whenever the gain variable assumes a negative value, the statement "sumlog += log(gain);" evalutes to NaN. I changed the check as follows: if (gain > 0 && gain < 1e4) { This avoids computation of logarithms on negative values. I recompiled and reinstalled R, and the sumlog value is no more 'NaN'. As a result, optim() never warns about the non-finite finite-difference value. Here is my question: does this modification make any sense? Have I missed something big? To me, it looks reasonable to avoid computing log(x) when x < 0, but maybe returning 'NaN' may have its purposes. Could someone please clarify this? I searched the mailing list archives and I could not spot anything even close to this argument, which may be an indication that I am doing something really wrong, but I would like to understand why. Best regards Mauro Andreolini