Nicolas Chapados
2011-Jul-04 16:35 UTC
[R] forecast: bias in sampling from seasonal Arima model?
Dear all, I stumbled upon what appears to be a troublesome issue when sampling from an ARIMA model (from Rob Hyndman's excellent 'forecast' package) that contains a seasonal AR component. Here's how to reproduce the issue. (I'm using R 2.9.2 with forecast 2.19; see sessionInfo() below). First some data:> x <- c(0.132475, 0.143119, 0.108104, 0.247291, 0.029510, -0.119591, -0.133313, -0.098128, 0.192698, 0.110328, 0.163671, -0.004925, -0.239209, -0.055122, -0.051121, 0.154108, 0.008665, -0.074702, 0.066534, -0.098728, -0.068668, 0.150935, -0.022547, 0.028625, 0.107092, -0.065396, -0.253247, -0.115240, -0.113535, -0.064191, -0.006032, 0.039233, 0.129013, -0.068462, 0.022398, -0.052427, -0.005586, 0.011447, -0.022667, -0.120536, -0.234398, -0.164087, -0.177160, -0.120624, -0.025104, 0.001144, -0.193424, -0.260674, -0.036976, -0.009590, -0.004920, 0.130545, 0.120527, 0.041121, -0.123321, 0.023836, -0.188418, 0.015807, -0.056012, 0.000496, 0.051806, -0.067574, 0.012775, 0.244083, 0.148857, 0.013874, 0.235252, 0.151935, 0.036986, 0.134482, -0.003359, -0.019422, 0.086195, 0.206569, 0.123565, 0.070835, -0.183189, -0.046513, 0.071920, -0.038360, 0.135293, 0.054746, -0.280340, 0.110638, 0.009729, 0.115541, 0.021397, 0.097835, -0.028434, -0.218416, 0.044552, 0.442563, 0.084317, 0.044149, 0.201100, 0.076112, -0.134955, 0.023870, 0.077111, 0.085490, 0.023154, 0.099757, -0.026509, -0.189839, 0.026614, 0.184916, -0.007266, 0.081276, 0.312526, 0.051199, -0.104707, -0.004206, 0.062440, 0.126385, -0.018100, 0.092513, 0.186459, -0.170184, -0.126168, 0.122739, 0.097495, 0.008633, -0.034519, 0.187264, -0.153409, 0.009440, 0.150561, 0.067744, 0.045129, 0.230831, -0.079700, -0.162694, -0.044251, -0.007663, 0.048986, 0.065724, 0.159706, 0.040067, -0.059949, 0.024810, -0.154852, 0.018080, 0.165935, 0.203050, 0.011035, -0.232585, -0.162248, -0.104872, -0.062516, -0.089766, 0.100304, 0.142170, -0.144969, -0.032500, -0.002131, 0.165890, 0.107629, 0.075752, 0.119003, 0.095955, 0.039842, 0.081208, 0.348529, 0.145694, -0.210700, 0.384966, -0.054503, 0.293329, 0.184295, 0.368986, 0.135270, 0.124917, 0.185286, -0.252088, -0.169708, -0.010204, 0.021934, 0.003572, 0.180148, 0.075836, -0.232065, -0.127255, -0.147122, 0.056163, 0.067004, 0.217810, 0.074513, -0.167389, 0.172578, -0.148127, 0.057025, 0.042623, 0.094214, 0.047004, -0.345453, -0.265104, -0.082897, 0.052705, -0.067002, 0.191941, 0.010989, -0.298567, -0.162841, 0.043773, 0.185459, 0.126305, 0.383101, 0.092747, -0.368453, -0.325097, 0.029564, -0.015390, 0.013807, 0.152062, -0.047015, -0.429245, -0.097742, 0.104502, -0.007547, -0.000245, 0.062830, 0.030093, -0.381043, -0.267704, -0.125930, -0.032264, -0.041657, 0.040073, 0.084431, -0.276316, -0.305253, -0.019942, 0.045390, 0.046090, 0.145700, 0.069920, -0.210079, 0.050967, 0.042283, 0.248840, 0.007883, 0.203171, 0.050722, -0.109773, -0.110301, -0.095433, 0.071133, 0.023793, 0.192476, 0.057746) First, a CORRECT model, containing a seasonal MA component but no seasonal AR component. After estimation, I forecast for 1 time-step, and I take the mean of sampling 10000 times from the same model:> my.arima1 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(0,0,2),period=7), include.mean=FALSE)> forecast(my.arima1, 1)Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 251 -0.03143283 -0.1882245 0.1253589 -0.271225 0.2083594> set.seed(1827) ; mean(sapply(seq_len(10000), function(i)as.numeric(simulate(my.arima1, 1)) )) [1] -0.03258454 The results ("Point Forecast" versus the output of mean()) are identical to some sampling error. Now the INCORRECT model arises from adding one seasonal AR component:> my.arima2 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(1,0,2),period=7), include.mean=FALSE)> forecast(my.arima2, 1)Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 251 -0.1848579 -0.322421 -0.04729492 -0.3952424 0.02552655> set.seed(1827) ; mean(sapply(seq_len(10000), function(i)as.numeric(simulate(my.arima2, 1)) )) [1] -0.05416299 For the results are substantially different (-0.18 versus -0.05), and the latter does not change much if I take a much bigger sample. Did anybody encounter this in the past? Is this a bug? For reference, here are the results of sessionInfo():> sessionInfo()R version 2.9.2 (2009-08-24) x86_64-pc-linux-gnu locale: C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] KernSmooth_2.23-3 digest_0.4.2 forecast_2.19 fracdiff_1.3-2 [5] tseries_0.10-12 zoo_1.4-0 quadprog_1.4-11 glmnet_1.7 [9] Matrix_0.999375-30 biglm_0.7 DBI_0.2-4 inline_0.3.7 [13] XML_2.6-0 timeSeries_2110.86 timeDate_2110.87 RODBC_1.3-2 [17] reshape_0.8.3 plyr_0.1.9 MASS_7.2-48 nnet_7.2-48 [21] latticeExtra_0.5-1 RColorBrewer_1.0-2 lattice_0.17-25 gsubfn_0.5-7 [25] proto_0.3-8 multicore_0.1-3 data.table_1.4.1 loaded via a namespace (and not attached): [1] grid_2.9.2 tools_2.9.2 Many thanks for any help or pointers! + Nicolas Chapados [[alternative HTML version deleted]]
Nicolas Chapados
2011-Jul-06 14:49 UTC
[R] forecast: bias in sampling from seasonal ARIMA model?
[Reposting, since there was problems with encodings the first time around.] Dear all, I stumbled upon what appears to be a troublesome issue when sampling from an ARIMA model (from Rob Hyndman's excellent 'forecast' package) that contains a seasonal AR component. Here's how to reproduce the issue. (I'm using R 2.9.2 with forecast 2.19; see sessionInfo() below). First some data:> x <- c(0.132475, 0.143119, 0.108104, 0.247291, 0.029510, -0.119591, -0.133313, -0.098128, 0.192698, 0.110328, 0.163671, -0.004925, -0.239209, -0.055122, -0.051121, 0.154108, 0.008665, -0.074702, 0.066534, -0.098728, -0.068668, 0.150935, -0.022547, 0.028625, 0.107092, -0.065396, -0.253247, -0.115240, -0.113535, -0.064191, -0.006032, 0.039233, 0.129013, -0.068462, 0.022398, -0.052427, -0.005586, 0.011447, -0.022667, -0.120536, -0.234398, -0.164087, -0.177160, -0.120624, -0.025104, 0.001144, -0.193424, -0.260674, -0.036976, -0.009590, -0.004920, 0.130545, 0.120527, 0.041121, -0.123321, 0.023836, -0.188418, 0.015807, -0.056012, 0.000496, 0.051806, -0.067574, 0.012775, 0.244083, 0.148857, 0.013874, 0.235252, 0.151935, 0.036986, 0.134482, -0.003359, -0.019422, 0.086195, 0.206569, 0.123565, 0.070835, -0.183189, -0.046513, 0.071920, -0.038360, 0.135293, 0.054746, -0.280340, 0.110638, 0.009729, 0.115541, 0.021397, 0.097835, -0.028434, -0.218416, 0.044552, 0.442563, 0.084317, 0.044149, 0.201100, 0.076112, -0.134955, 0.023870, 0.077111, 0.085490, 0.023154, 0.099757, -0.026509, -0.189839, 0.026614, 0.184916, -0.007266, 0.081276, 0.312526, 0.051199, -0.104707, -0.004206, 0.062440, 0.126385, -0.018100, 0.092513, 0.186459, -0.170184, -0.126168, 0.122739, 0.097495, 0.008633, -0.034519, 0.187264, -0.153409, 0.009440, 0.150561, 0.067744, 0.045129, 0.230831, -0.079700, -0.162694, -0.044251, -0.007663, 0.048986, 0.065724, 0.159706, 0.040067, -0.059949, 0.024810, -0.154852, 0.018080, 0.165935, 0.203050, 0.011035, -0.232585, -0.162248, -0.104872, -0.062516, -0.089766, 0.100304, 0.142170, -0.144969, -0.032500, -0.002131, 0.165890, 0.107629, 0.075752, 0.119003, 0.095955, 0.039842, 0.081208, 0.348529, 0.145694, -0.210700, 0.384966, -0.054503, 0.293329, 0.184295, 0.368986, 0.135270, 0.124917, 0.185286, -0.252088, -0.169708, -0.010204, 0.021934, 0.003572, 0.180148, 0.075836, -0.232065, -0.127255, -0.147122, 0.056163, 0.067004, 0.217810, 0.074513, -0.167389, 0.172578, -0.148127, 0.057025, 0.042623, 0.094214, 0.047004, -0.345453, -0.265104, -0.082897, 0.052705, -0.067002, 0.191941, 0.010989, -0.298567, -0.162841, 0.043773, 0.185459, 0.126305, 0.383101, 0.092747, -0.368453, -0.325097, 0.029564, -0.015390, 0.013807, 0.152062, -0.047015, -0.429245, -0.097742, 0.104502, -0.007547, -0.000245, 0.062830, 0.030093, -0.381043, -0.267704, -0.125930, -0.032264, -0.041657, 0.040073, 0.084431, -0.276316, -0.305253, -0.019942, 0.045390, 0.046090, 0.145700, 0.069920, -0.210079, 0.050967, 0.042283, 0.248840, 0.007883, 0.203171, 0.050722, -0.109773, -0.110301, -0.095433, 0.071133, 0.023793, 0.192476, 0.057746) First, a CORRECT model, containing a seasonal MA component but no seasonal AR component. After estimation, I forecast for 1 time-step, and I take the mean of sampling 10000 times from the same model:> my.arima1 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(0,0,2), period=7), include.mean=FALSE) > forecast(my.arima1, 1)Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 251 -0.03143283 -0.1882245 0.1253589 -0.271225 0.2083594> set.seed(1827) ; mean(sapply(seq_len(10000), function(i) as.numeric(simulate(my.arima1, 1)) ))[1] -0.03258454 The results ("Point Forecast" versus the output of mean()) are identical to some sampling error. Now the INCORRECT model arises from adding one seasonal AR component:> my.arima2 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(1,0,2), period=7), include.mean=FALSE) > forecast(my.arima2, 1)Point Forecast Lo 80 Hi 80 Lo 95 Hi 95 251 -0.1848579 -0.322421 -0.04729492 -0.3952424 0.02552655> set.seed(1827) ; mean(sapply(seq_len(10000), function(i) as.numeric(simulate(my.arima2, 1)) ))[1] -0.05416299 For the results are substantially different (-0.18 versus -0.05), and the latter does not change much if I take a much bigger sample. Did anybody encounter this in the past? Is this a bug? For reference, here are the results of sessionInfo():> sessionInfo()R version 2.9.2 (2009-08-24) x86_64-pc-linux-gnu locale: C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] KernSmooth_2.23-3 digest_0.4.2 forecast_2.19 fracdiff_1.3-2 [5] tseries_0.10-12 zoo_1.4-0 quadprog_1.4-11 glmnet_1.7 [9] Matrix_0.999375-30 biglm_0.7 DBI_0.2-4 inline_0.3.7 [13] XML_2.6-0 timeSeries_2110.86 timeDate_2110.87 RODBC_1.3-2 [17] reshape_0.8.3 plyr_0.1.9 MASS_7.2-48 nnet_7.2-48 [21] latticeExtra_0.5-1 RColorBrewer_1.0-2 lattice_0.17-25 gsubfn_0.5-7 [25] proto_0.3-8 multicore_0.1-3 data.table_1.4.1 loaded via a namespace (and not attached): [1] grid_2.9.2 tools_2.9.2 Many thanks for any help or pointers! + Nicolas Chapados