Dear All,
I am puzzled and probably I am misunderstanding something.
Please consider the snippet at the end of the email.
We see a time series that clearly has some pattern (essentially, it is
an account where a salary is regularly paid, followed by some
expenses).
However, the output of auto.arima from the forecast package does
not seem to make any sense (at least to me).
I wonder if the problem is the fact that the time series is not
defined at regular intervals.
Any suggestions and alternative ways to fit it (e.g. sarima from the astsa
package to account for the seasonality?) are really welcome.
Many thanks

Lorenzo

##############################################
library(forecast)

tt<-structure(c(1494.5, 1367.57, 1357.57, 1222.23, 1124.02, 1011.64,
4575.64, 3201.87, 3050.04, 2173.38, 1967.88, 1838.55, 1666.05,
1656.05, 1524.96, 835.96, 775.36, 592.36, 494.15, 4058.15, 2624.36,
2448.47, 1598.47, 1398.47, 1264.14, 1165.88, 1053.67, 941.36,
821.36, 471.36, 373.15, 259.91, 3808.91, 2262.26, 1940.39, 1011.39,
800.81, 790.81), index = structure(c(16563L, 16565L, 16570L,
16572L, 16577L, 16579L, 16584L, 16585L, 16586L, 16587L, 16588L,
16589L, 16590L, 16592L, 16593L, 16599L, 16606L, 16607L, 16608L,
16612L, 16613L, 16614L, 16617L, 16618L, 16619L, 16620L, 16621L,
16628L, 16633L, 16635L, 16638L, 16642L, 16647L, 16648L, 16649L,
16650L, 16651L, 16654L), class = "Date"), class = "zoo")

plot(tt)

fit<-auto.arima(tt)

###########################################

> On Jan 29, 2016, at 12:59 PM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
>
> plot(tt)
> fit<-auto.arima(tt)

If, after running plot(tt), you then run:

fitted(fit)

Time Series:
Start = 16563
End = 16654
Frequency = 1
 [1] 1448.8211        NA 1444.8612        NA        NA        NA        NA
 [8] 1398.7752        NA 1359.0350        NA        NA        NA        NA
[15] 1309.1398        NA 1219.7420        NA        NA        NA        NA
[22] 2302.8903 3708.1762 2713.0349 2603.0512 1968.0100 1819.1484 1725.4634
[29]        NA 1572.6179 1593.2628        NA        NA        NA        NA
[36]        NA 1258.3403        NA        NA        NA        NA        NA
[43]        NA 1184.9656  955.3023  822.7394        NA        NA        NA
[50] 1987.7634 3333.3131 2294.6941        NA        NA 1760.6351 1551.5526
[57] 1406.6751 1309.3682 1238.1899        NA        NA        NA        NA
[64]        NA        NA 1251.6898        NA        NA        NA        NA
[71] 1179.9970        NA  988.3885        NA        NA  888.4533        NA
[78]        NA        NA  889.4017        NA        NA        NA        NA
[85] 1970.0911 3152.7668 2032.3935 1799.2350 1126.2794        NA        NA
[92] 1088.1525

Using that vector:

lines(seq(16563, 16654), fitted(fit), col="red", lwd=3)

You can see that the fitted values are capturing quite a bit of the variation.

I'm not a regular user of pkg:forecast, so there may be more refined
methods of extracting information than using `fitted`.

--
David Winsemius
Alameda, CA, USA

Thanks,
but something fishy is going on.
The fitted time series is full of missing values, whereas the original
tt object does not have any.
I suppose that in trying to fit the time series defined on an irregular
time grid, some problem arises inside the auto.arima function.

Lorenzo

On Fri, Jan 29, 2016 at 02:16:27PM -0800, David Winsemius wrote:
> You can see that the fitted values are capturing quite a bit of the variation.

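The missing values described in this message can be reproduced on a much smaller series. The sketch below uses a made-up four-point zoo series (the dates and values are illustrative only, not taken from the account data) and assumes the zoo package is installed. It suggests that converting a zoo object with a Date index to ts lays the data out on a regular daily grid, with NA on every day that has no observation.

```r
library(zoo)

# Hypothetical toy series: 4 observations on an irregular daily grid
dates <- as.Date(c("2015-05-08", "2015-05-10", "2015-05-11", "2015-05-14"))
z <- zoo(c(10, 8, 7, 3), order.by = dates)

# Converting to ts puts the data on a regular daily grid;
# days without an observation are filled with NA
z_ts <- as.ts(z)

length(z)          # 4 observed values
length(z_ts)       # 7 days between the first and the last date
sum(is.na(z_ts))   # 3 days with no observation
print(z_ts)
```

The same mechanism would explain why the 38 observations in tt turn into a fitted series of length 92 with gaps.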
Part of the trouble is that the zoo time series is translated into
a ts object by auto.arima. In doing so, the series is laid out along
a regular time grid and some missing data appear.
To fix this, I should replace each NA with the previous non-NA value.
This is easy enough, and the series exhibits some clear cycles: roughly
every month there is a spike, followed by a decrease, then another
spike and so on.
I would like to forecast a couple of cycles (60 steps), but when I do
so with auto.arima, nothing like what I expect appears (the
seasonality is completely lost).
Any idea why?
I paste below the revised R code for reproducibility.

Lorenzo

library(zoo)        # provides na.locf()
library(forecast)

tt<-structure(c(1494.5, 1367.57, 1357.57, 1222.23, 1124.02, 1011.64,
4575.64, 3201.87, 3050.04, 2173.38, 1967.88, 1838.55, 1666.05,
1656.05, 1524.96, 835.96, 775.36, 592.36, 494.15, 4058.15, 2624.36,
2448.47, 1598.47, 1398.47, 1264.14, 1165.88, 1053.67, 941.36,
821.36, 471.36, 373.15, 259.91, 3808.91, 2262.26, 1940.39, 1011.39,
800.81, 790.81), index = structure(c(16563L, 16565L, 16570L,
16572L, 16577L, 16579L, 16584L, 16585L, 16586L, 16587L, 16588L,
16589L, 16590L, 16592L, 16593L, 16599L, 16606L, 16607L, 16608L,
16612L, 16613L, 16614L, 16617L, 16618L, 16619L, 16620L, 16621L,
16628L, 16633L, 16635L, 16638L, 16642L, 16647L, 16648L, 16649L,
16650L, 16651L, 16654L), class = "Date"), class = "zoo")

tt2<-as.ts(tt)      # regular daily grid; days without an observation become NA
tt2<-na.locf(tt2)   # carry the last observed balance forward

mm<-auto.arima(tt2)

plot(forecast(mm, h=60))   # forecast 60 days ahead
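
One possible explanation for the lost seasonality, offered as a sketch rather than a definitive answer: the ts object produced by as.ts has frequency 1 (see the "Frequency = 1" line in the fitted output earlier in the thread), and auto.arima only considers seasonal terms when the frequency is greater than 1. The code below assumes a 30-day cycle, which is only an approximation (the salary spikes in the data are 28 and 35 days apart), and forces one seasonal difference through the D argument of auto.arima. The names tt3, mm3 and fc are introduced here for illustration.

```r
library(zoo)
library(forecast)

# Data copied from the thread: account balance on irregular dates
tt <- structure(c(1494.5, 1367.57, 1357.57, 1222.23, 1124.02, 1011.64,
  4575.64, 3201.87, 3050.04, 2173.38, 1967.88, 1838.55, 1666.05,
  1656.05, 1524.96, 835.96, 775.36, 592.36, 494.15, 4058.15, 2624.36,
  2448.47, 1598.47, 1398.47, 1264.14, 1165.88, 1053.67, 941.36,
  821.36, 471.36, 373.15, 259.91, 3808.91, 2262.26, 1940.39, 1011.39,
  800.81, 790.81), index = structure(c(16563L, 16565L, 16570L,
  16572L, 16577L, 16579L, 16584L, 16585L, 16586L, 16587L, 16588L,
  16589L, 16590L, 16592L, 16593L, 16599L, 16606L, 16607L, 16608L,
  16612L, 16613L, 16614L, 16617L, 16618L, 16619L, 16620L, 16621L,
  16628L, 16633L, 16635L, 16638L, 16642L, 16647L, 16648L, 16649L,
  16650L, 16651L, 16654L), class = "Date"), class = "zoo")

# Regular daily grid, gaps filled with the last known balance
tt2 <- na.locf(as.ts(tt))
frequency(tt2)   # 1, so auto.arima() has no seasonal period to work with

# ASSUMPTION: treat the pay cycle as exactly 30 days long
tt3 <- ts(as.numeric(tt2), frequency = 30)

# D = 1 forces one seasonal difference instead of leaving the
# choice to the automatic seasonal test (only about 3 cycles of data)
mm3 <- auto.arima(tt3, D = 1)
fc <- forecast(mm3, h = 60)
plot(fc)
```

With only about three cycles of data and a period that is not constant, any seasonal fit of this kind should be read with caution. Carrying the last value forward also creates flat stretches that the model treats as real observations.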