thr3ads.net - R help - [R] Choosing the optimum lag order of ARIMA model [Aug 2007]


Megh Dal

2007-Aug-31 08:17 UTC

[R] Choosing the optimum lag order of ARIMA model

Dear all R users,
   
  I am really struggling to determine the most appropriate lag order of an ARIMA
model. My understanding is that, for an MA[q] model, the autocorrelation
coefficient vanishes after lag q, which indicates the MA order of an ARIMA model,
and that for an AR[p] model the partial autocorrelation vanishes after lag p,
which helps to determine the AR order. I also understood that the model chosen
by this argument gives the minimum AIC.
   
  Now I considered following data :
   
  2.1948 2.2275 2.2669 2.2839 1.9481 2.1319 2.0238 2.3109 2.5727 2.5176
2.5728 2.6828 2.8221 2.879 2.8828 2.9955 2.9906 2.9861 3.0452 3.068
2.9569 3.0256 3.0977 2.985 2.9572 3.0877 3.1009 3.1149 2.8886 2.9631
3.0325 2.9175 2.7231 2.7905 2.8493 2.8208 2.8156 2.9115 2.701 2.6928
2.7881 2.723 2.7266 2.9494 3.113 3.0566 3.0358 3.05 3.0724 3.1365
3.1083 3.0257 3.2211 3.4269 3.327 3.1205 2.9997 3.0201 3.0803 3.2059
3.1997 3.038 3.1613 3.2802 3.2194     
   
  ACF for 1st diff series:
  Autocorrelations of series 'diff(data1)', by lag
       0      1      2      3      4      5      6      7      8      9     10 
 1.000 -0.022 -0.258 -0.016  0.066  0.034  0.035 -0.001 -0.089  0.028  0.222 
    11     12     13     14     15     16     17     18 
-0.132 -0.184 -0.038  0.048 -0.026 -0.041 -0.067  0.059 
   
    PACF for 1st diff series:
  Partial autocorrelations of series 'diff(data1)', by lag
       1      2      3      4      5      6      7      8      9     10     11 
-0.022 -0.258 -0.031 -0.002  0.026  0.057  0.021 -0.069  0.029  0.194 -0.124 
    12     13     14     15     16     17     18 
-0.100 -0.111 -0.043 -0.078 -0.056 -0.085  0.086 

  On the basis of that I chose ARIMA[2,1,2] for the original data.
   
  But I got an error when fitting it:
   
  > arima(data1, c(2,1,2))
Error in arima(data1, c(2, 1, 2)) : non-stationary AR part from CSS
   
  The AICs for other combinations of lags are:
  > arima(data1, c(2,1,1))$aic
[1] -84.83648
  > arima(data1, c(1,1,2))$aic
[1] -84.35737
  > arima(data1, c(1,1,1))$aic
[1] -83.79392

  Hence, if I choose the ARIMA[2,1,1] model on the basis of the AIC criterion,
the first rule that I stated earlier does not support that choice.
   
  Am I doing anything wrong? Can anyone suggest a "universal" rule for
choosing the best lag order?
   
  Regards,

Prof Brian Ripley

2007-Aug-31 08:38 UTC

On Fri, 31 Aug 2007, Megh Dal wrote:
> Dear all R users,
>
>  I am really struggling to determine the most appropriate lag order of 
> ARIMA model. My understanding is that, as for MA [q] model the auto 
> correlation coeff vanishes after q lag, it says the MA order of a ARIMA 
> model, and for a AR[p] model partial autocorrelation vanishes after p 
> lags it helps to determine the AR lag. And most appropriate model 
> choosed by this argument gives min AIC.
The last part is fallacious.  Also, you are applying your rules to 
selecting the orders in ARMA models, and they apply only to pure MA or AR 
models.

The R test file src/library/stats/tests/ts-tests.R has an example of model 
selection by AIC.
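
A minimal sketch of model selection by AIC on the series posted at the top of the thread, offered as an illustration rather than as the contents of the test file mentioned above. The 0:2 grid for p and q and the helper name aic_or_na are assumptions made for this example. Fits that stop with an error are recorded as NA instead of aborting the loop.

```r
# Data as posted at the top of the thread (65 observations).
data1 <- c(
  2.1948, 2.2275, 2.2669, 2.2839, 1.9481, 2.1319, 2.0238, 2.3109, 2.5727, 2.5176,
  2.5728, 2.6828, 2.8221, 2.879,  2.8828, 2.9955, 2.9906, 2.9861, 3.0452, 3.068,
  2.9569, 3.0256, 3.0977, 2.985,  2.9572, 3.0877, 3.1009, 3.1149, 2.8886, 2.9631,
  3.0325, 2.9175, 2.7231, 2.7905, 2.8493, 2.8208, 2.8156, 2.9115, 2.701,  2.6928,
  2.7881, 2.723,  2.7266, 2.9494, 3.113,  3.0566, 3.0358, 3.05,   3.0724, 3.1365,
  3.1083, 3.0257, 3.2211, 3.4269, 3.327,  3.1205, 2.9997, 3.0201, 3.0803, 3.2059,
  3.1997, 3.038,  3.1613, 3.2802, 3.2194
)

# Candidate ARIMA(p, 1, q) orders; the grid limits are an arbitrary choice.
grid <- expand.grid(p = 0:2, q = 0:2)

# Hypothetical helper: AIC of one fit, or NA if arima() stops with an error.
aic_or_na <- function(p, q) {
  tryCatch(arima(data1, order = c(p, 1, q))$aic,
           error = function(e) NA_real_)
}

grid$aic <- mapply(aic_or_na, grid$p, grid$q)

# Smallest AIC first; order() places the NA rows (failed fits) last.
aic_table <- grid[order(grid$aic), ]
print(aic_table)
```

With only 65 observations, AIC differences of a fraction of a unit between neighbouring orders are small, so the table is better read as a shortlist than as a verdict.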
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Megh Dal

2007-Sep-01 05:19 UTC

Hi Leeds, I got your point. So if I see both the ACF and the PACF vanish after
lag 3, then I should try all possible models and choose the one giving the
minimum AIC,
   i.e. (1,3), (3,1), (3,3), (2,3), (3,2), (1,2), (2,1), (1,1), and (2,2)?
   
  My second doubt is this: for the particular dataset that I provided, I get
an error when I run arima(data, order=c(2,1,2)), whereas arima(diff(data),
order=c(2,0,2)) gives no problem:
   
  > arima(data, order=c(2,1,2))
Error in arima(data, order = c(2, 1, 2)) : 
        non-stationary AR part from CSS
  > arima(diff(data), order=c(2,0,2))

Call:
arima(x = diff(data), order = c(2, 0, 2))

Coefficients:
         ar1      ar2      ma1     ma2  intercept
      0.1093  -0.3111  -0.1438  0.0632     0.0157
s.e.  0.5378   0.4464   0.5661  0.4796     0.0111

sigma^2 estimated as 0.01329:  log likelihood = 47.38,  aic = -82.76

   
  Can anyone tell me what is wrong there?
   
  Regards,
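
A hedged sketch related to the error above. The message mentions CSS, and by default arima() uses conditional sum of squares to find starting values before maximum likelihood (method "CSS-ML"), so one thing to try is method = "ML", which skips the CSS step. Whether that fit succeeds or gives sensible estimates on this series has not been verified here. The helper name fit_with_fallback is hypothetical.

```r
# Data as posted at the top of the thread (65 observations).
data1 <- c(
  2.1948, 2.2275, 2.2669, 2.2839, 1.9481, 2.1319, 2.0238, 2.3109, 2.5727, 2.5176,
  2.5728, 2.6828, 2.8221, 2.879,  2.8828, 2.9955, 2.9906, 2.9861, 3.0452, 3.068,
  2.9569, 3.0256, 3.0977, 2.985,  2.9572, 3.0877, 3.1009, 3.1149, 2.8886, 2.9631,
  3.0325, 2.9175, 2.7231, 2.7905, 2.8493, 2.8208, 2.8156, 2.9115, 2.701,  2.6928,
  2.7881, 2.723,  2.7266, 2.9494, 3.113,  3.0566, 3.0358, 3.05,   3.0724, 3.1365,
  3.1083, 3.0257, 3.2211, 3.4269, 3.327,  3.1205, 2.9997, 3.0201, 3.0803, 3.2059,
  3.1997, 3.038,  3.1613, 3.2802, 3.2194
)

# Hypothetical helper: try arima() with its defaults; if that stops with an
# error, retry with full maximum likelihood. Returns NULL if both attempts fail.
fit_with_fallback <- function(x, order) {
  tryCatch(
    arima(x, order = order),
    error = function(e) {
      message("default fit failed: ", conditionMessage(e))
      tryCatch(arima(x, order = order, method = "ML"),
               error = function(e2) NULL)
    }
  )
}

fit <- fit_with_fallback(data1, c(2, 1, 2))
if (is.null(fit)) message("both fits failed") else print(fit)
```

Even if the fallback fit runs, large standard errors on the coefficients (as in the c(2,0,2) output above) would suggest the model is over-parameterised for this series.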

"Leeds, Mark (IED)" <Mark.Leeds@morganstanley.com> wrote:
  What Ripley says is related to what I said about p and q both being
greater than 1 being very unlikely. He's also right that those "rules"
only work in the following sense: if the ACF drops off after q lags, the
implication is that p = 0, and if the PACF drops off after p lags, the
implication is that q = 0. When the model is mixed, it's more complicated.
Mixed models are more rare than common, but they could end up being the
best model, and that's another place where the AIC can be used. In other
words, if it looks like your ACF drops off after lag 1 and your PACF drops
off after lag 1, then it could be a p = 1, q = 1 model, but its AIC should
be checked against (p = 1, q = 0) and (p = 0, q = 1), because selecting
p = 1 and q = 1 from those plots is really flawed: the rules don't hold
when BOTH p and q are nonzero.
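
The cut-off behaviour described here can be checked on theoretical correlations with ARMAacf() from the stats package. This is only an illustration: the coefficient values below are arbitrary assumptions, not estimates from the posted data.

```r
lags <- 6

# Pure MA(2): the theoretical ACF is zero beyond lag 2.
acf_ma2 <- ARMAacf(ma = c(0.5, 0.4), lag.max = lags)

# Pure AR(2): the theoretical PACF is zero beyond lag 2.
pacf_ar2 <- ARMAacf(ar = c(0.5, -0.3), lag.max = lags, pacf = TRUE)

# Mixed ARMA(1,1): neither function cuts off; both decay gradually.
acf_arma11  <- ARMAacf(ar = 0.6, ma = 0.4, lag.max = lags)
pacf_arma11 <- ARMAacf(ar = 0.6, ma = 0.4, lag.max = lags, pacf = TRUE)

print(round(acf_ma2, 3))      # lags 0..6
print(round(pacf_ar2, 3))     # lags 1..6
print(round(acf_arma11, 3))   # lags 0..6
print(round(pacf_arma11, 3))  # lags 1..6
```

In the mixed case there is no lag after which either function is exactly zero, which is why reading p and q off the two plots does not work there.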



Megh Dal

2007-Sep-05 08:45 UTC

Hi Leeds, thanks for this reply. Actually, I did not want to know whether any
differencing is needed or not. My question was: what is the difference
between these two models:

  arima(data, c(2,1,2))

  and 

  arima(diff(data), c(2,0,2))

  If I am correct, those two models are the same, so I should get the same
results in both cases. Am I doing something wrong?
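
One documented difference between the two calls, sketched on simulated data: with d > 0, arima() models the differences as a zero-mean ARMA process and ignores include.mean, whereas arima(diff(data), c(2,0,2)) estimates an intercept by default (it appears in the output posted earlier in the thread). The series, seed, orders and parameter values below are assumptions for illustration. This does not by itself explain the CSS error.

```r
set.seed(1)

# Simulated ARIMA(1,1,1) series; the parameter values are arbitrary.
x <- arima.sim(list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 200)

# Differencing done inside arima(): no mean term is estimated.
fit_d1 <- arima(x, order = c(1, 1, 1))

# Differencing done by hand: an intercept is estimated unless switched off.
fit_diff_mean   <- arima(diff(x), order = c(1, 0, 1))
fit_diff_nomean <- arima(diff(x), order = c(1, 0, 1), include.mean = FALSE)

print(names(coef(fit_d1)))
print(names(coef(fit_diff_mean)))

# Side-by-side coefficients of the two specifications without a mean term.
print(rbind(d1 = coef(fit_d1), diff_nomean = coef(fit_diff_nomean)))
```

So the closer counterpart of arima(data, c(2,1,2)) is arima(diff(data), c(2,0,2), include.mean = FALSE).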

"Leeds, Mark (IED)" <Mark.Leeds@morganstanley.com> wrote:
  You shouldn't just take a diff because the non-differenced version gives
you an error. I don't know what that error means, but you definitely can't
just ignore it and go to taking a difference. Why don't you do an ACF plot
of the non-differenced series and see whether the ACF fails to die out
quickly? If it doesn't die out, then it's probably okay to assume you need
to difference it. If you check out the source of the function, that might
give hints about what the error means.

What you say about looking at combinations is okay, but remember that
picking a model is an art rather than a science. Maybe an arima(2,1,2) is
the best model based on model selection and AIC, but it gives forecasts
that are very poor. Parsimony (fewer parameters) is stressed by Box and
Jenkins, so, when in doubt or when all else fails, choose a lower-order
model. The series may not have a perfect ARIMA representation, so nothing
is going to be perfect.

