Dear all R users,

I am really struggling to determine the most appropriate lag order of an ARIMA model. My understanding is that for an MA[q] model the autocorrelation coefficients vanish after lag q, which indicates the MA order of an ARIMA model, and for an AR[p] model the partial autocorrelations vanish after lag p, which helps to determine the AR order. I also understood that the model chosen by this argument gives the minimum AIC.

Now I considered the following data:

    2.1948 2.2275 2.2669 2.2839 1.9481 2.1319 2.0238 2.3109 2.5727 2.5176
    2.5728 2.6828 2.8221 2.879  2.8828 2.9955 2.9906 2.9861 3.0452 3.068
    2.9569 3.0256 3.0977 2.985  2.9572 3.0877 3.1009 3.1149 2.8886 2.9631
    3.0325 2.9175 2.7231 2.7905 2.8493 2.8208 2.8156 2.9115 2.701  2.6928
    2.7881 2.723  2.7266 2.9494 3.113  3.0566 3.0358 3.05   3.0724 3.1365
    3.1083 3.0257 3.2211 3.4269 3.327  3.1205 2.9997 3.0201 3.0803 3.2059
    3.1997 3.038  3.1613 3.2802 3.2194

ACF for the first-differenced series:

    Autocorrelations of series 'diff(data1)', by lag

         0      1      2      3      4      5      6      7      8      9     10
     1.000 -0.022 -0.258 -0.016  0.066  0.034  0.035 -0.001 -0.089  0.028  0.222
        11     12     13     14     15     16     17     18
    -0.132 -0.184 -0.038  0.048 -0.026 -0.041 -0.067  0.059

PACF for the first-differenced series:

    Partial autocorrelations of series 'diff(data1)', by lag

         1      2      3      4      5      6      7      8      9     10     11
    -0.022 -0.258 -0.031 -0.002  0.026  0.057  0.021 -0.069  0.029  0.194 -0.124
        12     13     14     15     16     17     18
    -0.100 -0.111 -0.043 -0.078 -0.056 -0.085  0.086

On the basis of that I chose ARIMA[2,1,2] for the original data, but I got an error when fitting it:

    > arima(data1, c(2,1,2))
    Error in arima(data1, c(2, 1, 2)) : non-stationary AR part from CSS

The AICs for other combinations of lags are:

    > arima(data1, c(2,1,1))$aic
    [1] -84.83648
    > arima(data1, c(1,1,2))$aic
    [1] -84.35737
    > arima(data1, c(1,1,1))$aic
    [1] -83.79392

Hence, if I choose the ARIMA[2,1,1] model on the basis of the AIC criterion, the first rule that I described above does not support that choice.

Am I doing anything wrong? Can anyone suggest a "universal" rule for choosing the best lag?

Regards,
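For readers who want to reproduce the diagnostics in the post, the following is a minimal sketch. The series is copied from the post; the variable names other than `data1`, and the choice to print an approximate 95% white-noise band, are assumptions added here for illustration and were not part of the original message.

```r
# Series copied from the post above (65 observations).
data1 <- c(
  2.1948, 2.2275, 2.2669, 2.2839, 1.9481, 2.1319, 2.0238, 2.3109, 2.5727, 2.5176,
  2.5728, 2.6828, 2.8221, 2.879,  2.8828, 2.9955, 2.9906, 2.9861, 3.0452, 3.068,
  2.9569, 3.0256, 3.0977, 2.985,  2.9572, 3.0877, 3.1009, 3.1149, 2.8886, 2.9631,
  3.0325, 2.9175, 2.7231, 2.7905, 2.8493, 2.8208, 2.8156, 2.9115, 2.701,  2.6928,
  2.7881, 2.723,  2.7266, 2.9494, 3.113,  3.0566, 3.0358, 3.05,   3.0724, 3.1365,
  3.1083, 3.0257, 3.2211, 3.4269, 3.327,  3.1205, 2.9997, 3.0201, 3.0803, 3.2059,
  3.1997, 3.038,  3.1613, 3.2802, 3.2194
)

d1 <- diff(data1)  # first difference, as in the post

# Sample ACF and PACF up to lag 18, computed without drawing a plot.
acf_d1  <- acf(d1,  lag.max = 18, plot = FALSE)
pacf_d1 <- pacf(d1, lag.max = 18, plot = FALSE)

# Approximate 95% band for a white-noise series of this length
# (the same band that plot.acf draws by default).
band <- qnorm(0.975) / sqrt(length(d1))

print(round(drop(acf_d1$acf), 3))   # lags 0..18
print(round(drop(pacf_d1$acf), 3))  # lags 1..18
cat("approximate 95% band: +/-", round(band, 3), "\n")
```

Comparing the printed correlations with the band gives a rough idea of which lags stand out from noise, which is the judgement the cut-off rules rely on.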
On Fri, 31 Aug 2007, Megh Dal wrote:

> I am really struggling to determine the most appropriate lag order of
> ARIMA model. My understanding is that, as for MA [q] model the auto
> correlation coeff vanishes after q lag, it says the MA order of a ARIMA
> model, and for a AR[p] model partial autocorrelation vanishes after p
> lags it helps to determine the AR lag. And most appropriate model
> choosed by this argument gives min AIC.

The last part is fallacious. Also, you are applying your rules to selecting the orders in ARMA models, and they apply only to pure MA or AR models.

The R test file src/library/stats/tests/ts-tests.R has an example of model selection by AIC.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
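As an illustration of selecting orders by AIC, here is a minimal sketch on the posted series. It is not the example from ts-tests.R. The grid of orders (p and q from 0 to 2), the table name `aic_table`, and the use of `method = "ML"` are assumptions made for this sketch; fits that fail are recorded as NA rather than stopping the loop.

```r
# Series copied from the original post (65 observations).
data1 <- c(
  2.1948, 2.2275, 2.2669, 2.2839, 1.9481, 2.1319, 2.0238, 2.3109, 2.5727, 2.5176,
  2.5728, 2.6828, 2.8221, 2.879,  2.8828, 2.9955, 2.9906, 2.9861, 3.0452, 3.068,
  2.9569, 3.0256, 3.0977, 2.985,  2.9572, 3.0877, 3.1009, 3.1149, 2.8886, 2.9631,
  3.0325, 2.9175, 2.7231, 2.7905, 2.8493, 2.8208, 2.8156, 2.9115, 2.701,  2.6928,
  2.7881, 2.723,  2.7266, 2.9494, 3.113,  3.0566, 3.0358, 3.05,   3.0724, 3.1365,
  3.1083, 3.0257, 3.2211, 3.4269, 3.327,  3.1205, 2.9997, 3.0201, 3.0803, 3.2059,
  3.1997, 3.038,  3.1613, 3.2802, 3.2194
)

# Candidate ARMA orders for the once-differenced series (assumed grid).
aic_table <- expand.grid(p = 0:2, q = 0:2)
aic_table$aic <- NA_real_

for (i in seq_len(nrow(aic_table))) {
  ord <- c(aic_table$p[i], 1, aic_table$q[i])
  # A failed fit leaves NA in the table instead of stopping the loop.
  fit <- tryCatch(arima(data1, order = ord, method = "ML"),
                  error = function(e) NULL)
  if (!is.null(fit)) aic_table$aic[i] <- fit$aic
}

# Smallest AIC first; rows for failed fits (NA) are listed last.
print(aic_table[order(aic_table$aic), ])
```

Differences in AIC of a fraction of a unit, like those between the three models in the post, are small, so the table is better read as a shortlist than as a verdict.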
Hi Leeds,

I got your point. So if I see both the ACF and the PACF vanish after lag 3, I should try all possible models and then choose the one giving the minimum AIC? That is, (1,3), (3,1), (3,3), (2,3), (3,2), (1,2), (2,1), (1,1), and (2,2)?

My second doubt is this: for the particular dataset that I provided, arima(data, order=c(2,1,2)) fails, whereas arima(diff(data), order=c(2,0,2)) gives no problem:

    > arima(data, order=c(2,1,2))
    Error in arima(data, order = c(2, 1, 2)) : non-stationary AR part from CSS
    > arima(diff(data), order=c(2,0,2))

    Call:
    arima(x = diff(data), order = c(2, 0, 2))

    Coefficients:
             ar1      ar2      ma1     ma2  intercept
          0.1093  -0.3111  -0.1438  0.0632     0.0157
    s.e.  0.5378   0.4464   0.5661  0.4796     0.0111

    sigma^2 estimated as 0.01329:  log likelihood = 47.38,  aic = -82.76

Can anyone tell me what is wrong there?

Regards,

"Leeds, Mark (IED)" <Mark.Leeds@morganstanley.com> wrote:

> What Ripley says is related to what I said about p and q both being greater than 1 being very unlikely. He is also right that those "rules" only work in the following sense: if the ACF drops off after q lags, the implication is that p = 0, and if the PACF drops off after p lags, it is implied that q = 0.
>
> When the model is mixed, it is more complicated. Mixed models are more rare than common, but they could end up being the best model. That is another place where the AIC can be used. In other words, if it looks like your ACF drops off after lag 1 and your PACF drops off after lag 1, then it could be a p = 1, q = 1 model, but its AIC should be checked against (p = 1, q = 0) and (p = 0, q = 1), because the selection of p = 1 and q = 1 is really flawed: the rules don't really hold when BOTH p and q are nonzero.
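The point that the cut-off rules hold for pure models but not for mixed ones can be checked on theoretical correlations with `ARMAacf` from the stats package. This is only a sketch: the coefficient values (0.5, 0.3, -0.3, 0.6, 0.4) and the variable names are arbitrary choices made here for illustration and have nothing to do with the posted data.

```r
lags <- 6

# Pure MA(2): the theoretical ACF is zero beyond lag 2.
ma2_acf <- ARMAacf(ma = c(0.5, 0.3), lag.max = lags)

# Pure AR(2): the theoretical PACF is zero beyond lag 2.
ar2_pacf <- ARMAacf(ar = c(0.5, -0.3), lag.max = lags, pacf = TRUE)

# Mixed ARMA(1,1): the ACF and PACF tail off instead of cutting off,
# so neither one reveals p or q directly.
arma11_acf  <- ARMAacf(ar = 0.6, ma = 0.4, lag.max = lags)
arma11_pacf <- ARMAacf(ar = 0.6, ma = 0.4, lag.max = lags, pacf = TRUE)

print(round(ma2_acf, 3))      # lags 0..6
print(round(ar2_pacf, 3))     # lags 1..6
print(round(arma11_acf, 3))   # lags 0..6
print(round(arma11_pacf, 3))  # lags 1..6
```

With sample correlations from 64 differences the picture is noisier still, which is why a clean cut-off at the same lag in both the ACF and the PACF does not imply an ARMA(p, q) model with both orders equal to that lag.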
Hi Leeds,

Thanks for this reply. Actually, I did not want to know whether any differencing is needed or not. My question was: what is the difference between the two models

    arima(data, c(2,1,2))

and

    arima(diff(data), c(2,0,2))

If I am correct, those two models are the same, so I should get the same results in both cases. Am I doing something wrong?

"Leeds, Mark (IED)" <Mark.Leeds@morganstanley.com> wrote:

> You shouldn't just take a diff because the non-differenced version gives you an error. I don't know what that error means, but you definitely can't just ignore it and go to taking a difference. Why don't you do an ACF plot of the non-differenced series and see whether the ACF fails to die out quickly? If it does not die out, then it's probably okay to assume you need to difference it. If you check the source of the function, that might give hints about what the error means.
>
> What you say about looking at combinations is okay, but remember that picking a model is an art rather than a science. Maybe an arima(2,1,2) is the best model based on model selection and AIC but gives forecasts that are very poor. Parsimony (fewer parameters) is stressed by Box and Jenkins, so when in doubt choose a lower-order model. The series may not have a perfect ARIMA representation, so nothing is going to be perfect.
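On the question of whether the two calls specify the same model, two documented behaviours of `arima` seem relevant, and the sketch below tries to make them visible. First, per the `include.mean` argument in `?arima`, a mean term is estimated by default for an undifferenced series but is ignored when d > 0, so `arima(diff(data), c(2,0,2))` carries an intercept that `arima(data, c(2,1,2))` does not. Second, the message "non-stationary AR part from CSS" appears to come from the conditional-sum-of-squares step that the default `method = "CSS-ML"` uses for starting values, so `method = "ML"` avoids that step. The helper `safe_arima` is a hypothetical name introduced here, and whether each (2,·,2) fit converges on this series is not guaranteed, which is why failures are caught.

```r
# Series copied from the original post (65 observations).
data1 <- c(
  2.1948, 2.2275, 2.2669, 2.2839, 1.9481, 2.1319, 2.0238, 2.3109, 2.5727, 2.5176,
  2.5728, 2.6828, 2.8221, 2.879,  2.8828, 2.9955, 2.9906, 2.9861, 3.0452, 3.068,
  2.9569, 3.0256, 3.0977, 2.985,  2.9572, 3.0877, 3.1009, 3.1149, 2.8886, 2.9631,
  3.0325, 2.9175, 2.7231, 2.7905, 2.8493, 2.8208, 2.8156, 2.9115, 2.701,  2.6928,
  2.7881, 2.723,  2.7266, 2.9494, 3.113,  3.0566, 3.0358, 3.05,   3.0724, 3.1365,
  3.1083, 3.0257, 3.2211, 3.4269, 3.327,  3.1205, 2.9997, 3.0201, 3.0803, 3.2059,
  3.1997, 3.038,  3.1613, 3.2802, 3.2194
)

# Hypothetical helper: report a failed fit and return NULL instead of stopping.
safe_arima <- function(...) {
  tryCatch(arima(...), error = function(e) {
    message("fit failed: ", conditionMessage(e))
    NULL
  })
}

# (a) d = 1 on the levels: no mean or drift term is estimated.
fit_levels <- safe_arima(data1, order = c(2, 1, 2), method = "ML")

# (b) d = 0 on the differences with defaults: an intercept is estimated.
fit_diff_mean <- safe_arima(diff(data1), order = c(2, 0, 2))

# (c) d = 0 on the differences without intercept: the closest match to (a).
fit_diff_nomean <- safe_arima(diff(data1), order = c(2, 0, 2),
                              include.mean = FALSE, method = "ML")

# Print the coefficient vectors of whichever fits succeeded.
for (fit in list(fit_levels, fit_diff_mean, fit_diff_nomean)) {
  if (!is.null(fit)) print(coef(fit))
}
```

If (a) and (c) both succeed, their coefficients should be close but need not be identical, since the differenced fit inside `arima` is handled through a diffuse prior on the state rather than by literally differencing the data first.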