I am trying to make estimates of the predictive power of ARIMA models estimated by the auto.arima() function. I am looping through a large number of time seiries and fitting ARIMA models with the following code. data1 <- read.csv(file = "case.csv", header = T) data <- data1 output = c(1:length(data)) for(i in 1:length(data)) { point_data = unlist(data[i], use.names = FALSE) x = auto.arima(point_data , max.p = 10, max.q = 10, max.P = 0, max.Q = 0, approximation = TRUE) } However, I would like to find a way to test the out of sample predictive power of these models. I can think of a few ways I MIGHT be able to do this but nothing clean. I am a recen R user and despite my best efforts (looking on the mailing list, reading documentation) I cant figure out the best way to do this. I tried including something like this: output[i] = cor(model_data, real_data) but with poor results. Does anyone have any tricks to calculate the R^2 or an ARIMA model. Sample code would be apreciated.
Evan DeCorte wrote:> I am trying to make estimates of the predictive power of ARIMA models estimated by the auto.arima() function. > > I am looping through a large number of time seiries and fitting ARIMA models with the following code. > > data1 <- read.csv(file = "case.csv", header = T) > data <- data1 > > output = c(1:length(data)) > > for(i in 1:length(data)) > { > point_data = unlist(data[i], use.names = FALSE) > x = auto.arima(point_data , max.p = 10, max.q = 10, max.P = 0, max.Q = 0, approximation = TRUE) > > } > > However, I would like to find a way to test the out of sample predictive power of these models. I can think of a few ways I MIGHT be able to do this but nothing clean. I am a recen R user and despite my best efforts (looking on the mailing list, reading documentation) I cant figure out the best way to do this. > > I tried including something like this: > > output[i] = cor(model_data, real_data) > > but with poor results. > > Does anyone have any tricks to calculate the R^2 or an ARIMA model. Sample code would be apreciated.There are a couple of issues here. First, how to measure the predictive power of the model. I think a reasonable measure is mean-square error, i.e., predict ahead some k time steps, and compare that prediction with the observed timeseries. You can plot the MSE versus the forecast horizon. You can also calculate the proportion of explained variance MSE/Var(x). Second, the "out of sample" issue. You can use cross-validation or the moving-blocks bootstrap, in essence cutting the timeseries into separate blocks for training and testing. The blocks have to be large enough to cover all the ARIMA terms (e.g., more than 10 time steps for an AR(10)). -- Gad Abraham Dept. CSSE and NICTA The University of Melbourne Parkville 3010, Victoria, Australia email: gabraham at csse.unimelb.edu.au web: http://www.csse.unimelb.edu.au/~gabraham
> for(i in 1:length(data)) > { > point_data = unlist(data[i], use.names = FALSE) > x = auto.arima(point_data , max.p = 10, max.q = 10, max.P = 0, max.Q = 0, approximation = TRUE) > > } > > However, I would like to find a way to test the out of sample predictive power of these models. I can think of a few ways I MIGHT be able to do this but nothing clean. I am a recen R user and despite my best efforts (looking on the mailing list, reading documentation) I cant figure out the best way to do this. > >if you want to test the predictive power you need a prediction. so at least you need predict(x) in your loop and to store the predictions in another variable. Afterwards you can calculate several measures. another way is to use the dm.test and accuracy functions of the forecast(ing) package which you might already have used for the auto arima. one remark: the use of auto arima might not be the best solution since it uses in sample fit measures (e.g. AIC) for the best in sample fit. So for every i you might have a very different arima class. I am not sure whether this is what you intented... hth Stefan
Thanks for the great feedback. Conceptually I understand how you would go about testing out of sample performance. It seems like accuracy() would be the best way to test out of forecast performance and will help to automate the construction of statistics I would have calculated on my own. However, the real question now is how do you loop through a time series and automatically split a time series into training and testing sets. I know how I would do it for individual sets but to do so manually over a large number of time series seems excessively burdensome.> > for(i in 1:length(data)) > > { > > point_data = unlist(data[i], use.names = FALSE) > > x = auto.arima(point_data , max.p = 10, max.q = 10, max.P = 0, > max.Q = 0, approximation = TRUE) > > > > } > > > > However, I would like to find a way to test the out of sample > predictive power of these models. I can think of a few ways I MIGHT be > able to do this but nothing clean. I am a recen R user and despite my > best efforts (looking on the mailing list, reading documentation) I > cant figure out the best way to do this. > > > > > if you want to test the predictive power you need a prediction. so at > least you need predict(x) in your loop and to store the predictions in > another variable. Afterwards you can calculate several measures. > > another way is to use the dm.test and accuracy functions of the > forecast(ing) package which you might already have used for the auto arima. > > one remark: the use of auto arima might not be the best solution since > it uses in sample fit measures (e.g. AIC) for the best in sample fit. > So > for every i you might have a very different arima class. I am not sure > whether this is what you intented... > > hth > Stefan