Hello,

I asked a question about the most likely process to follow if, after a time-series fit is performed, the residuals are found to be non-normal. One person responded and offered to help if I supplied a sample data set. Unfortunately, now that I have a sample, I have lost the email address. If you are that person, or have some ideas, please email me back at rkevinburton at charter.net.

Thank you.

Kevin
rkevinburton wrote:
> I asked a question about what the most likely process to follow if after a
> time-series fit is performed the residuals are found to be non-normal.
> [...]

It wasn't me, but ... transform the data? See e.g. ?MASS::boxcox

--
View this message in context: http://www.nabble.com/Non-normal-residuals.-tp26083746p26084836.html
Sent from the R help mailing list archive at Nabble.com.
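As a rough illustration of the Box-Cox suggestion above, here is a minimal sketch in R. The simulated series `y`, the time index `t_idx`, and the linear trend model are assumptions made up for the example; they are not Kevin's data or model. `boxcox()` from MASS profiles the log-likelihood over a grid of lambda values for an `lm` fit, so the trend regression only stands in for whatever time-series model is actually in use.

```r
library(MASS)  # recommended package that ships with R

set.seed(1)
# Hypothetical data: a positive, right-skewed series with a trend
n <- 60
t_idx <- seq_len(n)
y <- exp(0.02 * t_idx + rnorm(n, sd = 0.3))

# Profile log-likelihood of the Box-Cox parameter for a simple trend model
bc <- boxcox(lm(y ~ t_idx), lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Apply the transform (use log when lambda is effectively zero)
y_bc <- if (abs(lambda_hat) < 1e-8) log(y) else (y^lambda_hat - 1) / lambda_hat

# Compare residual normality before and after the transform
shapiro.test(residuals(lm(y ~ t_idx)))
shapiro.test(residuals(lm(y_bc ~ t_idx)))
```

Box-Cox requires strictly positive data, and the chosen lambda depends on the model it is profiled against, so it is worth re-checking the residuals of the real model after transforming.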
[Taking the liberty of posting back to r-help]

I'd be curious what the particular objections were. I have one of those annoying "it depends" answers. In general, transformation can

(1) change [(de)stabilize] variance across groups/gradients;
(2) (non)normalize residuals;
(3) change [(non)linearize] relationships with gradients or time;
(4) modify interaction terms;
(5) change the interpretation of responses.

The problem is that sometimes these goals conflict. "AVAS" (acepack, Hmisc packages) attempts to do #1 and #3 at the same time.

If transforming your data brings you closer to satisfying the assumptions of your analytic methods and having a sensible analysis, then that's good. If it makes things worse, that's bad.

Other choices, depending on the situation, include robust methods (for "outlier" problems); generalized linear models etc. (for discrete data from standard distributions); models using t- instead of normally distributed residuals; generalized estimating equations; etc etc etc ... Transformation (if it works) is simple and (sometimes) interpretable.

rkevinburton at charter.net wrote:
> That seems to be a general consensus to transform the data through
> sqrt, log, diff, etc. I was particularly intrigued when I considered
> the Box-Cox transformation, but there were other time-series gurus who
> recommended against it, particularly with seasonal data or data with
> a trend. Would you have any reservations?
>
> Thank you.
>
> Kevin

--
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
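One of the alternatives listed above, robust methods for "outlier" problems, can be sketched as follows. This is only an illustration under stated assumptions: the series, the two shifted observations, and the linear trend are invented for the example, and a regression on time ignores the autocorrelation a real time series would have. `rlm()` from MASS fits a Huber M-estimator by default and reports the weight it gave each observation.

```r
library(MASS)  # recommended package that ships with R

set.seed(2)
# Hypothetical data: a linear trend plus noise, with two one-off outliers
n <- 48
t_idx <- seq_len(n)
y <- 10 + 0.5 * t_idx + rnorm(n)
y[c(12, 30)] <- y[c(12, 30)] + 15

fit_ols <- lm(y ~ t_idx)    # ordinary least squares
fit_rob <- rlm(y ~ t_idx)   # Huber M-estimator (rlm's default)

# Compare the fitted trends
coef(fit_ols)
coef(fit_rob)

# Observations that the robust fit down-weighted heavily
which(fit_rob$w < 0.5)
```

The down-weighted observations are a useful list of candidates to inspect by hand before deciding whether a transformation is needed at all.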
Kevin,

Kudos to you for asking a question that most do not....

I have attached an analysis of your residuals for "10 inch" called 10inchres.zip. I have also attached our analysis as "10inches.zip". I have posted some reports for you and added some commentary to help you understand this all fully.

The conclusion is that your model/methodology is not capturing the pattern in the data properly. Worse yet, it is actually creating or "injecting" structure into the errors. In turn, the forecast that comes out of such a model/approach will be doomed.

I have copied the ACF/PACF from the enclosed report "details.htm" here. It shows that there is a "blip" at lag 3. This may be evidence of something wrong: either a model that is overzealous or a model that has not captured the structure. Most people aren't aware that bad modeling can create issues.

Analysis for Variable Y 10inplates-RESIDUALS

LAG     ACF  STND.      T-   CHI-SQUARE &    PACF  STND.      T-
      VALUE  ERROR   RATIO    PROBABILITY   VALUE  ERROR   RATIO
  1    .037   .154     .24     .1   .8059    .037   .154     .24
  2   -.022   .155    -.14     .1   .9597   -.023   .154    -.15
  3   -.383   .155   -2.48    7.0   .0711   -.382   .154   -2.47
  4   -.174   .176    -.99    8.5   .0750   -.175   .154   -1.13
  5    .148   .180     .82    9.6   .0877    .164   .154    1.06
  6   -.001   .183    -.01    9.6   .1429   -.179   .154   -1.16
  7   -.006   .183    -.03    9.6   .2128   -.176   .154   -1.14
  8   -.009   .183    -.05    9.6   .2944    .113   .154     .73
  9   -.011   .183    -.06    9.6   .3834   -.025   .154    -.16
 10   -.035   .183    -.19    9.7   .4694   -.222   .154   -1.44
 11   -.053   .183    -.29    9.8   .5448   -.021   .154    -.13
 12    .036   .183     .20    9.9   .6229    .118   .154     .76
 13    .013   .183     .07    9.9   .6995   -.157   .154   -1.02
 14    .080   .183     .43   10.3   .7362   -.017   .154    -.11
 15   -.132   .184    -.72   11.5   .7132   -.050   .154    -.33
 16   -.109   .186    -.59   12.4   .7165   -.192   .154   -1.25
 17   -.029   .188    -.16   12.5   .7717   -.073   .154    -.47
 18   -.018   .188    -.09   12.5   .8214   -.084   .154    -.55
 19    .157   .188     .84   14.5   .7556   -.027   .154    -.18
 20    .040   .191     .21   14.6   .7984   -.017   .154    -.11
 21    .030   .191     .16   14.7   .8384   -.032   .154    -.21
 22   -.005   .192    -.03   14.7   .8753   -.018   .154    -.12
 23    .008   .192     .04   14.7   .9053    .082   .154     .53
 24    .046   .192     .24   14.9   .9232    .039   .154     .25

If you refer to stat.htm in the zip file you will see the model I pasted here. You will see that there are two "Seasonal Pulse" interventions identified, starting 12/2007 and 1/2008. This indicates that this seasonal effect is being missed in your model. Also, note the two "level shift" interventions identified at (or around) 5/08 and 4/09, indicating runs of residuals that are clustered on one side of zero (all negative or all positive). There is also an autoregressive factor with a lag of 3 (see the Box-Jenkins textbook for more on ARIMA modeling). There are a few one-time or "pulse" interventions, which reflect unusually large or small values (e.g. 3/09) that are not being adjusted for.

FORECASTING WITH FINAL MODEL

 #  MODEL COMPONENT               LAG     COEFF   STANDARD       P       T
                                (BOP)                ERROR   VALUE   VALUE
 1  CONSTANT                               .154   .804E-01   .0653    1.91
 2  Autoregressive-Factor #  1      3     -.711       .141   .0000   -5.04
    INPUT SERIES X1   I~P00035  2009/ 3  PULSE
 3  Omega (input) -Factor #  2      0      3.24       .320   .0000   10.13
    INPUT SERIES X2   I~S00021  2008/ 1  SEASP
 4  Omega (input) -Factor #  3      0      3.36       .353   .0000    9.53
    INPUT SERIES X3   I~L00036  2009/ 4  LEVEL
 5  Omega (input) -Factor #  4      0     -.888       .159   .0000   -5.58
    INPUT SERIES X4   I~L00025  2008/ 5  LEVEL
 6  Omega (input) -Factor #  5      0      .287       .110   .0143    2.60
    INPUT SERIES X5   I~P00036  2009/ 4  PULSE
 7  Omega (input) -Factor #  6      0     -2.71       .373   .0000   -7.27
    INPUT SERIES X6   I~P00031  2008/ 11 PULSE
 8  Omega (input) -Factor #  7      0     -1.44       .338   .0002   -4.26
    INPUT SERIES X7   I~S00020  2007/ 12 SEASP
 9  Omega (input) -Factor #  8      0     -1.21       .224   .0000   -5.40
    INPUT SERIES X8   I~P00037  2009/ 5  PULSE
10  Omega (input) -Factor #  9      0     -.838       .334   .0177   -2.51
    INPUT SERIES X9   I~P00021  2008/ 1  PULSE
11  Omega (input) -Factor # 10      0     -2.18       .452   .0000   -4.83
    INPUT SERIES X10  I~P00025  2008/ 5  PULSE
12  Omega (input) -Factor # 11      0      .648       .313   .0470    2.07

Here is our model for 10 inch plates using the historical data. Autobox identified a seasonal model with autoregressive factors at lags 1 and 12.
Note that, again, the seasonal pulses found at November and December appear in the model, along with two pulse interventions.

 #  MODEL COMPONENT               LAG     COEFF   STANDARD       P       T
                                (BOP)                ERROR   VALUE   VALUE
 1  CONSTANT                               119.       72.9   .1113    1.63
 2  Autoregressive-Factor #  1      1      .941   .557E-01   .0000   16.90
 3  Autoregressive-Factor #  2     12     -.738       .220   .0019   -3.35
    INPUT SERIES X1   I~P00035  2009/ 3  PULSE
 4  Omega (input) -Factor #  3      0  .110E+04       109.   .0000   10.12
    INPUT SERIES X2   I~S00020  2007/ 12 SEASP
 5  Omega (input) -Factor #  4      0     -645.       71.6   .0000   -9.01
    INPUT SERIES X3   I~S00019  2007/ 11 SEASP
 6  Omega (input) -Factor #  5      0     -342.       64.4   .0000   -5.31
    INPUT SERIES X4   I~P00033  2009/ 1  PULSE
 7  Omega (input) -Factor #  6      0      297.       122.   .0197    2.44

With all of this said, you have some very difficult time series. Using simple and free methods may not give you what you are looking for. Autobox is completely automatic like R, but has the ability to recognize and adjust for 4 types of interventions. If you don't adjust the model for these interventions, then the "fit" will be off, as we have seen with this case study.

Contact me or go to our website to learn more about us.

Tom Reilly
Vice President of Sales
Automatic Forecasting Systems
215-675-0652
http://www.autobox.com
tomreilly at autobox.com
skype:tomreilly at autobox.com

Here is Kevin's original post......

This is kind of a general question about methodology more than anything, but I was looking for some advice. I have fit a time-series model and feel pretty confident that I have taken this model (exponential smoothing) as far as it will go. In other words, looking at the data and the fitted curves, I think it is as close as I can get. But when I plot the residuals and form a qqplot, it seems that the residuals are not "normal". From the QQ-plot, there is some factor influencing the series that cannot be attributed to "normal random" fluctuation. I can run 'tsdiag' to determine basically whether the residuals are normal and random, but what if they are not? What would be the next set of 'R' commands that I might run to find this influence? Any suggestions?

Kevin

http://old.nabble.com/file/p26322376/10inches.zip 10inches.zip
http://old.nabble.com/file/p26322376/10inchres.zip 10inchres.zip
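For the "next set of R commands" asked about in the original post, the residual checks and intervention terms discussed in this message can be approximated with base R. The following is a minimal sketch under stated assumptions: the series is simulated, the pulse and level-shift dates are invented, and `arima()` with dummy regressors in `xreg` is a stand-in technique, not a reproduction of what Autobox does (it does not search for intervention dates automatically).

```r
set.seed(3)
# Hypothetical monthly series: AR(1) noise plus a pulse and a level shift
n <- 60
e <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
pulse <- as.numeric(seq_len(n) == 40)   # one-time pulse at t = 40
level <- as.numeric(seq_len(n) >= 25)   # level shift from t = 25 onward
y <- ts(100 + 8 * pulse + 5 * level + e, frequency = 12, start = c(2005, 1))

# Step 1: fit a model that ignores the interventions
fit0 <- arima(y, order = c(1, 0, 0), method = "ML")
res0 <- residuals(fit0)

# Step 2: look for leftover structure in the residuals
acf(res0, plot = FALSE)
pacf(res0, plot = FALSE)
Box.test(res0, lag = 12, type = "Ljung-Box", fitdf = 1)

# Step 3: flag unusually large standardized residuals as candidate pulse dates
which(abs(res0 / sd(res0)) > 3)

# Step 4: refit with intervention dummies and compare
X <- cbind(pulse = pulse, level = level)
fit1 <- arima(y, order = c(1, 0, 0), xreg = X, method = "ML")
fit1
c(without = fit0$aic, with = fit1$aic)
shapiro.test(residuals(fit1))
```

In practice the intervention dates are unknown, so steps 2 to 4 would be repeated: inspect the residuals, add a dummy for the most obvious pulse or shift, refit, and check again.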