Clifford Long
2009-Jun-22 16:04 UTC
[R] question about using _apply and/or aggregate functions
Hi R-list, I'll apologize in advance for (1) the wordiness of my note (not sure how to avoid it) and (2) any deficiencies on my part that lead to my difficulties. I have an application with several stages that is meant to simulate and explore different scenarios with respect to product sales (in units sold per month). My session info is at the bottom of this note. The steps include (1) an initial linear regression model, (2) building an ARIMA model, (3) creating 12 months of simulated 'real' data - for various values of projected changes in slope from the linear regression - from the ARIMA model using arima.sim with subsequent undifferencing and appending to historical data, (3) one-step-ahead forecasting for 12 months using the 'fixed' arima model and simulated data, (4) taking the residuals from the forecasting (simulated minus forecast for each of the 12 months) and applying QC charting, and (5) counting the number (if any) of runs-rules violations detected. The simulation-aspect calculates 100 simulations for each of 50 values of slope. All of this seems to work fine (although I'm sure that the coding could be improved through functions, vectorization, etc. ... ). However, the problem I'm having is at the end where I had hoped that things would be easier. I want to summarize and graph the probability of detecting a runs-rule violation vs. the amount of the shift in slope (of logunit sales). The output data array passed from the qcc section at the end includes: - the adjustment made to the slope (a multiplier) - the actual value of the slope - the iteration number of the simulation loop (within each value of slope) - the count of QC charting limits violations - the count of QC charting runs rules violations My code is in the attached file ("generic_code.R) and my initial sales data needed to "prime" the simulation is in the other attached file ("generic_data.csv"). The relevant section of my code is at the bottom of the .R file after the end of the outer loop. I've tried to provide meaningful comments. I've read the help files for _apply, aggregate, arrays and data types, and have also consulted with several texts (Maindonald and Braun; Spector; Venebles and Ripley for S-plus). Somehow I still don't get it. My attempts usually result in a message like the following:> agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)Error in FUN(X[[1L]], ...) : arguments must have same length But when I check the length of the arguments, they appear to match. (??)> length(simarray.part[,3])[1] 5000> length(simarray.part[,4])[1] 5000> length(list4)[1] 5000 I would have rather passed along a subset of the simulation/loop output dataset, but was unsure how to save it to preserve whatever properties that I may have missed that are causing my difficulties. If anyone still wants to help at this point, I believe that you'll need to load the .csv data and run the simulation (maybe reducing the number of iterations). Many thanks to anyone who can shed some light on my difficulties (whether self-induced or otherwise). Cliff I'm using a pre-compiled binary version of R for Windows. Session info:> sessionInfo()R version 2.9.0 (2009-04-17) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] qcc_1.3 forecast_1.24 tseries_0.10-18 zoo_1.5-5 [5] quadprog_1.4-11 loaded via a namespace (and not attached): [1] grid_2.9.0 lattice_0.17-22> Sys.getlocale()[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Clifford Long
2009-Jun-22 16:18 UTC
[R] Fwd: question about using _apply and/or aggregate functions
Resending, as am not sure about the original "To:" address. Sorry for any redundancy. - Cliff ---------- Forwarded message ---------- From: Clifford Long <gnolffilc at gmail.com> Date: Mon, Jun 22, 2009 at 11:04 AM Subject: question about using _apply and/or aggregate functions To: R-help at lists.r-project.org Hi R-list, I'll apologize in advance for (1) the wordiness of my note (not sure how to avoid it) and (2) any deficiencies on my part that lead to my difficulties. I have an application with several stages that is meant to simulate and explore different scenarios with respect to product sales (in units sold per month). ?My session info is at the bottom of this note. The steps include (1) an initial linear regression model, (2) building an ARIMA model, (3) creating 12 months of simulated 'real' data - for various values of projected changes in slope from the linear regression - from the ARIMA model using arima.sim with subsequent undifferencing and appending to historical data, (3) one-step-ahead forecasting for 12 months using the 'fixed' arima model and simulated data, (4) taking the residuals from the forecasting (simulated minus forecast for each of the 12 months) and applying QC charting, and (5) counting the number (if any) of runs-rules violations detected. The simulation-aspect calculates 100 simulations for each of 50 values of slope. All of this seems to work fine (although I'm sure that the coding could be improved through functions, vectorization, etc. ... ). However, the problem I'm having is at the end where I had hoped that things would be easier. ?I want to summarize and graph the probability of detecting a runs-rule violation vs. the amount of the shift in slope (of logunit sales). The output data array passed from the qcc section at the end includes: ?- the adjustment made to the slope (a multiplier) ?- the actual value of the slope ?- the iteration number of the simulation loop (within each value of slope) ?- the count of QC charting limits violations ?- the count of QC charting runs rules violations My code is in the attached file ("generic_code.R) and my initial sales data needed to "prime" the simulation is in the other attached file ("generic_data.csv"). ?The relevant section of my code is at the bottom of the .R file after the end of the outer loop. ?I've tried to provide meaningful comments. I've read the help files for _apply, aggregate, arrays and data types, and have also consulted with several texts (Maindonald and Braun; Spector; Venebles and Ripley for S-plus). ?Somehow I still don't get it. ?My attempts usually result in a message like the following:> agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)Error in FUN(X[[1L]], ...) : arguments must have same length But when I check the length of the arguments, they appear to match. (??)> length(simarray.part[,3])[1] 5000> length(simarray.part[,4])[1] 5000> length(list4)[1] 5000 I would have rather passed along a subset of the simulation/loop output dataset, but was unsure how to save it to preserve whatever properties that I may have missed that are causing my difficulties. If anyone still wants to help at this point, I believe that you'll need to load the .csv data and run the simulation (maybe reducing the number of iterations). Many thanks to anyone who can shed some light on my difficulties (whether self-induced or otherwise). Cliff I'm using a pre-compiled binary version of R for Windows. Session info:> sessionInfo()R version 2.9.0 (2009-04-17) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base other attached packages: [1] qcc_1.3 ? ? ? ? forecast_1.24 ? tseries_0.10-18 zoo_1.5-5 [5] quadprog_1.4-11 loaded via a namespace (and not attached): [1] grid_2.9.0 ? ? ?lattice_0.17-22> Sys.getlocale()[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
David Winsemius
2009-Jun-22 16:28 UTC
[R] question about using _apply and/or aggregate functions
On Jun 22, 2009, at 12:04 PM, Clifford Long wrote:> Hi R-list, > > I'll apologize in advance for (1) the wordiness of my note (not sure > how to avoid it) and (2) any deficiencies on my part that lead to my > difficulties. > > I have an application with several stages that is meant to simulate > and explore different scenarios with respect to product sales (in > units sold per month). My session info is at the bottom of this note. > > The steps include (1) an initial linear regression model, (2) building > an ARIMA model, (3) creating 12 months of simulated 'real' data - for > various values of projected changes in slope from the linear > regression - from the ARIMA model using arima.sim with subsequent > undifferencing and appending to historical data, (3) one-step-ahead > forecasting for 12 months using the 'fixed' arima model and simulated > data, (4) taking the residuals from the forecasting (simulated minus > forecast for each of the 12 months) and applying QC charting, and (5) > counting the number (if any) of runs-rules violations detected. > > The simulation-aspect calculates 100 simulations for each of 50 > values of slope. > > All of this seems to work fine (although I'm sure that the coding > could be improved through functions, vectorization, etc. ... ). > However, the problem I'm having is at the end where I had hoped that > things would be easier. I want to summarize and graph the probability > of detecting a runs-rule violation vs. the amount of the shift in > slope (of logunit sales). > > The output data array passed from the qcc section at the end includes: > - the adjustment made to the slope (a multiplier) > - the actual value of the slope > - the iteration number of the simulation loop (within each value of > slope) > - the count of QC charting limits violations > - the count of QC charting runs rules violations > > > My code is in the attached file ("generic_code.R) and my initial sales > data needed to "prime" the simulation is in the other attached file > ("generic_data.csv"). The relevant section of my code is at the > bottom of the .R file after the end of the outer loop. I've tried to > provide meaningful comments. > > I've read the help files for _apply, aggregate, arrays and data types, > and have also consulted with several texts (Maindonald and Braun; > Spector; Venebles and Ripley for S-plus). Somehow I still don't get > it. My attempts usually result in a message like the following: > >> agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean) > Error in FUN(X[[1L]], ...) : arguments must have same lengthI cannot comment on the overall strategy, but wondered if this minor mod to the code might succeed;>> agg.result = aggregate(simarray.part[,3], by=list(list4), FUN = mean)My personal experience with aggregate() is not positive. I generally end up turning to tapply() (which is at the heart of aggregate() anyway) probably because I forget to wrap the second argument in a list. Slow learner, I guess.> > But when I check the length of the arguments, they appear to match. > (??) > >> length(simarray.part[,3]) > [1] 5000 >> length(simarray.part[,4]) > [1] 5000 >> length(list4) > [1] 5000 > > > I would have rather passed along a subset of the simulation/loop > output dataset, but was unsure how to save it to preserve whatever > properties that I may have missed that are causing my difficulties. > If anyone still wants to help at this point, I believe that you'll > need to load the .csv data and run the simulation (maybe reducing the > number of iterations). > > Many thanks to anyone who can shed some light on my difficulties > (whether self-induced or otherwise). > > Cliff > > > > I'm using a pre-compiled binary version of R for Windows. > > Session info: > >> sessionInfo() > R version 2.9.0 (2009-04-17) > i386-pc-mingw32 > > locale: > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] qcc_1.3 forecast_1.24 tseries_0.10-18 zoo_1.5-5 > [5] quadprog_1.4-11 > > loaded via a namespace (and not attached): > [1] grid_2.9.0 lattice_0.17-22 > > >> Sys.getlocale() > [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United > States.1252;LC_MONETARY=English_United > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252" > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD Heritage Laboratories West Hartford, CT