I appear to have hit one of the "drop" issues raised in some discussions a couple of years ago by Frank Harrell. They don't seem to have been fixed, and I'm under some pressure to get a quick solution for a forecasting task I'm doing.

I have been modelling some retail sales data, and the days just after Thanksgiving (US version!) are important. So I created some dummy variables via a factor called "events" and (really ugly!!) have TG, TG+1, TG+2, etc. I also have DEC1, and the calendar and data are such that the period I'm forecasting contains TG+3, but this level is NOT in the estimation data. There are also weekday factors (wdf) and some cross factors (Saturday + some special days is highly significant).

The model is

  Sales ~ daynumber + wdf*events + wdf*specialevents

where daynumber is the day sequence in the year and specialevents is a set of factors indicating when the business has promotional activities. The entire model has about 330 coefficients (it seriously needs some economizing), but only about 140 of these are actually estimated.

I'm using lm() to do the estimation. I plan to change the model, and possibly the method, once I've seen whether forecasting works. The current model "works" moderately well for in-sample fits, though I suspect there is too much variability generally.

I want to advance 1 week at a time, re-estimate, and iterate. This is a test case where we know the "future". I can get this to work for a few weeks starting at 20041101, but then I get the error message

  "new factor levels in 'events' ..."

I have tried putting drop.factor.levels = TRUE in predict(), but this didn't seem to register. I also tried a suggestion from the web to use

  ifac <- sapply(estndta, is.factor)
  fcstdta[ifac] <- lapply(fcstdta[ifac], factor)

but I still get the same error. I've tried a couple of dozen variants on this with no joy.

Finally, I have tried using the full data set in lm() but setting the weights for the estimation period to 1 and those for the forecast period to 0. This "computes", but the results include NAs at a point where there seems no reason for them.

I'm starting to suspect that there's some sort of bug somewhere in the R internals. Any advice welcome.

--
John C. Nash, School of Management, University of Ottawa,
Vanier Hall 451, 136 Jean-Jacques Lussier Private,
P.O. Box 450, Stn A, Ottawa, Ontario, K1N 6N5 Canada
email: nashjc on mail server uottawa.ca, voice mail: 613 562 5800 X 4796
fax: 613 562 5164, Web URL = http://macnash.admin.uottawa.ca
"Practical Forecasting for Managers" web site is at
http://www.arnoldpublishers.com/support/nash/
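(A sketch of the direction the sapply/lapply attempt seems to need: the forecast data's factor levels have to be taken from the estimation data rather than re-derived from the forecast data alone. The names estndta and fcstdta are from the post; the fitted-model name fit is assumed here. With the levels aligned this way, any row whose level never appears in the estimation data, such as TG+3, comes back from predict() as NA instead of triggering the "new factor levels" error, and still has to be handled separately.)

  ifac <- sapply(estndta, is.factor)            # which columns are factors
  for (v in names(estndta)[ifac]) {
    ## re-level the forecast column using the *estimation* levels;
    ## values unseen in estimation (e.g. TG+3) become NA
    fcstdta[[v]] <- factor(as.character(fcstdta[[v]]),
                           levels = levels(estndta[[v]]))
  }
  pred <- predict(fit, newdata = fcstdta)       # NA where a level was unseen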
Can you please specify a small reproducible example?

Uwe Ligges
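(For concreteness, a toy example in the spirit of that request, not the original sales data, which reproduces this class of error; the exact wording of the message varies across R versions:)

  set.seed(1)
  est <- data.frame(y = rnorm(9),
                    events = factor(rep(c("none", "TG", "TG+1"), each = 3)))
  fit <- lm(y ~ events, data = est)
  new <- data.frame(events = factor("TG+3"))   # level absent from estimation data
  predict(fit, newdata = new)
  ## predict() stops with an error about 'events' having new levels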