Prof J C Nash
2005-Aug-24 20:56 UTC
[Rd] Model forecasts with new factor levels - predict.warn
predict.warn() -- a function to display factor levels in new data for linear model prediction that do not exist in the estimating data. Date: 2005-8-24 From: John C. Nash (with thanks to Uwe Ligges for suggestions) nashjc at uottawa.ca Motivation: In computing predictions from a linear model using factors, it is possible to introduce new factor levels. This was encountered on a practical level in forecasting retail sales where certain events had not occurred during the estimation period but did occur during the validation (i.e., trial forecast) period. predict() then gives an error message that it cannot continue because the new data has new levels of a factor. Indeed, it will give the new level of the first such factor found. predict.warn is designed to give more detailed information. for example, if there are other factor levels that are also "new". The underlying interest is to find a suitable method for introducing estimates of the effects of such "new" factor levels into the model, and I welcome exchanges on how that may be done. Indeed, one of my main interests in submitting this tool is to learn how that may be done appropriately and with suitable information to the user to avoid egregious errors. For example, a new sales promotion may have many similarities with existing promotions, so a rough estimate for the effect may be possible, allowing for rudimentary validation testing and for "working" values of forecasts. Unfortunately, the lack of some facility to provide "guesstimates" in this way leads users to ignore R or other tools and to work with Excel. I have experience of this situation arising in practise, and we may anticipate that these users will be difficult to convert to using better tools. I believe we must find ways to allow users to carry out such operations in as safe and informed manner possible. Ideally, the follow-on to predict.warn should be a function called something like predict.expand or predict.force that asks the user to supply values (or possibly sets or distributions) of the "new" factor level estimates. Sample function: predict.warn<-function (object, newdata, ...) # this is function predict.warn.R # Tries to warn if we do not have appropriate factor levels in estimation data # that appear in the newdata. # It does NOT yet test for non-factor variables, but that would also be nice. # However, we usually assume that numerical variables can be interpolated and # extrapolated. { message("predict.warn -- check for factor levels not present") message("in linear models with factors") message(" ") if (missing(newdata) || is.null(newdata)) { # Without newdata, issue warning and return message("You do not have any new data.") } else { xlev <- object$xlevels nn<-names(xlev) dnn<-length(nn) message("there are ",dnn," factor names ") dostop<-FALSE # Ensure the return value is defined for (i in seq(along=nn)) { message("Factor ",nn[i]," in estimation data") ofac <- object[["model"]][[nn[i]]] print(ofac) # diagnostic oldlev<-levels(factor(ofac)) nfac <- newdata[[nn[i]]] # ?? note need for double brackets. Why? message("Factor ",nn[i], " in new data") print(nfac) # diagnostic message("about to try levels on newfac") newlev<-levels(factor(nfac)) for (j in seq(along=newlev)) { if (!(newlev[j] %in% oldlev)) { message("New level ", newlev[j]," not found in estimation levels of factor ", nn[i]) dostop = TRUE } } } # end loop over names } # end part where we DO have new data return(dostop) # TRUE if we should stop } # end of predict.warn code Test example: # A simple test of predict.warn.R for 2 factors # J C Nash 20050822 rm(list=ls()) source("predict.warn.R") x<-c(1, 2, 3, 4, 5, 6, 7, 8) fac1<-c("A", "A", "", "", "", "B", "") fac1<-c(fac1, "A") fac1 y<-c( 2, 4, 5, 8, 9.5, 12, 13, 17) fac2<-c(1,1,1,2,2,1,2,3) fac2<-as.factor(fac2) # Note: or else defaults to num joe<-data.frame(x, y, fac1, fac2) bill<-subset(joe, x<=5) mary<-subset(joe, x>5) message("bill") str(bill) print(bill) message("mary") str(mary) print(mary) est<-lm(y ~ fac1 + fac2 + x, bill) print(est) test<-predict.warn(est, newdata = mary) message('predict.warn returns ',test) message("\n") message("Now try predict()") predict(est, newdata = mary) # ======================= end of msg ========================== -- John C. Nash, School of Management, University of Ottawa, Vanier Hall 451, 136 Jean-Jacques Lussier Private, P.O. Box 450, Stn A, Ottawa, Ontario, K1N 6N5 Canada email: nashjc on mail server uottawa.ca, voice mail: 613 562 5800 X 4796 fax 613 562 5164, Web URL = http://macnash.admin.uottawa.ca "Practical Forecasting for Managers" web site is at http://www.arnoldpublishers.com/support/nash/