HelponR
2015-Jan-13 17:14 UTC
[R] any r package can handle factor levels not in the training set
Sorry, I noticed the email subject is not accurate. To be specific, when I call predict(), I get error messages like

factor x has new levels 1, 2

Here x is an attribute (independent variable), not the outcome. I wonder whether the incremental packages (if any) solve this problem. Maybe it is time to write my own package.

On Tue, Jan 13, 2015 at 8:59 AM, HelponR <suncertain at gmail.com> wrote:

> Thanks for your reply. But I cannot control the data.
> I am dealing with real-world stream data. It is very normal that the test
> data (when you apply the model to make predictions) contain new values that
> were not seen in the training data.
> If I coded it myself, I would give a random guess or just an intercept for
> such a situation. But it seems most R packages return an error and exit.
>
>
> On Mon, Jan 12, 2015 at 6:08 PM, Richard M. Heiberger <rmh at temple.edu>
> wrote:
>
>> You need to define the levels of the training set to include all
>> levels that you might see.
>> Something like this
>>
>> > A <- factor(letters[1:5])
>> > B <- factor(letters[c(1,3,5,7,9)])
>> > A
>> [1] a b c d e
>> Levels: a b c d e
>> > B
>> [1] a c e g i
>> Levels: a c e g i
>> > training <- factor(A, levels=unique(c(levels(A), levels(B))))
>> > training
>> [1] a b c d e
>> Levels: a b c d e g i
>> >
>>
>> In the future please "provide commented, minimal, self-contained,
>> reproducible code."
>>
>> On Mon, Jan 12, 2015 at 9:00 PM, HelponR <suncertain at gmail.com> wrote:
>> > It looks like gbm and glm both have this issue.
>> >
>> > I wonder whether any R package is immune to this.
>> >
>> > In reality, it is very normal for test data to contain values unseen in
>> > the training data. It looks like I have to give up R?
>> >
>> > Thanks!
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
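A minimal sketch, not taken from the thread, of the "just an intercept" fallback described above for lm()/glm() models: before calling predict(), map every level of a factor predictor that the fitted model never saw either to NA (predict() then returns NA for that row) or to the reference level (with the default treatment contrasts, that factor then contributes nothing beyond the intercept). recode_unseen_levels() is a hypothetical helper; it assumes the `xlevels` component that lm() and glm() store in the fitted object, and other packages such as gbm may keep their factor levels elsewhere.

```r
# Hypothetical helper (not part of any package): replace factor levels in
# `newdata` that the model did not see during fitting with `fallback`.
# Relies on fit$xlevels, the named list of factor levels stored by lm()/glm().
# Note: values that are already missing are also mapped to `fallback`.
recode_unseen_levels <- function(fit, newdata, fallback = NA) {
  for (v in names(fit$xlevels)) {
    known <- fit$xlevels[[v]]
    x <- as.character(newdata[[v]])
    x[!(x %in% known)] <- fallback          # unseen (or missing) -> fallback
    newdata[[v]] <- factor(x, levels = known)
  }
  newdata
}

# Toy data: the test set contains level "d", which is absent from training.
train <- data.frame(x = factor(c("a", "b", "c", "a", "b", "c")),
                    y = c(1.0, 2.1, 2.9, 1.2, 1.9, 3.1))
fit <- glm(y ~ x, data = train)
test <- data.frame(x = factor(c("a", "d")))

# Calling predict(fit, newdata = test) directly fails with
# "factor x has new levels d".
raw <- try(predict(fit, newdata = test), silent = TRUE)

# Option 1: unseen level -> NA, so that row's prediction is NA.
p_na <- predict(fit, newdata = recode_unseen_levels(fit, test))

# Option 2: unseen level -> reference level "a" (intercept-only contribution).
p_ref <- predict(fit, newdata = recode_unseen_levels(fit, test, fallback = "a"))
```

Mapping to NA keeps the uncertainty visible to whatever consumes the predictions, while mapping to the reference level always returns a number; which is preferable for streaming data is a modelling decision the sketch does not settle.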