HelponR
2015-Jan-13 02:00 UTC
[R] any r package can handle factor levels not in the test set
It looks like gbm and glm both have this issue. I wonder whether any R package is immune to it? In reality, it is very normal for test data to contain values unseen in the training data. It looks like I have to give up R? Thanks!
Richard M. Heiberger
2015-Jan-13 02:08 UTC
[R] any r package can handle factor levels not in the test set
You need to define the levels of the training set to include all levels that you might see. Something like this:

> A <- factor(letters[1:5])
> B <- factor(letters[c(1,3,5,7,9)])
> A
[1] a b c d e
Levels: a b c d e
> B
[1] a c e g i
Levels: a c e g i
> training <- factor(A, levels=unique(c(levels(A), levels(B))))
> training
[1] a b c d e
Levels: a b c d e g i

In the future please "provide commented, minimal, self-contained, reproducible code."
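A minimal base-R sketch of the same idea, extended to show which test levels are unseen and one alternative way of handling them. The names train_x, test_x, unseen, all_levels and test_na are hypothetical, chosen only for illustration, and the data are the same letters as in the reply above. Some fitting functions drop unused factor levels when they build the model frame, so whether a particular modelling function accepts the extra levels at prediction time is not shown here and should be checked for that function.

```r
# Hypothetical training and test factors (same letters as in the reply above).
train_x <- factor(letters[1:5])
test_x  <- factor(letters[c(1, 3, 5, 7, 9)])

# Levels that occur in the test data but were never seen in training.
unseen <- setdiff(levels(test_x), levels(train_x))

# Option 1: give both factors the union of levels so they share one coding.
all_levels <- union(levels(train_x), levels(test_x))
train_x <- factor(train_x, levels = all_levels)
test_x  <- factor(test_x,  levels = all_levels)

# Option 2 (an assumption, not from the thread): recode unseen test values
# to NA by rebuilding the test factor against the original training levels.
test_na <- factor(as.character(test_x), levels = letters[1:5])

print(unseen)
print(levels(train_x))
print(test_na)
```

Option 1 keeps every observation but leaves levels with no training data; option 2 makes the unseen cases explicit as missing values, which can then be handled like any other NA.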