The error message is pretty clear, really. To spell it out a bit more,
what you have done is as follows.
Your training set has factor variables in it. Suppose one of them is
"f". In the training set it has 5 levels, say.
Your test set also has a factor "f", as it must, but it appears that in
the test set it has 6 levels, or more, or levels that do not agree with
those for "f" in the training set.
This mismatch means that the predict method for randomForest cannot use
this test set.
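To trace which factors are at fault, something along the following lines
may help. It is only a sketch in base R: the data frames "train" and
"test" are small hypothetical stand-ins for your own training and test
predictors, and the names "factor_cols" and "new_levels" are made up for
the illustration.

```r
# Hypothetical toy data standing in for the real training and test predictors
train <- data.frame(f = factor(c("a", "b", "c")), x = c(1, 2, 3))
test  <- data.frame(f = factor(c("a", "d")),      x = c(4, 5))

# Names of the factor columns in the training set
factor_cols <- names(train)[sapply(train, is.factor)]

# For each factor, collect the test-set levels absent from the training set
new_levels <- lapply(factor_cols, function(v) {
  setdiff(levels(test[[v]]), levels(train[[v]]))
})
names(new_levels) <- factor_cols

# Keep only the factors that actually show a mismatch
new_levels <- new_levels[sapply(new_levels, length) > 0]
print(new_levels)
```

Any factor that remains in "new_levels" has levels in the test set that
the training set never declared, and is a candidate for the error.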
What you have to do is make sure that the factor levels agree for every
factor in both test and training set. One way to do this is to put the
test and training set together with rbind(...) say, and then separate
them again. But even this will still leave you with a problem, because
your training set will have some factor levels empty that are not empty
in the test set. The resulting error will most likely be more subtle, though.
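The rbind(...) approach, and the empty levels it leaves behind, can be
sketched roughly as follows. Again "train" and "test" are hypothetical toy
data frames, not your data, and "combined", "train2" and "test2" are names
invented for the example.

```r
# Hypothetical toy data: level "d" occurs only in the test set
train <- data.frame(f = factor(c("a", "b", "c")), x = c(1, 2, 3))
test  <- data.frame(f = factor(c("a", "d")),      x = c(4, 5))

# rbind() on data frames expands each factor to the union of its levels
combined <- rbind(train, test)

# Split back into the original pieces; row subsetting keeps all levels
n_train <- nrow(train)
train2 <- combined[seq_len(n_train), , drop = FALSE]
test2  <- combined[-seq_len(n_train), , drop = FALSE]

# The level sets now agree between the two pieces
print(identical(levels(train2$f), levels(test2$f)))

# But the training set has no observations at level "d"
print(table(train2$f))
```

The levels now agree, so the mismatch is gone on paper, but the table
shows a level with a zero count in the training set. Nothing can have been
learned about that level, which is the real difficulty.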
You really need to sort this out yourself. It is not particularly an R
problem, but a confusion over data. To be useful, your training set
needs to include observations at every level of every factor. Think about it.
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Nagu
Sent: Saturday, 8 March 2008 5:37 AM
To: r-help at r-project.org; r-help at stat.math.ethz.ch
Subject: [R] error in random forest
Hi,
I get the following error when I try to predict the probabilities of a
test sample:
Error in predict.randomForest(fit.EBA.OM.rf.50, x.OM, type = "prob") :
New factor levels not present in the training data
I have about 630 predictor variables in the dataset x.OM (25 factor
variables and the remaining are continuous variables). Any ideas on
how to trace it?
Thank you,
Nagu