This seems to be an unstated repeat of much of an earlier and
unanswered post
https://stat.ethz.ch/pipermail/r-help/2005-August/075914.html
entitled
[R] error in predict glm (new levels cause problems)
It is nothing to do with `nbinomial glm' (sic): all model fitting
functions including lm and glm do this. The reason you did not get at
least one reply from your first post is that you seemed not to have done
your homework. (One thing the posting guide does ask is for you to try
the current version of R, and yours is three versions old.)
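To illustrate that the behaviour is not specific to glm.nb, here is a minimal sketch in base R. The data and the names dat, train, new_q, fit_lm and fit_glm are invented for this example and are not from the original post. It fits lm and glm on data with no observations at level "q" and then asks each fit to predict at that level.

```r
# Illustrative data only; all object names here are assumptions of this sketch.
set.seed(1)
dat <- data.frame(
  y = rnorm(40),
  x = 1:40,
  g = factor(rep(c("q", "r", "s", "t"), each = 10))
)

# Training data contain no observations at level "q".
train <- droplevels(subset(dat, g != "q"))
fit_lm  <- lm(y ~ x + g, data = train)
fit_glm <- glm(y ~ x + g, data = train, family = gaussian())

# New data consisting only of rows at the unseen level "q".
new_q <- subset(dat, g == "q")

# Capture the error message instead of stopping the script.
err_lm <- tryCatch(predict(fit_lm, newdata = new_q),
                   error = function(e) conditionMessage(e))
err_glm <- tryCatch(predict(fit_glm, newdata = new_q),
                    error = function(e) conditionMessage(e))
print(err_lm)
print(err_glm)
```

Both calls should stop with a "new level" error raised from the model frame construction, which is shared by these fitting functions.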
The code is protecting you from an attempt at statistical nonsense.
(Indeed, the check was added to catch such misuses.) Your email address
seems to be that of a student, so please seek the help of your advisor.
You seem surprised that you are not allowed to make predictions about
levels for which you have supplied no relevant data.
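One way to see which levels a fit can make predictions for is to inspect its xlevels component and predict only for rows at those levels. The following is a sketch under assumptions: the data, the Poisson family and the names dat, fit, known, ok and pred are made up for illustration, and whether leaving the other rows as NA is sensible depends on the analysis.

```r
# Illustrative data only; object names are assumptions of this sketch.
set.seed(2)
dat <- data.frame(
  y = rpois(40, lambda = 3),
  x = 1:40,
  g = factor(rep(c("q", "r", "s", "t"), each = 10))
)

# Fit without any observations at level "q".
fit <- glm(y ~ x + g, data = subset(dat, g != "q"), family = poisson())

# Levels of g for which the fit has relevant data.
known <- fit$xlevels$g
print(known)

# Predict only for rows whose level was present when fitting;
# rows at other levels are left as NA rather than guessed.
ok <- as.character(dat$g) %in% known
pred <- rep(NA_real_, nrow(dat))
pred[ok] <- predict(fit, newdata = droplevels(dat[ok, ]), type = "response")
```

droplevels() is applied so that the unused level "q" is not carried along in newdata.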
On Tue, 16 Aug 2005, K. Steinmann wrote:
> Dear R-helpers,
>
> let us assume, that I have the following dataset:
>
> a <- rnbinom(200, 1, 0.5)
> b <- (1:200)
> c <- (30:229)
> d <- rep(c("q", "r", "s", "t"), rep(50,4))
> data_frame <- data.frame(a,b,c,d)
>
> In a first step I run a glm.nb (full code is given at the end of this mail) and
> want to predict my response variable a.
> In a second step, I would like to run a glm.nb based on a subset of the
> data_frame. As soon as I want to predict the response variable a, I get the
> following error message:
> "Error in model.frame.default(Terms, newdata, na.action = na.action,
> xlev = object$xlevels) :
> factor d has new level(s) q"
>
> Does anybody have a solution to this problem?
>
> Thank you in advance,
> K. Steinmann (working with R 2.0.0)
>
>
> Code:
>
> library(MASS)
>
> a <- rnbinom(200, 1, 0.5)
> b <- (1:200)
> c <- (30:229)
> d <- rep(c("q", "r", "s", "t"), rep(50,4))
>
> data_frame <- data.frame(a,b,c,d)
>
> model_1 = glm.nb(a ~ b + d , data = data_frame)
>
> pred_model_1 = predict(model_1, newdata = data_frame, type = "response",
>     se.fit = FALSE, dispersion = NULL, terms = NULL)
>
> subset_of_dataframe = subset(data_frame, (b > 80 & c < 190 ))
>
> model_2 = glm.nb(a ~ b + d , data = subset_of_dataframe)
> pred_model_2 = predict(model_2, newdata = subset_of_dataframe, type = "response",
>     se.fit = FALSE, dispersion = NULL, terms = NULL)
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595