thr3ads.net - R help - [R] error in predict glm (new levels cause problems) [Aug 2005]

If this information is useful, please help other people find it:
Share via:

K. Steinmann

2005-Aug-15 13:39 UTC

[R] error in predict glm (new levels cause problems)

Dear R-helpers,

I try to perform glm's with negative binomial distributed data.
So I use the MASS library and the commands:
model_1 = glm.nb(response ~ y1 + y2 + ...+ yi, data = data.frame)
and
predict(model_1, newdata = data.frame)


So far, I think everything should be ok.

But when I want to perform a glm with a subset of the data,
I run into an error message as soon as I want to predict values, based on the
new model. The problem seems to be the reduced number of levels of one of the
factors yi ( a categorical factor) in the subset of the original data set.

On cran search I found some related hint, that the line
"mf$drop.unused.levels
<- TRUE " in the glm (or glm.nb) function could cause the problem.

Therefore I changed the line to "mf$drop.unused.levels <- FALSE ".
Indeed the error message disappears and when I compare the prediction of model_1
with the prediction of the model, carried out with the full data set but with
the changed glm.nb function, I get the same predicted numbers.

However, the change of glm.nb function was more of an intuitive action, and
since I still consider myself as a beginner of R, I don't feel comfortable.

So my questions:
1. Is there an easier way to solve my problem?
2. Do I affect the glm.nb function seriously, by changing the line mentioned
above?


Thank you for your help,
Katharina

PS: I am working with R 2.0.0
PPS: Concrete error message:
"Error in model.frame.default(Terms, newdata, na.action = na.action, xlev
object$xlevels) :
        factor I(kanton) has new level(s) GE"




--
K. Steinmann
Botanisches Institut
Universit??t Basel
CH-4056 Basel
Switzerland
Tel  0041 61 267 35 02
E-mail: Katharina.Steinmann at stud.unibas.ch

Spencer Graves

2005-Aug-17 13:15 UTC

head link

[R] error in predict glm (new levels cause problems)

Have you considered replacing the offending factor with explicit 
coding of your own choosing?  See "?contr.helmert" and in
library(MASS)
"contr.sdif" plus the "contrasts" attribute in
"options", obtained,
e.g., via 'options("contrasts")'.  Every k-level factor is by
default
converted into a set of (k-1) columns of numeric codes.  The exact 
numbers do not matter to p values obtained from anova, though they will 
matter to the coefficients estimated.

	  Have you tried something like the following with "glm.nb":

 > set.seed(1)
 > DF <- data.frame(a=sample(letters[1:3], 30, replace=TRUE),
+                  b=sample(LETTERS[1:3], 30, replace=TRUE),
+                  y=rnorm(30))
 > with(DF, table(a, b))
    b
a   A B C
   a 3 3 3
   b 2 6 3
   c 2 4 4
 > fit1 <- lm(y~a+b, DF[DF$a!="a",])
 > fit1$contrasts
$a
[1] "contr.treatment"

$b
[1] "contr.treatment"

 > options(contrasts=c(unordered="contr.helmert",
ordered="contr.poly"))
 > fit2 <- lm(y~a+b, DF[DF$a!="a",])
 > fit2$contrasts
$a
[1] "contr.helmert"

$b
[1] "contr.helmert"

 > anova(fit1)
Analysis of Variance Table

Response: y
           Df Sum Sq Mean Sq F value Pr(>F)
a          1 0.0008  0.0008  0.0015 0.9698
b          2 0.6186  0.3093  0.5775 0.5719
Residuals 17 9.1050  0.5356
 > anova(fit2)
Analysis of Variance Table

Response: y
           Df Sum Sq Mean Sq F value Pr(>F)
a          1 0.0008  0.0008  0.0015 0.9698
b          2 0.6186  0.3093  0.5775 0.5719
Residuals 17 9.1050  0.5356
 >
	  spencer graves

K. Steinmann wrote:
> Dear R-helpers,
> 
> I try to perform glm's with negative binomial distributed data.
> So I use the MASS library and the commands:
> model_1 = glm.nb(response ~ y1 + y2 + ...+ yi, data = data.frame)
> and
> predict(model_1, newdata = data.frame)
> 
> 
> So far, I think everything should be ok.
> 
> But when I want to perform a glm with a subset of the data,
> I run into an error message as soon as I want to predict values, based on
the
> new model. The problem seems to be the reduced number of levels of one of
the
> factors yi ( a categorical factor) in the subset of the original data set.
> 
> On cran search I found some related hint, that the line
"mf$drop.unused.levels
> <- TRUE " in the glm (or glm.nb) function could cause the problem.
> 
> Therefore I changed the line to "mf$drop.unused.levels <- FALSE
".
> Indeed the error message disappears and when I compare the prediction of
model_1
> with the prediction of the model, carried out with the full data set but
with
> the changed glm.nb function, I get the same predicted numbers.
> 
> However, the change of glm.nb function was more of an intuitive action, and
> since I still consider myself as a beginner of R, I don't feel
comfortable.
> 
> So my questions:
> 1. Is there an easier way to solve my problem?
> 2. Do I affect the glm.nb function seriously, by changing the line
mentioned
> above?
> 
> 
> Thank you for your help,
> Katharina
> 
> PS: I am working with R 2.0.0
> PPS: Concrete error message:
> "Error in model.frame.default(Terms, newdata, na.action = na.action,
xlev > object$xlevels) :
>         factor I(kanton) has new level(s) GE"
> 
> 
> 
> 
> --
> K. Steinmann
> Botanisches Institut
> Universit??t Basel
> CH-4056 Basel
> Switzerland
> Tel  0041 61 267 35 02
> E-mail: Katharina.Steinmann at stud.unibas.ch
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915

Maybe Matching Threads

Search for more maybe matching threads

R help - Aug 2005 - error in predict glm (new levels cause problems)

[R] error in predict glm (new levels cause problems)

[R] error in predict glm (new levels cause problems)

Maybe Matching Threads