Hi there, I have been having trouble running negative binomial regression (glm.nb) using library MASS in R v2.15.0 on Mac OSX. I am running multiple models on the variables influencing the group size of damselfish in coral reefs (count data). For total group size and two of my species, glm.nb is working great to deal with overdispersion in my count data. For two of my species, I am getting a mixture of warning and error messages. These species are different from the others in that they have slightly smaller group sample size (so there are more 0s) and the group size varies more widely (from 1 to 45 fish versus 1 to 11 fish for the other species for which the model is working). Here is a sample of my output and the error messages:> model1<-glm.nb(X...of.C..viridis~PC1+Average.Branch.Diameter+Average.Branch.Spacing+PC1*Average.Branch.Diameter+Average.Branch.Diameter*Average.Branch.Spacing+PC1*Average.Branch.Spacing)There were 50 or more warnings (use warnings() to see the first 50)> warnings()Warning messages: 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace control$trace > ... : iteration limit reached 2: In sqrt(1/i) : NaNs produced 3: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace control$trace > ... : iteration limit reached (ETC...CONTINUES ON THE SAME FOR 50 WARNINGS)> summary(model1)Call: glm.nb(formula = X...of.C..viridis ~ PC1 + Average.Branch.Diameter + Average.Branch.Spacing + PC1 * Average.Branch.Diameter + Average.Branch.Diameter * Average.Branch.Spacing + PC1 * Average.Branch.Spacing, init.theta = 61302.24256, link = log) Deviance Residuals: Min 1Q Median 3Q Max -7.3292 -1.0162 -0.7605 -0.5167 8.5282 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -7.0591 3.5020 -2.016 0.043828 * PC1 -1.0078 0.2950 -3.416 0.000636 *** Average.Branch.Diameter 5.9910 4.1130 1.457 0.145230 Average.Branch.Spacing 6.5972 2.8974 2.277 0.022788 * PC1:Average.Branch.Diameter 1.0415 0.2756 3.779 0.000158 *** Average.Branch.Diameter:Average.Branch.Spacing -6.5321 3.3292 -1.962 0.049755 * PC1:Average.Branch.Spacing 0.6469 0.1629 3.971 7.17e-05 *** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 (Dispersion parameter for Negative Binomial(61302.24) family taken to be 1) Null deviance: 1331.8 on 177 degrees of freedom Residual deviance: 641.1 on 171 degrees of freedom (3 observations deleted due to missingness) AIC: 729.44 Number of Fisher Scoring iterations: 1 Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : invalid 'nsmall' argument I am also getting this Error message for my other model for this species:> model1<-glm.nb(X...of.C..viridis~Depth+Coral.Species+Reef.Type+Depth*Coral.Species+Depth*Reef.Type+Coral.Species*Reef.Type)Error: no valid set of coefficients has been found: please supply starting values I have tried everything I could find online. I tried family=negative.binomial(theta) but I was unsure what to use for theta so didn't find it reliable for estimating dispersion. I have also tried negbin in the library aod but it is coming up with NA values where the p-values/deviance etc should be. I was hoping to find out if the warning messages make the model invalid and if there was some way to get around the error message in the second model. Any advice would be greatly appreciated. Thank you very much. Kind regards, Lauren Nadler -- View this message in context: http://r.789695.n4.nabble.com/R-Error-Warning-Messages-with-library-MASS-using-glm-tp4556771p4556771.html Sent from the R help mailing list archive at Nabble.com.
Ben Bolker
2012-Apr-14 15:34 UTC
[R] R Error/Warning Messages with library(MASS) using glm.
LNadler <lauren.e.nadler <at> gmail.com> writes:> > Hi there, > > I have been having trouble running negative binomial regression (glm.nb) > using library MASS in R v2.15.0 on Mac OSX. > > I am running multiple models on the variables influencing the group size of > damselfish in coral reefs (count data). For total group size and two of my > species, glm.nb is working great to deal with overdispersion in my count > data. For two of my species, I am getting a mixture of warning and error > messages. These species are different from the others in that they have > slightly smaller group sample size (so there are more 0s) and the group size > varies more widely (from 1 to 45 fish versus 1 to 11 fish for the other > species for which the model is working). Here is a sample of my output and > the error messages:> > > model1<-glm.nb(X...of.C..viridis~PC1+Average.Branch.Diameter+ > Average.Branch.Spacing+PC1*Average.Branch.Diameter+ > Average.Branch.Diameter*Average.Branch.Spacing+PC1*Average.Branch.Spacing)By the way, this model specification is equivalent to X...of.C..viridis~(PC1+Average.Branch.Diameter+Average.Branch.Spacing)^2 (i.e. main effects plus all pairwise interactions); also, A*B stands for interactions plus all main effects (A + B+ A:B) so your formula is slightly (harmlessly) redundant> There were 50 or more warnings (use warnings() to see the first 50) > > warnings() > Warning messages: > 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace > control$trace > ... : > iteration limit reached > 2: In sqrt(1/i) : NaNs produced > 3: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace > control$trace > ... : > iteration limit reached (ETC...CONTINUES ON THE SAME FOR 50 WARNINGS)> > (Dispersion parameter for Negative Binomial(61302.24) family taken to be 1)This estimated dispersion parameter is ridiculously large ... either your data are effectively Poisson, or the the theta estimate is questionable.> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : > invalid 'nsmall' argumentThis error actually derives not from the model fitting itself, but from R's attempt to print the summary -- the print.summary method is trying to use the standard error of theta to determine the format, but that's NA (or NaN, I forget) in this case. Here's a reproducible example I came up with several years ago that shows a similar pathology; I was able to get around it using bbmle::mle2 ... ############################################# library(MASS) tadT <- c(0,0,0,450,0,0,0,315,233,0,200,300) Distance <- rep(c(10,100,500),rep(4,3)) d <- data.frame(tadT,Distance) plot(tadT) plot(Distance,jitter(tadT)) library(ggplot2) ggplot(d,aes(Distance,tadT))+stat_sum(aes(size=factor(..n..)))+ theme_bw()+geom_smooth(method="glm",family="quasipoisson") ## constant model tad.con <- glm.nb(tadT ~ 1, control=glm.control(trace=10)) log(mean(tadT)) ## OK, it got the mean right at least ... tad.con$theta ss <- summary(tad.con) print(ss) ## error ## Error in prettNum(...) invalid 'nsmall' argument ## debug(MASS:::print.summary.negbin) ## x$SE.theta is NaN so dp is NaN so format(...,nsmall=dp) ## gives an error ## given more iterations to get where it's going, it goes ## out of control (and crashes) glm.nb(tadT ~ 1, control=glm.control(trace=10,maxit=100)) ## doesn't help to make an identity link glm.nb(tadT ~ 1,link=identity, control=glm.control(trace=10)) glm.nb(tadT ~ 1,link=identity, control=glm.control(trace=10,maxit=100)) library(bbmle) ## can also use starting values of 0,0, but get warnings m1 <- mle2(tadT~dnbinom(mu=exp(logmu),size=exp(logk)), data=d, start=list(logmu=4,logk=-1)) confint(m1) ## works OK p1 <- profile(m1) plot(p1) ## also try as a function of distance tad.dist <- glm.nb(tadT ~ Distance) ## with trace/maxit set glm.nb(tadT ~ Distance,control=glm.control(trace=10,maxit=100)) ## does setting init.theta <<1 help? glm.nb(tadT ~ Distance,init.theta=1e-6,control=glm.control(trace=10,maxit=100)) glm.nb(tadT ~ Distance,init.theta=1e-1,control=glm.control(trace=10,maxit=100)) m2 <- mle2(tadT~dnbinom(mu=exp(logmu),size=exp(logk)), data=d, parameters=list(logmu~Distance), start=list(logmu=4,logk=-1)) summary(m2) p2 <- profile(m2) plot(p2) confint(p2) anova(m1,m2) ## setting all zeros to 1 gives reasonable answers but??? tadT1 <- tadT tadT1[tadT1==0] <- 1 glm.nb(tadT1 ~ Distance,control=glm.control(trace=10,maxit=100)) ## quasipoisson does work ... but no likelihood/AIC available glm.q1 <- glm(tadT ~ 1,family=quasipoisson()) glm.qd <- glm(tadT ~ Distance,family=quasipoisson())