Ben Raymond
2010-Nov-15 10:32 UTC
[R] comparing levels of aggregation with negative binomial models
Dear R community,

I would like to compare the degree of aggregation (or dispersion) of bacteria isolated from plant material. My data are discrete counts from leaf washes. While I do have xy coordinates for each plant, it is aggregation in the sense of the concentration of bacteria in high-density patches that I am interested in.

My attempt to analyze this was to fit negative binomial GLMs to each of my leaf treatments (using MASS), to compare estimates of theta, and to use the standard errors to calculate confidence limits. My values of theta (se) were 0.387 (0.058) and 0.1035 (0.015), which were in the right direction for my hypothesis. However, some of the statistics literature suggests that the confidence intervals of theta (or k) are not very robust and that it would be better to calculate confidence intervals for 1/k. Is there a way I can estimate confidence intervals for 1/k in R, or indeed a more elegant way of looking at aggregation?

Many thanks for your time.

yours,

Dr Ben Raymond
NERC Advanced Research Fellow, Lecturer in Population Genetics
School of Biological Sciences, Royal Holloway University of London
Egham, Surrey TW20 0EX
tel 0044 1784443547
Ben.Raymond at rhul.ac.uk
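For a rough first pass with the numbers quoted above, the delta method gives an approximate standard error for 1/theta: if theta has standard error se, then 1/theta has approximate standard error se/theta^2. A minimal sketch using the two reported estimates (a Wald-type approximation only; the profile approach in the reply below is more trustworthy):

theta <- c(0.387, 0.1035)   ## reported theta estimates from glm.nb
se    <- c(0.058, 0.015)    ## reported standard errors
inv    <- 1/theta           ## point estimates of 1/k
inv.se <- se/theta^2        ## delta method: SE(1/theta) ~= SE(theta)/theta^2
cbind(est = inv,
      lwr = inv - 1.96*inv.se,
      upr = inv + 1.96*inv.se)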
Ben Bolker
2010-Nov-15 23:50 UTC
[R] comparing levels of aggregation with negative binomial models
Ben Raymond <Ben.Raymond <at> rhul.ac.uk> writes:

> [...] Is there a way I can estimate confidence intervals
> for 1/k in R, or indeed a more elegant way of looking at aggregation?

Slightly surprisingly, there isn't a really easy way to do this, but here's a suggestion. The mle2() function in the bbmle package can fit negative binomial models (although a bit less efficiently and stably than glm.nb in the MASS package); it allows you to construct a profile confidence interval for the k parameter (profile confidence intervals are invariant to transformation of the parameter, so you'd get the same answer if you wrote the model with 1/k as the parameter).

library(MASS)
## fit the negative binomial GLM as a reference point
quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)
summary(quine.nb1)
confint(quine.nb1)  ## only gives CIs on the fixed effects, not theta

library(bbmle)
npar <- length(coef(quine.nb1))
## refit the same model as an explicit negative binomial likelihood,
## with k (= theta) as a free parameter bounded below by zero
quine.nb1M <- mle2(Days ~ dnbinom(mu = exp(logmu), size = k),
                   parameters = list(logmu ~ Sex/(Age + Eth*Lrn)),
                   data = quine,
                   start = list(logmu = 1, k = 1.6),
                   lower = c(rep(-Inf, npar), 0),
                   method = "L-BFGS-B")
## check that the two fits agree on the fixed effects
all.equal(coef(quine.nb1), coef(quine.nb1M)[1:npar])
## mean relative difference: 1.7e-4

## profile the likelihood over k and compare confidence intervals
pp <- profile(quine.nb1M, which = "k")
plot(pp)
confint(pp)                                      ## profile CI
confint(quine.nb1M, method = "quad", par = "k")  ## quadratic approximation
1.59799 + c(-1, 1)*1.96*0.213  ## Wald CI based on summary(quine.nb1)

The profile shows that the quadratic approximation (equivalent to the way that glm.nb estimates the standard error of the parameter) is pretty good in this case, but you can see how it works for your data. It would be more efficient to write a profiling function that wrapped glm.nb, but as far as I know that hasn't been done. Another alternative would be to fit a lognormal-Poisson model in glmer (lme4 package), but that's opening another can of worms ...
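Two small follow-ups on the suggestion above. First, because profile intervals are invariant to monotone transformations, a confidence interval for 1/k can be read directly off the profile interval for k by taking reciprocals and swapping the endpoints; a one-line sketch, reusing pp from the code above:

ci.k <- confint(pp)  ## profile CI for k
rev(1/ci.k)          ## CI for 1/k: reciprocals, endpoints swapped

Second, a minimal sketch of the lognormal-Poisson alternative mentioned at the end, fitted as a Poisson GLMM with an observation-level random effect (the obs factor is an addition here, not part of the original post; its variance plays the role of the overdispersion parameter on the log scale):

library(lme4)
quine$obs <- factor(seq_len(nrow(quine)))  ## one random-effect level per observation
quine.lp <- glmer(Days ~ Sex/(Age + Eth*Lrn) + (1 | obs),
                  data = quine, family = poisson)
summary(quine.lp)  ## the obs variance is the lognormal overdispersion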