Meredith Jantzen
2011-Sep-26 16:44 UTC
[R] normalizing a negative binomial distribution and/or incorporating variance structures in a GAMM
Hello everyone,

Apologies in advance, as this is partially a stats question and partially an R question. I have been using a GAM to model the activity level of bats going into and coming out from a forested edge. I had eight microphones set up in a line transect at each of eight sites, and I am hoping to construct a model for each of 7 species.

My count data has a reverse J-shaped skew and is overdispersed with a fair amount of zeros, and I haven't found any transformations that will completely normalize it (I've tried square roots and logs). Meanwhile, the variance in call numbers varies between sites and between microphones. I wanted to use a GAMM to incorporate varComb and varIdent, but these can only be applied on data with a gaussian distribution.

Are there any packages I should be looking into that I don't know about that will apply a variance structure on a negative binomial distribution? Or is there some transformation that I should be using that will solve my normality issues? I've been searching the R-help boards, everything in Zuur and Woods, but I haven't found an answer yet.

Thanks very much, I am very appreciative of any help I can get on this matter.

Sincerely,
Meredith Jantzen
M.Sc. Candidate
Department of Biology
University of Western Ontario
Ben Bolker
2011-Sep-27 18:41 UTC
[R] normalizing a negative binomial distribution and/or incorporating variance structures in a GAMM
Meredith Jantzen <mjantzen <at> uwo.ca> writes:

> Hello everyone, Apologies in advance, as this is partially a stats
> question and partially an R question. I have been using a GAM to
> model the activity level of bats going into and coming out from a
> forested edge. I had eight microphones set up in a line transect at
> each of eight sites, and I am hoping to construct a model for each
> of 7 species.
>
> My count data has a reverse J-shaped skew and is overdispersed with
> a fair amount of zeros, and I haven't found any transformations that
> will completely normalize it (I've tried square roots and logs).
> Meanwhile, the variance in call numbers varies between sites and
> between microphones. I wanted to use a GAMM to incorporate varComb
> and varIdent, but these can only be applied on data with a gaussian
> distribution.
>
> Are there any packages I should be looking into that I don't know
> about that will apply a variance structure on a negative binomial
> distribution? Or is there some transformation that I should be
> using that will solve my normality issues? I've been searching the
> R-help boards, everything in Zuur and Woods, but I haven't found an
> answer yet.

I'm not entirely clear about this, but this question and the previous question that Simon Wood answered (about neg binom and GAMM) suggest to me that you might be going in slightly the wrong direction.

If your data are non-normally distributed, your choices are typically (1) pick an alternative family of distributions to characterize the variation (e.g. neg binomial or ZINB), (2) use some form of robust estimation (e.g. rlm in the MASS package), or (3) try to find a transformation of the data that makes the data normal (and/or homoscedastic, and/or linear with respect to the predictor variables). Among ecologists #3 is the classical approach and #1 is the most common modern approach. Combining #1 and #3 doesn't make that much sense to me.
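As a rough illustration of choice (1), the sketch below fits a negative binomial GAM to the raw counts instead of transforming them. The data are simulated, and every name in it (`bats`, `calls`, `site`, `mic`, `dist`) is a hypothetical stand-in for the real variables. It assumes a version of mgcv recent enough to provide the `nb()` family, which estimates the dispersion parameter as part of the fit.

```r
library(mgcv)  # recommended package, shipped with standard R installations

set.seed(1)
# Simulated stand-in for the design: 8 sites x 8 microphones (names are hypothetical)
bats <- expand.grid(site = factor(1:8), mic = 1:8)
bats$dist <- (bats$mic - 1) * 10             # assumed position along the transect
mu <- exp(1 + 0.3 * sin(bats$dist / 15))     # assumed smooth trend in mean activity
bats$calls <- rnbinom(nrow(bats), mu = mu, size = 0.8)  # overdispersed counts

# Model the counts directly with a negative binomial family;
# no square-root or log transformation of the response is involved
m_nb <- gam(calls ~ s(dist, k = 5) + site,
            family = nb(), data = bats, method = "REML")
summary(m_nb)
```

The basis dimension `k = 5` is an arbitrary choice made only because the simulated `dist` has eight distinct values; it is not a recommendation for the real data.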
One doesn't necessarily expect the variance to be constant in a negative binomial model; are the *standardized* residuals heteroscedastic? (i.e. does the boxplot of residuals(m,type="pearson") vs site, microphone, or site*microphone combination look funky?)

It's not absolutely clear whether you need zero-inflation explicitly or not. There are tests for zero-inflation and overdispersion (see ref below), but I don't know of any that are implemented in R ...

Your choices seem to be

* negative binomial in mgcv::gam, without zero-inflation;
* ZINB in pscl, without the sophisticated GAM machinery of mgcv (but you can use spline terms via splines::ns(v,n) where v is the predictor variable and n is the degrees of freedom, which determines the number of knots -- it just won't do all the slick automatic complexity selection that mgcv does)
* it looks like the COZIGAM package will do zero-inflated GAMs, but it doesn't do negative binomials ...

@article{deng_score_2005,
  title = {Score tests for zero-inflation and over-dispersion in generalized linear models},
  volume = {15},
  url = {http://www3.stat.sinica.edu.tw/statistica/j15n1/j15n115/j15n115.html},
  journal = {Statistica Sinica},
  author = {Deng, D. and Paul, {S.R.}},
  year = {2005},
  pages = {257--276}
}
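A minimal sketch of the residual check and of the spline-term ZINB option mentioned above, using simulated data. All variable names, the five repeat nights, and the 30% structural-zero rate are assumptions made for illustration. The pscl part runs only if that package is installed, and `ns(dist, df = 3)` uses an arbitrary fixed degrees of freedom, since nothing here selects the smoothness automatically.

```r
library(mgcv)
library(splines)

set.seed(2)
# Simulated stand-in: 8 sites x 8 microphones x 5 nights (all names hypothetical)
bats <- expand.grid(site = factor(1:8), mic = factor(1:8), night = 1:5)
bats$dist <- (as.integer(bats$mic) - 1) * 10
mu <- exp(1 + 0.02 * bats$dist)
counts <- rnbinom(nrow(bats), mu = mu, size = 1)
# Assumed 30% structural zeros, to mimic "a fair amount of zeros"
bats$calls <- ifelse(rbinom(nrow(bats), 1, 0.3) == 1, 0, counts)

m <- gam(calls ~ s(dist, k = 5) + site,
         family = nb(), data = bats, method = "REML")

# Pearson residuals are scaled by the fitted variance, so their spread
# should look broadly similar across groups if the NB variance is adequate
bats$pres <- residuals(m, type = "pearson")
boxplot(pres ~ site, data = bats, xlab = "site", ylab = "Pearson residual")
boxplot(pres ~ mic, data = bats, xlab = "microphone", ylab = "Pearson residual")

# Zero-inflated NB with a fixed-complexity natural spline, only if pscl is available
if (requireNamespace("pscl", quietly = TRUE)) {
  m_zinb <- pscl::zeroinfl(calls ~ ns(dist, df = 3) | 1,
                           data = bats, dist = "negbin")
  print(summary(m_zinb))
}
```

If the boxplots show clearly unequal spread between sites or microphones even on the Pearson scale, that is the situation the question about variance structures was getting at; roughly equal spread would suggest the negative binomial variance already accounts for it.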