Dear all,

I want to build a model in R based on animal collection data that look like the following:

Nr  Village  District  Site  Survey  Species  Count
1   AX       A         F     Dry     B        0
2   AY       A         V     Wet     A        5
3   BX       B         F     Wet     B        1
4   BY       B         V     Dry     B        0

Each data point shows one collection unit in a certain Village, District, Site, and Survey for a certain Species. 'Count' is the number of animals collected in that collection unit. Zero animals may be collected in a unit because of very low densities, but also because of climatic conditions (wind, rain, etc.), so we would expect an excess of zeroes. I have checked that the data are overdispersed (variance much bigger than the mean), so a zero-inflated negative binomial model seems the most suitable model in this case. To be sure, I will compare the zero-inflated model to the standard negative binomial model using the Vuong test. The models will be fitted for each species separately.

For these models I can use glm.nb() (in the package MASS) and zeroinfl() and vuong() in the package pscl, looking something like this (after selecting the subset for one species):

    B <- subset(data, Species == "B")
    NB <- glm.nb(Count ~ District + Site + Survey, data = B)
    ZINB <- zeroinfl(Count ~ District + Site + Survey, dist = "negbin", data = B)
    vuong(NB, ZINB)

I have tried this and it works very elegantly. However, the animal collections were only done in 4 districts, and in each district 3 villages were chosen (a total of 12 villages). This should be included in the design. The package survey allows this for the standard negative binomial model, but it seems to me that it is not possible for the zero-inflated NB.

So, my question is two-fold:

1. Is a zero-inflated NB possible in the survey package? If yes, how?
2. If no, how can I build a zero-inflated NB model that takes into account the clustering of the observations (animal counts) in villages and the clustering of the villages in districts?

Thank you very much for the help.
Lies Durnez <ldurnez <at> itg.be> writes:

> I want to build a model in R based on animal collection data, that look like
> the following
>
> Nr  Village  District  Site  Survey  Species  Count
> 1   AX       A         F     Dry     B        0
> 2   AY       A         V     Wet     A        5
> 3   BX       B         F     Wet     B        1
> 4   BY       B         V     Dry     B        0
>
> Each data point shows one collection unit in a certain Village,
> District, Site, and Survey for a certain Species. 'Count' is the
> number of animals collected in that collection unit. It is possible
> that zero animals are collected in that unit because of very low
> densities, but also because of climatic conditions (wind, rain,
> etc), so we would expect an excess in zeroes. I have tested that the
> data are overdispersed (variance much bigger than mean), so a
> zero-inflated negative binomial model seems the most suitable model
> in this case.

[snip snip snip]

> However, the animal collections were only done in 4 districts, and
> in each district 3 villages were chosen (a total of 12
> villages). This should be included in the design. The package survey
> allows this for the standard negative binomial model, but it seems
> to me that it is not possible for the zero-inflated NB. So, my
> question is two-fold: 1. Is a zero-inflated NB possible in the
> survey package. If yes, how? 2. If no, how can I build a
> zero-inflated NB model that takes into account the clustering of the
> observations (animal counts) in villages and the clustering of the
> villages in districts.

Treating villages and districts as random effects (clusters) basically puts you in the domain of generalized linear mixed models. You can use the glmmADMB package to fit zero-inflated, mixed negative binomial models. You can also use the MCMCglmm package to fit lognormal-Poisson models, which are another way to model overdispersed count data (the choice depends on how strongly you require that the actual model be NB, as opposed to just a reasonable model for overdispersed count data).
Four districts is not very many for estimating an among-district variance (which is basically what you are doing when you fit a clustered/mixed model), so I might suggest using district as a fixed effect, and then using district:village (i.e. the interaction between district and village, or village alone if the villages are uniquely labeled) as the grouping factor for the random effect.

http://glmm.wikidot.com/faq may be useful.

I would suggest that you send follow-ups to the r-sig-mixed-models <at> r-project.org mailing list.
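
As a rough sketch of the suggestion above (district as a fixed effect, village as a random intercept, zero-inflated NB via glmmADMB), something like the following might serve as a starting point. The data here are simulated toy values standing in for one species' subset, not the real collections; the column name VillageID and the object name ZINB_mixed are made up for illustration. The fit is guarded with requireNamespace() because glmmADMB is not distributed on CRAN and may not be installed.

```r
## Toy data standing in for the subset of one species.
## All values are simulated for illustration only.
set.seed(1)
B <- expand.grid(Rep      = 1:5,
                 Village  = c("X", "Y", "Z"),
                 District = c("A", "B", "C", "D"),
                 Site     = c("F", "V"),
                 Survey   = c("Dry", "Wet"))

## Zero-inflated NB counts: about 30% structural zeros,
## otherwise negative binomial with mean 3 and size 1.
B$Count <- ifelse(rbinom(nrow(B), 1, 0.3) == 1,
                  0L,
                  rnbinom(nrow(B), mu = 3, size = 1))

## One factor level per village, unique across districts
## (4 districts x 3 villages = 12 levels).
B$VillageID <- interaction(B$District, B$Village, drop = TRUE)

## District as a fixed effect, village as a random intercept.
## Only runs if glmmADMB is available.
if (requireNamespace("glmmADMB", quietly = TRUE)) {
  ZINB_mixed <- glmmADMB::glmmadmb(
    Count ~ District + Site + Survey + (1 | VillageID),
    data          = B,
    family        = "nbinom",
    zeroInflation = TRUE
  )
  print(summary(ZINB_mixed))
}
```

With the real data, VillageID would be built the same way from the District and Village columns, or Village could be used directly if its labels are already unique across districts (as AX, AY, BX, BY in the example appear to be).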