SSimek
2012-Jun-26  19:10 UTC
[R] Zero inflated: is there a limit to the level of inflation
Hello,

I have count data that illustrate the presence or absence of individuals in my study population. I created a grid of cells across the study area and calculated a count value for each individual per season per year for each grid cell. The count value is the number of times an individual was present in each grid cell. For illustration, my data columns look something like this and are repeated for each individual:

    Cell_ID Param1     Param2 Param3  Param4  COUNT Name Year Season Cov
    1       160.565994 729.08 1503    7930.3  0     AA   2010 AUT    Open
    1       160.565994 729.08 1503    7930.3  22    AA   2011 SPR    Open
    1       160.565994 729.08 1503    7930.3  12    AA   2009 SUM    Open
    1       160.565994 729.08 1503    7930.3  0     AA   2010 SUM    Open
    2       169.427001 491.87 1503.31 5101.09 0     AA   2010 AUT    oldHard
    2       169.427001 491.87 1503.31 5101.09 16    AA   2011 SPR    oldHard
    2       169.427001 491.87 1503.31 5101.09 0     AA   2009 SUM    oldHard
    2       169.427001 491.87 1503.31 5101.09 0     AA   2010 SUM    oldHard
    ...
    563     86.777099  612.69 977     4474.6  62    AA   2010 AUT    Water
    563     86.777099  612.69 977     4474.6  12    AA   2011 SPR    Water
    563     86.777099  612.69 977     4474.6  55    AA   2009 SUM    Water

    1       160.565994 729.08 1503    7930.3  0     BB   2010 SUM    Open
    2       169.427001 491.87 1503.31 5101.09 72    BB   2010 SUM    oldHard
    5       160.75     614.95 1503.31 2878.98 16    BB   2010 SUM    medHard
    6       170.404998 510.58 1489.44 743.14  0     BB   2010 SUM    Water
    ...
    563     86.777099  612.69 977     4474.6  0     BB   2010 SUM    Water

    1       160.565994 729.08 1503    7930.3  14    C    2005 AUT    Open
    1       160.565994 729.08 1503    7930.3  0     C    2006 AUT    Open
    1       160.565994 729.08 1503    7930.3  0     C    2006 SPR    Open
    1       160.565994 729.08 1503    7930.3  56    C    2007 SPR    Open
    1       160.565994 729.08 1503    7930.3  0     C    2006 SUM    Open
    2       169.427001 491.87 1503.31 5101.09 124   C    2005 AUT    oldHard
    2       169.427001 491.87 1503.31 5101.09 231   C    2006 AUT    oldHard
    2       169.427001 491.87 1503.31 5101.09 889   C    2006 SPR    oldHard
    2       169.427001 491.87 1503.31 5101.09 0     C    2007 SPR    oldHard
    ...
    563     86.777099  612.69 977     4474.6  0     C    2005 AUT    Water
    563     86.777099  612.69 977     4474.6  231   C    2006 AUT    Water
    563     86.777099  612.69 977     4474.6  185   C    2006 SPR    Water
    563     86.777099  612.69 977     4474.6  123   C    2007 SPR    Water
    563     86.777099  612.69 977     4474.6  52    C    2006 SUM    Water

I have 563 grid cells across my study area, and each individual has between 1 and 563 cells associated with it for each year and each season in which the individual was monitored. Therefore my grid cells are repeated across individuals. I end up with 71,000 records, of which 925 have a COUNT value > 0; that means 70,075 records have a COUNT value of 0.

I wanted to run a zero-inflated Poisson model to estimate the effects of the parameters in a mixed model, with individual as the random effect. But I have been advised two things:

1. I cannot run a zero-inflated Poisson model because my data are too "extremely" inflated (i.e. 70,075 zeros vs. 925 non-zero counts), and

2. I cannot run the model with each cell repeated for each individual. I am told the model doesn't recognize that Cell_ID #1 for individual "A" is the same Cell_ID #1 for individual "B".

Does anyone know if either or both of these points are true? I would appreciate any thoughts, advice, or suggestions.

Thanks!

-Stephanie

--
View this message in context: http://r.789695.n4.nabble.com/Zero-inflated-is-there-a-limit-to-the-level-of-inflation-tp4634532.html
Sent from the R help mailing list archive at Nabble.com.
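One quick way to put the 70,075 vs 925 split in perspective: for a plain Poisson count with mean mu, the expected proportion of zeros is exp(-mu), so comparing that with the observed proportion hints at whether the zeros exceed what a Poisson model alone would produce. The minimal sketch below assumes nothing beyond a list of counts; the function name zero_excess and the toy data are hypothetical, and Python is used only for illustration. It is a marginal check that ignores Param1 to Param4, season, and individual, so it is a rough diagnostic rather than a test of a fitted model.

```python
import math


def zero_excess(counts):
    """Return (observed share of zeros, share of zeros expected under a
    Poisson distribution with the same mean).

    Marginal check only: covariates and grouping are ignored, so this
    hints at excess zeros but does not evaluate a fitted model.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("counts must be non-empty")
    observed = sum(1 for c in counts if c == 0) / n
    mean = sum(counts) / n
    # For a Poisson(mean) variable, P(Y = 0) = exp(-mean).
    expected = math.exp(-mean)
    return observed, expected


# Hypothetical toy data, NOT the data from the post.
toy = [0, 0, 0, 0, 0, 0, 0, 0, 10, 20]
obs, exp_zero = zero_excess(toy)
print(f"observed zeros: {obs:.3f}, Poisson-expected zeros: {exp_zero:.3f}")

# Share of zero records implied by the figures in the post (70,075 of 71,000).
print(f"zero share in the post: {70_075 / 71_000:.4f}")
```

A high share of zeros on its own (about 98.7% here) does not settle the question: what matters is how it compares with the zeros the count model itself predicts once the covariates are included.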
Marc Schwartz
2012-Jun-26  21:31 UTC
On Jun 26, 2012, at 2:10 PM, SSimek wrote:
Hi Stephanie,

Some comments:

1. You should think about, or at least be open to, a zero-inflated negative binomial distribution rather than a zero-inflated Poisson.

2. You should at least review the vignette for the pscl CRAN package, which provides standard fixed effects models and related functions for count-based data and, importantly, some good conceptual content:

   http://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf

3. Given the repeated measures framework and the correlation issues you likely have, you should subscribe to and re-post your query to the R-sig-mixed-models list:

   https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

   which will avail you of experts in the field.

4. There is also a draft FAQ for mixed models here:

   http://glmm.wikidot.com/faq

   which I believe is maintained by Ben Bolker, who actively participates in the above list. Based upon the content there, I suspect that you will be pointed to the glmmADMB package, which is on R-Forge (http://glmmadmb.r-forge.r-project.org/) and can handle zero-inflated mixed effects models of at least some types.

5. If all else fails, just to plant a seed, you might want to consider a mixed effects logistic regression model with a binary response, since you appear to have a relatively small "event" incidence in your data. The above list will also be helpful in that setting, and you would likely be pointed to the glmer() function in the lme4 package for that application, which provides for GLMs in a mixed effects framework.

Regards,

Marc Schwartz
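The suggestion to consider a zero-inflated negative binomial rather than a zero-inflated Poisson is easiest to see from the probability of a zero under each model. In a zero-inflated model a record is a "structural" zero with probability pi and otherwise comes from the count distribution f, so P(Y = 0) = pi + (1 - pi) * f(0). Under a Poisson count f(0) = exp(-mu); under a negative binomial with mean mu and size theta, f(0) = (theta / (theta + mu))^theta, which is larger for the same mean, and the variance mu + mu^2 / theta also allows for a heavier right tail. The minimal sketch below only evaluates these distributions; it does not fit a model or include the individual random effect (that is what packages such as pscl or glmmADMB are for). The function names poisson_pmf, nb_pmf, and zi_pmf and the parameter values are hypothetical, chosen for illustration, and Python is used only as a neutral notation.

```python
import math


def poisson_pmf(k, mu):
    """Poisson P(Y = k) with mean mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def nb_pmf(k, mu, theta):
    """Negative binomial P(Y = k) with mean mu > 0 and size theta > 0,
    so that Var(Y) = mu + mu**2 / theta (large theta approaches Poisson)."""
    log_p = (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
             + theta * math.log(theta / (theta + mu))
             + k * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def zi_pmf(k, pi, base_pmf):
    """Zero-inflated P(Y = k): a structural zero with probability pi,
    otherwise a draw from base_pmf (e.g. Poisson or negative binomial)."""
    p = (1.0 - pi) * base_pmf(k)
    return pi + p if k == 0 else p


# Illustrative parameter values only, not estimates from the posted data.
mu, pi = 2.0, 0.5
zip_zero = zi_pmf(0, pi, lambda k: poisson_pmf(k, mu))
zinb_zero = zi_pmf(0, pi, lambda k: nb_pmf(k, mu, theta=0.5))
print(f"P(Y = 0): ZIP {zip_zero:.3f}, ZINB {zinb_zero:.3f}")
```

With the same mean and the same inflation probability, the negative binomial version puts more mass on zero (about 0.72 versus 0.57 here) and spreads the remaining mass further to the right, which is why it can be a better candidate when data combine many zeros with occasional very large counts.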