Lauria, Valentina
2013-Oct-21 16:28 UTC
[R] Predicting hurdle model results on spatial scale
Dear List,

I apologise in advance for all my questions. I would like to predict the habitat selection of fish species using a hurdle model. I know that I can do this in R with the function predict.hurdle() on newdata, but how this works is not entirely clear to me. Usually, with a two-step approach, a binary model and a Poisson model are fitted to deal with zero-inflated and over-dispersed data, and the prediction from the binary model is then multiplied by the prediction from the Poisson model in order to weight the predictions. Is this already included in the predict.hurdle() function?

I am also using the function dredge() (from the MuMIn package) to select my best model based on AIC. In this case, however, the best model selected seems to be a combination of the truncated Poisson and the binary model (i.e., the full hurdle model). Is there any way that I could dredge the two model components separately? I did some research and in the NEWS section I found that a package pscf was created for this, but when I did more digging around I did not have much luck.

I would be grateful if someone could help me.

Thank you very much once again,
Valentina

-----Original Message-----
From: Achim Zeileis [mailto:Achim.Zeileis at uibk.ac.at]
Sent: 18 October 2013 18:57
To: Lauria, Valentina
Cc: r-help at r-project.org
Subject: Re: [R] hurdle model error why does need integer values for the dependent variable?

On Fri, 18 Oct 2013, Lauria, Valentina wrote:

> Dear list,
>
> I am using the hurdle model for modelling the habitat of rare fish
> species. However, I get an error message when I try to model my data:
>
>> test_new1 <- hurdle(GALUMEL ~ depth + sal + slope + vrm + lat:long +
>>     offset(log(haul_numb)), dist = "negbin", data = datafit_elasmo)
>
> Error in hurdle(GALUMEL ~ depth + sal + slope + vrm + lat:long + offset(log(haul_numb)), :
> invalid dependent variable, non-integer values
>
> When I fit the same model with round(my dependent variable), the
> model works. Sorry for the stupid question, but could anyone explain to me
> why?
> My data are zero-inflated (zeros account for 78% of the data) and positively skewed.

hurdle() fits a count data distribution (poisson, negbin, geometric) by maximum likelihood. Hence, its response needs to be a count variable (i.e., integer). See vignette("countreg", package = "pscl") for the underlying likelihoods employed.

> Thank you very much in advance.
> Kind Regards,
> Valentina
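
For reference, the two-step weighting described in the question at the top of this message (binary part multiplied by the positive-count part) can be sketched in base R. This is only an illustration under stated assumptions: a Poisson count part, a helper name hurdle_mean() that is made up for this example, and invented input values rather than output from any fitted model.

```r
# Sketch of the hurdle mean for a Poisson count part (base R only).
# hurdle_mean() is a hypothetical helper, not a pscl function.
#   p_nonzero: P(y > 0) from the binary (zero hurdle) part
#   lambda:    mean of the untruncated Poisson count part
hurdle_mean <- function(p_nonzero, lambda) {
  # Mean of a zero-truncated Poisson: lambda / (1 - P(Y = 0))
  mean_positive <- lambda / (1 - dpois(0, lambda))
  # Weight the positive-count mean by the probability of clearing the hurdle
  p_nonzero * mean_positive
}

# Invented values for three prediction sites (illustrative only)
p_nonzero <- c(0.10, 0.25, 0.60)
lambda    <- c(0.5, 1.2, 3.0)

expected_count <- hurdle_mean(p_nonzero, lambda)
print(round(expected_count, 3))
```

As far as I can tell from the pscl documentation, predict() on a hurdle fit with type = "response" already returns this combined mean, while type = "count" and type = "zero" return the untruncated count mean and the ratio of non-zero probabilities, so multiplying those two should reproduce the "response" prediction. It is worth checking this on your own fitted model before mapping the predictions.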