Strubbe Diederik
2010-Jun-02 10:56 UTC
[R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"
Dear all, I am using gamlss (Package gamlss version 4.0-0, R version 2.10.1, Windows XP Service Pack 3 on a HP EliteBook) to relate bird counts to habit variables. However, most models fail because “the global deviance is increasing” and I am not sure what causes this behaviour. The dataset consists of counts of birds (duck) and 5 habit variables measured in the field (n= 182). The dependent variable (the number of ducks counted)’suffers’ from zero-inflation and overdisperion:> proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182) > mean <- mean(data$duck) > var <- var(data$duck) > proportion_non_zero[1] 0.1153846> mean[1] 1.906593> var[1] 37.35587 (I have no idea how to simulate a zero-inflated overdispersed Poisson variable, but the data used can be found at http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519). First, I create a (strong) pattern in the dataset by: data$LFAP200 <- data$LFAP200 + (data$duck*data$duck) I try to analyze these data by fitting several possible distributions (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and type II NBI NBII and zero-inflated negative binomial ZINBI) while using cubic splines with a df=3. The best fitting model will then be choses on the basis of its AIC. However, these models frequently fail to converge, and I am not sure why, and what to do about it. For example:> model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= PO)GAMLSS-RS iteration 1: Global Deviance = 1350.623 GAMLSS-RS iteration 2: Global Deviance = 1350.623> model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= ZIP)GAMLSS-RS iteration 1: Global Deviance = 326.3819 GAMLSS-RS iteration 2: Global Deviance = 225.1232 GAMLSS-RS iteration 3: Global Deviance = 319.9663 Error in RS() : The global deviance is increasing Try different steps for the parameters or the model maybe inappropriate In addition: There were 14 warnings (use warnings() to see them)> model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= NBI)GAMLSS-RS iteration 1: Global Deviance = 291.8607 GAMLSS-RS iteration 2: Global Deviance = 291.3291 ######......###### GAMLSS-RS iteration 4: Global Deviance = 291.1135 GAMLSS-RS iteration 20: Global Deviance = 291.107 Warning message: In RS() : Algorithm RS has not yet converged> model_NBII <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= NBII)GAMLSS-RS iteration 1: Global Deviance = 284.5993 GAMLSS-RS iteration 2: Global Deviance = 281.9548 ######......###### GAMLSS-RS iteration 5: Global Deviance = 280.7311 GAMLSS-RS iteration 15: Global Deviance = 280.6343> model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family= ZINBI)GAMLSS-RS iteration 1: Global Deviance = 1672.234 GAMLSS-RS iteration 2: Global Deviance = 544.742 GAMLSS-RS iteration 3: Global Deviance = 598.9939 Error in RS() : The global deviance is increasing Try different steps for the parameters or the model maybe inappropriate Thus, in this case, only the Poisson (PO) and Negative Binomial type I (NBI)converge whereas all other models fail… My first approach was to omit the smoothing factors for each model, or further reduce the number of variables but this does not solve the problem and most models fail, often yielding a “Error in RS() : The global deviance is increasing” message. I would think that, given the fact that the dependent variable is zero-inflated and overdispersed, that the Zero-Inflated Negative Binomial (ZINBI) distribution would be the best fit, but the ZINBI even fails in the following very simple examples.> model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3),data=data,family= ZINBI)GAMLSS-RS iteration 1: Global Deviance = 3508.533 GAMLSS-RS iteration 2: Global Deviance = 1117.121 GAMLSS-RS iteration 3: Global Deviance = 652.5771 GAMLSS-RS iteration 4: Global Deviance = 632.8885 GAMLSS-RS iteration 5: Global Deviance = 645.1169 Error in RS() : The global deviance is increasing Try different steps for the parameters or the model maybe inappropriate> model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family= ZINBI)GAMLSS-RS iteration 1: Global Deviance = 3831.864 GAMLSS-RS iteration 2: Global Deviance = 1174.605 GAMLSS-RS iteration 3: Global Deviance = 562.5428 GAMLSS-RS iteration 4: Global Deviance = 344.0637 GAMLSS-RS iteration 5: Global Deviance = 1779.018 Error in RS() : The global deviance is increasing Try different steps for the parameters or the model maybe inappropriate Any suggestions on how to proceed with this? Many thanks in advance, Diederik Diederik Strubbe Evolutionary Ecology Group Department of Biology University of Antwerp Groenenborgerlaan 171 2020 Antwerpen, Belgium tel: +32 3 265 3464 [[alternative HTML version deleted]]
Joris Meys
2010-Jun-02 11:45 UTC
[R] Problems using gamlss to model zero-inflated and overdispersed count data: "the global deviance is increasing"
Hi Diederik, I can't say immediately why your models fail without seeing the data and running a number of tests. You could play around with the other parameters, as the problem might be related to the optimization algorithm. A different approach would be using the methods in the vegan package. This package contains a set of ordination methods that allow to map environmental variables on the ordination space. You could also use the hurdle model from the pscl library, but this is not guaranteed to work better. It is not a zero-inflation model, but a two-component model that fits the zeros using eg a binomial and the rest of the counts using eg a poisson or a negative binomial. Cheers Joris On Wed, Jun 2, 2010 at 12:56 PM, Strubbe Diederik <diederik.strubbe@ua.ac.be> wrote:> Dear all, > > I am using gamlss (Package gamlss version 4.0-0, R version 2.10.1, Windows > XP Service Pack 3 on a HP EliteBook) to relate bird counts to habit > variables. However, most models fail because “the global deviance is > increasing” and I am not sure what causes this behaviour. The dataset > consists of counts of birds (duck) and 5 habit variables measured in the > field (n= 182). The dependent variable (the number of ducks > counted)’suffers’ from zero-inflation and overdisperion: > > > proportion_non_zero <- (sum(ifelse(data$duck == 0,0,1))/182) > > mean <- mean(data$duck) > > var <- var(data$duck) > > proportion_non_zero > [1] 0.1153846 > > mean > [1] 1.906593 > > var > [1] 37.35587 > > (I have no idea how to simulate a zero-inflated overdispersed Poisson > variable, but the data used can be found at > http://www.ua.ac.be/main.aspx?c=diederik.strubbe&n=23519). > > > First, I create a (strong) pattern in the dataset by: > data$LFAP200 <- data$LFAP200 + (data$duck*data$duck) > > I try to analyze these data by fitting several possible distributions > (Poisson PO, zero-inflated Poisson ZIP, negative binomial type I and type II > NBI NBII and zero-inflated negative binomial ZINBI) while using cubic > splines with a df=3. The best fitting model will then be choses on the basis > of its AIC. > > However, these models frequently fail to converge, and I am not sure why, > and what to do about it. For example: > > > model_Poisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family> PO) > GAMLSS-RS iteration 1: Global Deviance = 1350.623 > GAMLSS-RS iteration 2: Global Deviance = 1350.623 > > > model_ZIPoisson <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family> ZIP) > GAMLSS-RS iteration 1: Global Deviance = 326.3819 > GAMLSS-RS iteration 2: Global Deviance = 225.1232 > GAMLSS-RS iteration 3: Global Deviance = 319.9663 > Error in RS() : The global deviance is increasing > Try different steps for the parameters or the model maybe inappropriate > In addition: There were 14 warnings (use warnings() to see them) > > > model_NBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family> NBI) > GAMLSS-RS iteration 1: Global Deviance = 291.8607 > GAMLSS-RS iteration 2: Global Deviance = 291.3291 > ######......###### > GAMLSS-RS iteration 4: Global Deviance = 291.1135 > GAMLSS-RS iteration 20: Global Deviance = 291.107 > Warning message: > In RS() : Algorithm RS has not yet converged > > > model_NBII <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family> NBII) > GAMLSS-RS iteration 1: Global Deviance = 284.5993 > GAMLSS-RS iteration 2: Global Deviance = 281.9548 > ######......###### > GAMLSS-RS iteration 5: Global Deviance = 280.7311 > GAMLSS-RS iteration 15: Global Deviance = 280.6343 > > > model_ZINBI <- gamlss(duck ~ cs(HHCDI200,df=3) + cs(HHCDI1000,df=3) + > cs(HHHDI200,df=3) + cs(HHHDI1000,df=3) + cs(LFAP200,df=3),data=data,family> ZINBI) > GAMLSS-RS iteration 1: Global Deviance = 1672.234 > GAMLSS-RS iteration 2: Global Deviance = 544.742 > GAMLSS-RS iteration 3: Global Deviance = 598.9939 > Error in RS() : The global deviance is increasing > Try different steps for the parameters or the model maybe inappropriate > > > Thus, in this case, only the Poisson (PO) and Negative Binomial type I > (NBI)converge whereas all other models fail… > > My first approach was to omit the smoothing factors for each model, or > further reduce the number of variables but this does not solve the problem > and most models fail, often yielding a “Error in RS() : The global deviance > is increasing” message. > > I would think that, given the fact that the dependent variable is > zero-inflated and overdispersed, that the Zero-Inflated Negative Binomial > (ZINBI) distribution would be the best fit, but the ZINBI even fails in the > following very simple examples. > > > model_ZINBI <- gamlss(duck ~ cs(LFAP200,df=3),data=data,family= ZINBI) > GAMLSS-RS iteration 1: Global Deviance = 3508.533 > GAMLSS-RS iteration 2: Global Deviance = 1117.121 > GAMLSS-RS iteration 3: Global Deviance = 652.5771 > GAMLSS-RS iteration 4: Global Deviance = 632.8885 > GAMLSS-RS iteration 5: Global Deviance = 645.1169 > Error in RS() : The global deviance is increasing > Try different steps for the parameters or the model maybe inappropriate > > > model_ZINBI <- gamlss(duck ~ LFAP200,data=data,family= ZINBI) > GAMLSS-RS iteration 1: Global Deviance = 3831.864 > GAMLSS-RS iteration 2: Global Deviance = 1174.605 > GAMLSS-RS iteration 3: Global Deviance = 562.5428 > GAMLSS-RS iteration 4: Global Deviance = 344.0637 > GAMLSS-RS iteration 5: Global Deviance = 1779.018 > Error in RS() : The global deviance is increasing > Try different steps for the parameters or the model maybe inappropriate > > > > Any suggestions on how to proceed with this? > > Many thanks in advance, > > > Diederik > > > Diederik Strubbe > Evolutionary Ecology Group > Department of Biology > University of Antwerp > Groenenborgerlaan 171 > 2020 Antwerpen, Belgium > tel: +32 3 265 3464 > > > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >-- Joris Meys Statistical Consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control Coupure Links 653 B-9000 Gent tel : +32 9 264 59 87 Joris.Meys@Ugent.be ------------------------------- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php [[alternative HTML version deleted]]
Apparently Analagous Threads
- Problem in building a package in R 2.0.0
- gamlss results for EXP and LNO seem to have reversed AIC scores
- Problem: Importing two packages which export a function with the same name
- can't use function vcov with a GAMLSS object??
- low sigma in lognormal fit of gamlss