Hi there,

I'm pretty new to the field of fitting (anything). I am trying to fit a distribution with mle(), because my real data seem to follow a zero-inflated Poisson distribution. So far, I have tried a simple example to see whether I understand how to do it:

library(stats4)  # provides mle()

# example count data
x <- 0:10
y <- dpois(x, lambda = 1.4)

# zero-inflated poisson
zip <- function(x, lambda, prop) {
  (1 - prop)*dpois(x,0) + prop*dpois(x,lambda)
}

ll <- function(lambda = 2, prop = 0.9) {
  y.fit <- zip(x, lambda, prop)
  sum( (y - y.fit)^2 )
}

fit <- mle(ll)

So far, so good. The result gives me

lambda   prop
   1.4    1.0

which is pretty nice. But what goes wrong if I want to display confidence intervals? I get a lot of warnings but I simply don't know why...

confint(fit)

Does it have something to do with constraints on my parameters (lambda should be greater than zero and prop should range from 0 to 1)? Do I have to put them into the ll-function?

Is there any general comment on what I'm doing?

Antje
Antje Niederlein <niederlein-rstat <at> yahoo.de> writes:

[snip]
> But what goes wrong if I want to display confidence intervals? I get a
> lot of warnings but I simply don't know why...
>
> confint(fit)
>
> Has it something to do with constraints for my parameters (lambda
> should be > than zero and prop should range from 0 to 1)? Do I have to
> put it into the ll-function?
>
> Is there any general comment on what I'm doing?
>

You are exactly right, it is caused by violations of the bounds on the parameters. The reason you see warnings when you ask for confidence intervals and not for the original fit is that confint() is computing profile confidence intervals, which force it to evaluate the likelihood function over a much wider range of possibilities.

There are four reasonable solutions to your problems:

1. ignore the warnings, as long as they are all of the same type (NaNs/NAs being produced by dbinom or dpois), and as long as the final results look sensible.

2. use method="L-BFGS-B" and set lower and upper bounds on your parameters (this can be a little bit finicky because L-BFGS-B will often try parameters *on* the boundary, and it can't handle NAs or infinities, so you may have to set the lower and upper bounds a little bit in from their theoretical limits, e.g. 0.002 instead of 0).

3. Fit your parameters on the transformed scale (typically logit for probabilities, log for Poisson intensities). This will cause problems if the parameter really lies on the boundary, e.g. if the best estimate of your zero-inflation parameter is zero or very close to it.

4. Use the pscl package, which has reasonably robust and efficient built-in functions for fitting zero-inflated (and hurdle) models.

good luck,
Ben Bolker
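As an illustration of option 3, here is a minimal sketch of a fit on the transformed scale. It is not code from the thread. The simulated data in obs, the helper dzip() and the parameter names loglambda and logitprop are assumptions made for the example. Unlike the ll function in the original post, nll here is an actual negative log-likelihood, which is what mle() expects. prop keeps the meaning it has in the original post (the Poisson share of the mixture).

```r
library(stats4)  # provides mle()

set.seed(1)
# Simulated raw counts (an assumption; the real data are not shown in the thread):
# about 30% extra zeros, the rest Poisson with mean 1.4
n   <- 2000
obs <- ifelse(runif(n) < 0.3, 0, rpois(n, lambda = 1.4))

# Zero-inflated Poisson probability mass function;
# prop is the Poisson share, (1 - prop) the extra point mass at zero
dzip <- function(x, lambda, prop) {
  (1 - prop) * (x == 0) + prop * dpois(x, lambda)
}

# Negative log-likelihood in unconstrained parameters:
# lambda = exp(loglambda) > 0 and prop = plogis(logitprop) in (0, 1)
nll <- function(loglambda = 0, logitprop = 0) {
  -sum(log(dzip(obs, exp(loglambda), plogis(logitprop))))
}

fit <- mle(nll)
est <- coef(fit)

# Back-transform the estimates to the natural scale
lambda.hat <- exp(est[["loglambda"]])
prop.hat   <- plogis(est[["logitprop"]])

# Profile confidence intervals are computed on the transformed scale,
# so the optimizer can never step outside the valid parameter range
ci        <- confint(fit)
ci.lambda <- exp(ci["loglambda", ])
ci.prop   <- plogis(ci["logitprop", ])
```

Because exp() and plogis() are monotone, back-transforming the interval endpoints gives intervals for lambda and prop directly. As noted in option 3, this approach can behave badly when the best estimate of prop sits at or very near 0 or 1.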
Hi Ben,

thanks a lot for your answer.

> There are four reasonable solutions to your problems:
>
> 1. ignore the warnings, as long as they are all of the
> same type (NaNs/NAs being produced by dbinom or dpois),
> and as long as the final results look sensible.

Probably fine for me. The fit for my dummy data was nice, but for the real data it's not so nice... (see below - other ideas...)

> 2. use method="L-BFGS-B" and set lower and upper bounds
> on your parameters (this can be a little bit finicky because
> L-BFGS-B will often try parameters *on* the boundary, and
> it can't handle NAs or infinities, so you may have to set
> the lower and upper bounds a little bit in from their theoretical
> limits (e.g. 0.002 instead of 0).

I tried it, but even if I use the following statement, I still get the warnings with confint():

fit <- mle(ll, method = "L-BFGS-B", lower = c(0.001,0), upper = c(Inf,1))

I added output of the current parameters to my ll-function, and obviously the second parameter goes far beyond the limit. Is there anything wrong with how I tried to set the limits? Is it possible that mle() takes the boundaries into account but confint() does not?

> 3. Fit your parameters on the transformed scale (typically logit
> for probabilities, log for Poisson intensities). This will cause
> problems if the parameter really lies on the boundary, e.g.
> if the best estimate of your zero-inflation parameter is zero
> or very close to it.

Not my preferred solution (I'm too new to this area and afraid of doing something wrong).

> 4. Use the pscl package, which has reasonably robust and
> efficient built-in functions for fitting zero-inflated (and
> hurdle) models.

I played around with it, but I cannot find a way to simply estimate the parameters of my distribution. My data is simply discrete histogram data (counts) and I'm probably too stupid to put it into a model... If you can give me any hint, I would be happy.

Other ideas concerning my approach: Am I using the right criterion to minimize (so far I use the sum of squared errors)? Would it make sense to use Pearson's chi-squared test? (Is there any easy way to do it in R?)

Ciao,
Antje
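
One possible way to handle the two open points above (histogram data and the choice of criterion) is sketched below. This is only an illustration, not a reply from the thread. The histogram in x and freq is made up, and dzip() and nll() are hypothetical helper names. The idea is to minimize a negative log-likelihood in which each count value is weighted by how often it was observed, instead of a sum of squared errors, and to keep the L-BFGS-B bounds slightly inside the theoretical limits as suggested in option 2.

```r
library(stats4)  # provides mle()

# Hypothetical histogram: freq[i] is how often the count value x[i] was observed
x    <- 0:6
freq <- c(420, 250, 180, 90, 40, 15, 5)

# Zero-inflated Poisson probability mass function;
# prop is the Poisson share, (1 - prop) the extra point mass at zero
dzip <- function(x, lambda, prop) {
  (1 - prop) * (x == 0) + prop * dpois(x, lambda)
}

# Negative log-likelihood of the histogram: each log-probability is
# weighted by the number of observations with that count value
nll <- function(lambda = 1, prop = 0.5) {
  -sum(freq * log(dzip(x, lambda, prop)))
}

# Bounds are set a little inside (0, Inf) and (0, 1) so that the
# optimizer never evaluates the likelihood exactly on the boundary
fit <- mle(nll, method = "L-BFGS-B",
           lower = c(lambda = 0.01, prop = 0.01),
           upper = c(lambda = Inf,  prop = 0.99),
           nobs  = sum(freq))

est        <- coef(fit)
lambda.hat <- est[["lambda"]]
prop.hat   <- est[["prop"]]

# Pearson chi-squared statistic as a rough descriptive check of the fit
# (the small probability mass beyond max(x) is ignored here)
expected <- sum(freq) * dzip(x, lambda.hat, prop.hat)
X2       <- sum((freq - expected)^2 / expected)
```

In this sketch the Pearson statistic is used only to judge the fit afterwards, while the parameters themselves come from the likelihood. Bins with very small expected counts would normally be pooled before reading much into X2.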