stefan.duke at gmail.com
2010-Oct-06 12:08 UTC
[R] comparing the fit of two (gamma) distributions for aggregated data
Hi, I have aggregated data in 5 categories and want to fit a gamma distribution to it (works fine). My question is that on theoretical grounds I could claim that the observation in the first category (the zero count) are certain, i.e. I know for sure the number. This would then mean in fitting a gamma distribution to the remaining 4 categories. My question is now how can I compare the fit between the two to choose what fits better? I think the AIC/BIC (or all likelihood based test) don't apply as I compare two different data sets (one with data for all 5 categories and one with the data of the 4 categories, hence the number of observations differs substantially) and similarly I think chi2 square test is not really revealing as in the number of observations is so large (the Null is rejected in both cases). Or would one of the tests still apply, claiming one is somehow nested in the other? Another I idea might be to fit to the 5 categories but restrict the estimator somehow so that it replicates the number of observations in the first category (all basically zero). I don't know if that is advisable (and if so, feasible). Any ideas or comments welcome. Below is some code with example data (but I have much more data to fit than just those). library(stats4) breaks.5 <- c(0,.25,20,40,60,120) #breaks counts.5 <- c(4422, 40225, 10973, 3145 ,3228) #number of obs, breaks.4 <- c(.25,20,40,60,120) #like above but just without the first categories counts.4 <- c( 40225, 10973, 3145 ,3228) #number of obs, ll <- function(shape, rate) #my likelihood function { z <- pgamma(breaks, shape=shape, rate=rate) -sum(counts * log(diff(z))) } #fitting the 5 categories breaks <-breaks.5 counts <- counts.5 results.mle.5<-mle(ll, start=list(shape=1, rate=1/mean(breaks))) #fitting the 4 categories breaks <-breaks.4 counts <- counts.4 results.mle.4<-mle(ll, start=list(shape=1, rate=1/mean(breaks))) Thanks, Stefan