Daniel Meddings
2015-Aug-25 07:44 UTC
[R] Estimating group modes using GLMs for skewed distributions
I am wondering why, for generalized linear models with Gamma, Poisson and Negative Binomial distributions, there appears to be no discussion of estimating the medians or the modes of the distributions. For example, in clinical trials with count data where a log link is used, the quantity that seems to be of interest is

E[Y|T] / E[Y|C] = exp(beta_T + beta^{-} x^{*}) / exp(beta_C + beta^{-} x^{*}) = exp(beta_T) / exp(beta_C),

where beta_T and beta_C are the effects of treatment and control respectively, x^{*} is the covariate point at which the ratio is estimated (its value does not matter here, since the covariate terms cancel), and beta^{-} denotes the model parameters excluding the treatment and control effects.

Whilst I have no objection to this ratio, I would also like to know the mode or the median of the treated and control groups (and the difference in these quantities), given that these distributions are skewed, so the mean is not the most relevant summary. For example, for a skewed continuous variable modeled with the gamma distribution, if alpha is the shape parameter then the mode for treated subjects at x^{*} is

mode(Y|T) = ((alpha - 1) / alpha) * exp(beta_T + beta^{-} x^{*}),

as long as alpha >= 1. However, I see no mention of this kind of summary being estimated in these GLMs, and I am wondering why. Is it perhaps that the ratio of means is less sensitive to small treatment effects than a difference in modes or medians, analogous to risk ratios generally being preferred to risk differences when comparing disease incidence rates?

The reason I am interested in estimating modes or medians is that I wish to assess how well a linear mixed model (which assumes normally distributed responses) estimates the mode or median when the distribution of Y is skewed, using the standard mixed model estimates of the group means. However, perhaps I should instead be looking at how well the mixed model estimates the ratio of means?
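To make the gamma case concrete, here is a minimal sketch (in Python rather than R, standard library only) of the mode formula under a log link. The coefficient values and the helper names `group_mean` and `gamma_mode` are hypothetical, chosen purely for illustration. It also shows that, when the shape alpha is common to both groups, the ratio of modes equals the ratio of means, whereas the difference in modes is the difference in means shrunk by (alpha - 1) / alpha.

```python
import math

def group_mean(beta_group, covariate_term=0.0):
    """Mean under a log link: exp(beta_group + beta^{-} x^{*})."""
    return math.exp(beta_group + covariate_term)

def gamma_mode(alpha, mean):
    """Mode of a gamma distribution with shape alpha and the given mean."""
    if alpha < 1:
        raise ValueError("formula applies only for alpha >= 1")
    return (alpha - 1.0) / alpha * mean

# Hypothetical values, for illustration only.
alpha = 2.5              # common shape parameter
beta_T, beta_C = 1.2, 1.5
covariate_term = 0.3     # beta^{-} x^{*}; cancels in ratios

mean_T = group_mean(beta_T, covariate_term)
mean_C = group_mean(beta_C, covariate_term)
mode_T = gamma_mode(alpha, mean_T)
mode_C = gamma_mode(alpha, mean_C)

print("ratio of means:     ", mean_T / mean_C)
print("ratio of modes:     ", mode_T / mode_C)
print("difference in means:", mean_T - mean_C)
print("difference in modes:", mode_T - mode_C)
```

So in this setting the mode carries extra information only on the difference scale, and only through the estimated shape parameter.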
For comparison I have implemented the above estimation of the treatment and control group modes using GLMs with random effects (the formula is similar to the above, but with simple functions of the random-effects covariance parameters multiplying the expression). As expected, estimates of the group means from the mixed model agree well with the estimates of the modes from the GLM for reasonably symmetrical distributions, but the mixed model's mean estimates start to exceed the modes as the distribution becomes skewed. I can do inference on the difference in the modes using a parametric bootstrap, so as far as I can see there are no problems with this approach. However, if there are any, I would welcome somebody pointing them out.

Many thanks
Dan
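For readers who want to see the shape of the bootstrap idea, the following is a rough sketch of a parametric bootstrap for the difference in gamma modes between two groups. It is not the implementation described above: it has no covariates or random effects, it is written in Python rather than R, and it assumes a crude moment-based estimate of a common shape parameter instead of a GLM fit. All function names and the simulated data are hypothetical.

```python
import random
import statistics

def fit_two_group_gamma(y_t, y_c):
    """Crude fit: group means plus a common moment-based shape estimate."""
    mean_t = statistics.fmean(y_t)
    mean_c = statistics.fmean(y_c)
    # For a gamma, the squared coefficient of variation is 1/alpha; pool groups.
    cv2 = statistics.fmean(
        [(y / mean_t - 1.0) ** 2 for y in y_t]
        + [(y / mean_c - 1.0) ** 2 for y in y_c]
    )
    return mean_t, mean_c, 1.0 / cv2

def mode_difference(mean_t, mean_c, alpha):
    """Difference in gamma modes; the mode is taken as 0 when alpha <= 1."""
    factor = max(alpha - 1.0, 0.0) / alpha
    return factor * (mean_t - mean_c)

def simulate(mean, alpha, n, rng):
    """Draw n gamma variates with the given mean and shape (scale = mean/alpha)."""
    return [rng.gammavariate(alpha, mean / alpha) for _ in range(n)]

def bootstrap_ci(y_t, y_c, n_boot=1000, level=0.95, seed=1):
    """Point estimate and percentile interval for the difference in modes."""
    rng = random.Random(seed)
    mean_t, mean_c, alpha = fit_two_group_gamma(y_t, y_c)
    diffs = []
    for _ in range(n_boot):
        # Simulate from the fitted model, then refit on the simulated data.
        b_t = simulate(mean_t, alpha, len(y_t), rng)
        b_c = simulate(mean_c, alpha, len(y_c), rng)
        diffs.append(mode_difference(*fit_two_group_gamma(b_t, b_c)))
    diffs.sort()
    lo = diffs[int((1 - level) / 2 * n_boot)]
    hi = diffs[int((1 + level) / 2 * n_boot) - 1]
    return mode_difference(mean_t, mean_c, alpha), lo, hi

# Hypothetical data: treated mean 4, control mean 6, common shape 3.
data_rng = random.Random(0)
y_t = simulate(4.0, 3.0, 200, data_rng)
y_c = simulate(6.0, 3.0, 200, data_rng)
est, lo, hi = bootstrap_ci(y_t, y_c, n_boot=500)
print("difference in modes:", est, "interval:", (lo, hi))
```

A percentile interval is used here only because it is the simplest choice; other bootstrap intervals would slot into the same loop.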