drbn wrote:
> Hello,
> I have seen that some papers do this:
>
> 1.) Group data by year (e.g. 35 years)
>
> 2.) Estimate the mean of the key variable through the distribution that fits
> better (some years is a normal distribution, others is a more skewed, gamma
> distribution, etc.)
>
> 3.) With these estimated means of each year do a GLM.
>
> I'd like to know if it is possible (to use these means in a GLM) or is a
> wrong idea.
>
> Thanks in advance
>
> David
>
David,
You can model functions of data, such as means, but you must be careful
to carry over most of the uncertainty in the original data into the
model. If you don't, for example if you let the model know only the
values of the means, then you are actually assuming that these means
were observed with absolute certainty instead of being estimated from
the data. To carry over the uncertainty in the original data to your
modeling you can use a Bayesian approach or you can use a marginal
likelihood approach. A marginal likelihood is a true likelihood function,
not of the data themselves but of functions of the data, such as maximum
likelihood estimates. If your means per year were estimated using
maximum likelihood (for example with fitdistr in package MASS) and your
sample size is not too small, then you can use a normal marginal
likelihood model for the means. Note however that each mean may come
from a different distribution so the full likelihood model for your data
would be a mixture of normal distributions. You may not be able to use
the pre-built glm function, so you may face the challenge of writing your
own code.
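
To make the normal marginal likelihood idea concrete, here is a minimal
sketch in plain Python (standard library only, not R) for the simplest case:
a straight-line trend in the yearly means. It assumes each estimated mean
is approximately normal around the trend line, with variance equal to its
squared standard error, and that those standard errors are treated as
known. Under those assumptions, maximizing the marginal likelihood reduces
to weighted least squares with weights 1 / se^2. The function name and the
numbers are made up for illustration, and the sketch does not handle the
mixture issue mentioned above.

```python
import math

def fit_marginal_normal(x, means, ses):
    """Fit mean_i ~ Normal(a + b * x_i, se_i^2) by maximum likelihood.

    x     : covariate per year (e.g. year index)
    means : estimated mean per year
    ses   : standard error of each estimated mean (treated as known)

    Returns (a, b, se_a, se_b, loglik).
    """
    # Precision weights: years with more certain means count for more.
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)

    # Weighted averages of the covariate and of the estimated means.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mbar = sum(wi * mi for wi, mi in zip(w, means)) / sw

    # Weighted sums of squares and cross-products.
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (mi - mbar)
              for wi, xi, mi in zip(w, x, means))

    # Maximum likelihood estimates of slope and intercept.
    b = sxy / sxx
    a = mbar - b * xbar

    # Standard errors from the inverse information matrix
    # (valid because the variances se_i^2 are taken as known).
    se_b = math.sqrt(1.0 / sxx)
    se_a = math.sqrt(1.0 / sw + xbar ** 2 / sxx)

    # Value of the normal marginal log-likelihood at the estimates.
    loglik = sum(
        -0.5 * math.log(2.0 * math.pi * s ** 2)
        - 0.5 * ((m - (a + b * xi)) / s) ** 2
        for xi, m, s in zip(x, means, ses)
    )
    return a, b, se_a, se_b, loglik


# Made-up example: five years, estimated means and their standard errors.
years = [0, 1, 2, 3, 4]
est_means = [10.2, 10.9, 12.1, 12.8, 14.3]
est_ses = [0.4, 0.9, 0.5, 1.2, 0.6]

a, b, se_a, se_b, loglik = fit_marginal_normal(years, est_means, est_ses)
print("intercept:", a, "+/-", se_a)
print("slope:", b, "+/-", se_b)
```

The point of the weights is that a year whose mean was estimated poorly
(large standard error) pulls less on the fitted trend than a year whose
mean was estimated well, which is exactly the uncertainty that is lost if
the model is given only the values of the means.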
HTH
Rubén