I am trying to fit the following linear model to logged per capita
fecundity data (ie number of babies per female) for a mouse:
RsNRlS <- glm(formula = ln.fecundity ~ summer.rainfall + N +
lagged.rainfall + season, ....)
I am using this relationship in a simulation model, and the current
statistical model I have fit is unsatisfactory. The problem is I get a
global estimate of variance (MSE), but I think it varies across subsets
of the data. Specifically, seasons when there is lots of reproduction
(e.g. fall) tend to have high variance, while seasons with little
reproduction (e.g. summer) have small amounts of variance. I am
looking for a method for estimating the coefficients in my linear
model, and estimating a separate error for subsets of the data (ie for
each of the 4 seasons). The end goal is to take this linear model back
into my simulation model to make predictions about fecundity, but with
separate variance terms for subsets of the data.
I thought of several solutions, but some seem pretty ad hoc, while I
think others would suffer from loss of a lot of power due to small
sample size. The solutions I thought of were:
1) Fit the linear model to all data, but then estimate variances by
season directly from the residuals. Seems ad hoc given the assumption
of common variance in the initial fitting.
2) Use regression weights by season to reweight the data points/fitted
values. Then in the simulation model inflate/deflate the estimate of
MSE by these weights - seems ad hoc too, as translating weights (which
would presumably basically downgrade the importance of outliers) into
something meaningful for prediction using the regression equation seems
arbitrary.
3) Fit a separate model for each season, either assume the global model
(but maybe get poor fits due to #parameter v. #data points) or try
reducing models and choosing the best for each season using AIC. The
problem with this is that I think that some parameters have effects
common to all seasons - eg lagged rainfall represents the effect of
rainfall drowning babies in their burrows prior to the age at which
they can be captured. This is pretty clearly a strong effect, just
based on graphing the data, so I am uncertain about loosing power by
splitting it by season.
4) Fit a mixed model to all the data, treating each season as a random
factor with levels 1 or 0, and estimating the variance for each. Again
this seems somewhat ad hoc, although based on my reading in elementary
texts it would allow me to estimate the variance component for each of
the seasons.
Based on talking to a colleague in my department some of the modern
methods for fitting mixed models, in which the variance for each of the
levels of a factor in a random effect can be estimated. That seems the
best option, as it explicitly does what I want. However, it is well
over my head in terms of either the R code or the statistical
background. Can anyone suggest a good approach, or a published
example, or some useful reading on how one might approach this problem?
Chris Wilcox
The Ecology Centre
Department of Zoology and Entomology
Brisbane, 4072 QLD
Australia
tel 61-7-3365-1686
fax 61-7-3365-1655
website www.uq.edu.au/~uqcwilco