Hi Sean,
Thanks for the data. Your first problem is that Age is a factor with
14 levels. Second, Year is a factor with 16 levels. Now consider
what interactions of these will be---factors with a huge number of
levels, exceeding what is reasonable to fit to your size of data. I
included inline code below. First I convert these variables to
numeric. I also do some plots.
While I get a model to run, the estimated variance of the random
intercept is 0, which is a bit troublesome. I looked at the data, but
I do not have a particularly good reason, other than presumably there
is very little variability by fish after with age and lake in the
model. Anyway, I go on to try a log transformation of Age because of
its relationship with Increment. That seems to work pretty well, but
the residuals are clearly heteroscedastic, so I also show how you can
use the sandwich package to (at least partially) address this.
Cheers,
Josh
require(lme4)
## shows that there are 224 levels in the Age by Year interaction!!
with(odata1, interaction(Age, Year))
## create numeric variables instead
odata1 <- within(odata1, {
numAge <- as.numeric(levels(Age))[Age]
numYear <- as.numeric(levels(Year))[Year]
})
## look at a density plot of Increment
plot(density(odata1$Increment))
rug(odata1$Increment)
## look at Increment versus Age (numeric)
## looks like a straight line is likely a poor fit
xyplot(Increment ~ numAge, data = odata1)
## compare by lake (looks pretty similar)
xyplot(Increment ~ numAge | lake, data = odata1)
## also note that it looks like there is more variability when age is
log than hight
## we'll keep that in mind for later
## simple model
m1 <- lmer(Increment ~ 0 + numAge*lake + (1 | FishID),
data = odata1)
## check residuals against normal
qqnorm(resid(m1))
## residuals against age
## this looks pretty bad
plot(odata1$numAge, resid(m1))
## what about the random effects?
plot(ranef(m1))
## checking the model summary
## we see that the variance for the random intercepts is 0
## this is also rather concerning
summary(m1)
## lets look at some descriptives
descriptives <- with(odata1, do.call("rbind", tapply(Increment,
FishID, function(x) {
c(M = mean(x, na.rm = TRUE), V = var(x, na.rm = TRUE), N = length(x))
})))
descriptives <- descriptives[order(descriptives[, "M"]), ]
descriptives # print
## looks like there are quite a few fish with only one observations
## also, those with only one tend to have a higher mean increment
## given the shape of the relationship, we might try a log transformation on Age
## also given that the random intercepts have 0 variance, we can drop it
malt <- lm(Increment ~ 1 + log(numAge)*lake, data = odata1)
qqnorm(resid(malt)) ## looks decent enough
## not too bad, but clearly not homoscedastic residuals
plot(odata1$numAge, resid(malt))
## the residuals appear heteroscedastic, so we could
## use a sandwhich estimator of the covariance matrix
## the general form is: inv(X'X) X' W X inv(X'X)
## the correction comes in terms of the weight matrix, W which
## can be defined in various ways
## load two required packages
require(lmtest)
require(sandwich)
## the default
coef(summary(malt))
## using sandwhich estimator
## (okay, the results come out pretty similar, not parameter estimates
will be unchaged
## only SEs change)
coeftest(malt, vcov. = vcovHC(malt, type = "HC", sandwich = TRUE))
On Thu, Feb 16, 2012 at 11:44 PM, Sean Godwin <sean.godwin at gmail.com>
wrote:> Hi Josh,
>
> Thank you so much for your response.
>
> The point of the process was actually to find out whether there are
> different age effects for each lake, so an interaction term between age and
> lake is a necessary one. ?Taking out the other random effects to make the
> model simpler would work perfectly to make this easier to tackle though,
and
> I can add the other terms on later as you say!
>
> So:? m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|FishID),lakedata)
>
> But apparently I don't understand the Age*lake syntax... all I'm
trying to
> do with this is to have an interaction term between age and lake, but since
> neither are random effects, I can't just write (1|Age:lake) as I could
with
> (1|Age:Year), right? ?It is the Age*lake term that is causing the error...
> when I take it out, the model runs fine.
>
> There are 117 fish observations, so 117 unique values of fishID. ?The
number
> of observations per fish changes depending on how old the fish is ( a2 year
> old fish = 2 observations), with a maximum fish age of 13 years (so max 13
> observations per fish and 13 years total in the dataset).
>
> I've attached my data (Josh_fulldata.csv) and the script used to
rearrange
> it into a usable format (Josh_model.R). ?For some reason, I wasn't able
to
> export the resulting dataframe without causing errors when re-inputting it,
> so hopefully this works for you. ?Hopefully the inclusion of the full
> dataset answers your other questions. ?Interestingly, when I used a 50 fish
> subset, the error message didn't occur (I've attached that just in
case:
> Josh_data.csv).
>
> I can't thank you enough for your help. ?Already you've pointed me
in a
> better direction. ?All I'm trying to do is to include a non-random
> interaction in here (age:lake), and it sure has me stumped!
>
> -Sean
>
> On 16 February 2012 23:06, Joshua Wiley <jwiley.psych at gmail.com>
wrote:
>>
>> Hi Sean,
>>
>> Can you fit a simpler model? ?Particularly I would start with just one
>> random effect (say, (1|fishid)). ?I would not expect the age*lake
>> interaction to be a problem per se.
>>
>> Also you may not be fitting the model you think you are. ?Writing
age*lake
>> in a formula expands to:
>>
>> Age + lake + age:lake
>>
>> That is the two main effects of age and lake plus their interaction.
?That
>> makes your specification of the main effect of age redundant and means
you
>> will have a main effect of lake, which since you suppressed the
intercept by
>> adding the 0, will be the conditional expected values of the two lakes
when
>> age is 0 for the same random effects.
>>
>> In terms of getting around the error, some other information about your
>> data would be helpful. ?How many observations do you have? ?How many
unique
>> values of fishid are there? ?How many observations per fish? ?How many
>> years? ?Is it a numeric variable or factor? ?What about age? ?For
factor
>> variables what do the joint distributions look like? ?Do you really
mean to
>> have a random constant by year AND age:year? ?Within a fish, do age and
fish
>> have a perfect correlation?
>>
>> In terms of practical advice, I would start with simple models and make
>> them more complex. ?Try to find the term that when you add it leads to
the
>> error. ?Start with an unconditional model only if you have to. ?As you
>> estimate additional parameters in your model, what happens to the
others?
>> ?Are they stable? ?Do they change a lot? What about the standard
errors?
>> ?Does adding one term the model still fit but the SEs become huge?
>>
>> If you need more help, posting a sample of your data that reproduces
the
>> error so we can run it on our machines would help (a plaintext file or
the
>> output of dput() would work well...we do not need all of it, just
enough
>> that a model could work (say 50-100 observations) and that reproduces
the
>> same error.
>>
>> Cheers,
>>
>> Josh
>>
>> On Feb 16, 2012, at 21:39, Sean Godwin <sean.godwin at gmail.com>
wrote:
>>
>> > Hi all,
>> >
>> > I am fairly new to mixed effects models and lmer, so bear with me.
>> >
>> > Here is a subset of my data, which includes a binary variable
(lake (TOM
>> > or
>> > JAN)), one other fixed factor (Age) and a random factor (Year).
>> > ?lake FishID Age Increment Year
>> > 1 ?TOM ? ? ?1 ? 1 ? ? 0.304 ? ? 2007
>> > 2 ?TOM ? ? ?1 ? 2 ? ? 0.148 ? ? 2008
>> > 3 ?TOM ? ? ?1 ? 3 ? ? 0.119 ? ? 2009
>> > 4 ?TOM ? ? ?1 ? 4 ? ? 0.053 ? ? 2010
>> > 5 ?JAN ? ? ? 2 ? 1 ? ? 0.352 ? ? 2009
>> > 6 ?JAN ? ? ? 2 ? 2 ? ? 0.118 ? ? 2010
>> >
>> > The model I'm trying to fit is:
>> > m1 <- lmer(Increment ~ 0 + Age + Age*lake + (1|Year) +
(1|Year:Age) +
>> > (1|FishID),lakedata)
>> >
>> > The error message I get is: *"Error in mer_finalize(ans) :
Downdated X'X
>> > is
>> > not positive definite, 27."*
>> > *
>> > *
>> >> From reading up on the subject, I think my problem is that I
can't
>> > incorporate the 'lake' variable in a fixed-effect
interaction because it
>> > is
>> > only has one binary observation. ?But I don't know what to do
to be able
>> > to
>> > fit this model. ?Any help would be greatly appreciated!
>> > -Sean
>> >
>> > ? ?[[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/