Hi all,
I'm hoping I might be able to get some help with some issues specifying
priors for MCMCglmm.
I'm trying to fit a gaussian glmm using MCMCglmm to a data set with two
(correlated) response variables. The response variables are both
logit-transformed proportions (there are a few reasons why I've chosen these
with gaussian error over binomal glmm, which I won't go into). Each
probability represents a response for a single plant species (c.150 species),
and I want to incorporate the phylogeny (almost fully resolved with only 2
polytomies, including branch lengths) in the error structure. There is only 1
explanatory variable, a 5-level factor. I'm using uninformative priors for
fixed parameters (i.e. not specifying any priors).
Here's my code:
#prior model1
prior1<-list(R = list(V = diag(2)*x, n=2),
G = list(G1=list(V = diag(2)*x, n=2)))
#model1
m1<-MCMCglmm(cbind(y1.logit,y2.logit) ~ trait:Ecotype - 1,
random=~us(trait):animal, rcov=~us(trait):units,
family=c("gaussian","gaussian"),
prior=prior1, data=meandata, pedigree=meantree,
nodes="TIPS",
thin=100, nitt=150000, burnin=30000, verbose=F)
#prior model2
prior2<-list(R = list(V = diag(2)*x, n=2))
#model2
m2<-MCMCglmm(cbind(reac.prob.logit,inert.prob.logit) ~ trait:Ecotype - 1,
rcov=~us(trait):units,
family=c("gaussian","gaussian"),
prior=prior2, data=meandata,
thin=100, nitt=150000, burnin=30000, verbose=F)
I've tried prior variance values of 5, 10, 50, 100, 500, 1000, 5000 and
10000 (by varying x in the code above), and I've used a large thinning
interval (100) to deal with autocorrelation arising from the correlated
responses. Chains seem to mix well as long as I use enough iterations (over
100000 seems to be best). Posterior distributions from plot(model) are
normal-distributed for fixed effects and residuals in every case. However,
posteriors for errors don't behave as nicely:
- Lower prior values (superficially <100) result in positively-skewed
distributions for trait:animal whereas higher prior values (superficially
>100) are much more normal. With the logit data running from ~-3.5 to ~10,
it seems that the prior values that result in well-fitting models are higher
than the variance of the data itself.
- Fixed parameter values vary (but not considerably) with choice of prior, but
error parameters seem to be more sensitive to prior choice.
- Posterior means for errors and residuals are larger and more variable for
y1:y1 and y2:y2 combinations (in the hundreds) but smaller for y1:y2 and y2:y1
combinations (in the tens).
Considering this, my first question is: what are appropriate priors for G and R?
I'm aware that my priors assume a priori independence between the two
responses, and I know that they will covary, but my understanding is that
specifying us(trait) in my errors and residuals in my residuals should estimate
covariance anyway. But, my second question is this: what are the merits in
using a multi-response model? The phylogenetic models are taking a fair amount
of time to run, and I hope that without the need to estimate covariances, they
might be a bit speedier.
At some point, I will want to incorporate replication within species in the
analysis (populations nested within species), but for now I'm really just
interested in looking at mean response per species. I'm assuming that
priors will probably be of similar value (but more of them) once I include
within-species replication... or will I have to go through this again??
Any help appreciated, thanks in advance
Iain
--------------------------------------------------------------------------------
Iain Stott
Centre for Ecology and Conservation
University of Exeter, Cornwall Campus
Tremough
Treliever Road
Penryn
Cornwall
TR10 9EZ
Tel: 01326 371852
http://biosciences.exeter.ac.uk/cec/staff/postgradresearch/iainstott/
--------------------------------------------------------------------------------