Henrik Parn
2006-Aug-08 10:11 UTC
[R] fixed effects following lmer and mcmcsamp - which to present?
Dear all, I am running a mixed model using lmer. In order to obtain CI of individual coefficients I use mcmcsamp. However, I need advice which values that are most appropriate to present in result section of a paper. I have not used mixed models and lmer so much before so my question is probably very naive. However, to avoid to much problems with journal editors and referees addicted to p-values, I would appreciate advice of which values of the output for the fixed factor that would be most appropriate to present in a result section, in order to convince them of the p-value free 'lmer-mcmcsamp'-approach! Using the example from the help page on lmer: fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) ...I obtain the following for 'Days': summary(fm1) ... Estimate Std. Error t value Days 10.4673 1.5458 6.771 ...and from mcmcsamp: summary(mcmcsamp(fm1 , n = 10000)) 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE Days 10.4695 1.7354 0.017354 0.015921 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% Days 7.0227 9.3395 10.4712 11.5719 13.957 The standard way of presenting coefficients following a 'non-lmer' output is often (beta=..., SE=..., statistic=..., P=...). What would be the best equivalent in a 'lmer-mcmcsamp-context'? (beta=..., CI=...) is a good start I believe. But which beta? And what else? I assume that the a 95% CI in this case would be 7.0227-13.957 (please, do correct me I have completely misunderstood!). But which would be the corresponding beta? 10.4673?, 10.4695? 10.4712? Is the t-value worth presenting or is it 'useless' without corresponding degrees of freedom and P-value? If I present the mcmcsamp-CI, does it make sense to present any of the three SE obtained in the output above? BTW, I have no idea what Naive SE, Time-series SE means. Could not find much in help and pdfs to coda or Matrix, or in Google. Thanks in advance for any advice and hints to help-texts I have missed! Best regards, Henrik
Douglas Bates
2006-Aug-08 15:45 UTC
[R] fixed effects following lmer and mcmcsamp - which to present?
On 8/8/06, Henrik Parn <henrik.parn at bio.ntnu.no> wrote:> Dear all, > > I am running a mixed model using lmer. In order to obtain CI of > individual coefficients I use mcmcsamp. However, I need advice which > values that are most appropriate to present in result section of a > paper. I have not used mixed models and lmer so much before so my > question is probably very naive. However, to avoid to much problems with > journal editors and referees addicted to p-values, I would appreciate > advice of which values of the output for the fixed factor that would be > most appropriate to present in a result section, in order to convince > them of the p-value free 'lmer-mcmcsamp'-approach! > > Using the example from the help page on lmer: > > fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) > > ...I obtain the following for 'Days': > > > summary(fm1) > ... > Estimate Std. Error t value > > Days 10.4673 1.5458 6.771 > > > ...and from mcmcsamp: > > summary(mcmcsamp(fm1 , n = 10000)) > > 1. Empirical mean and standard deviation for each variable, > plus standard error of the mean: > > Mean SD Naive SE Time-series SE > Days 10.4695 1.7354 0.017354 0.015921 > > 2. Quantiles for each variable: > 2.5% 25% 50% 75% 97.5% > Days 7.0227 9.3395 10.4712 11.5719 13.957 > > > > The standard way of presenting coefficients following a 'non-lmer' > output is often (beta=..., SE=..., statistic=..., P=...). What would be > the best equivalent in a 'lmer-mcmcsamp-context'? (beta=..., CI=...) is > a good start I believe. But which beta? And what else? > > I assume that the a 95% CI in this case would be 7.0227-13.957 (please, > do correct me I have completely misunderstood!). But which would be the > corresponding beta? 10.4673?, 10.4695? 10.4712? Is the t-value worth > presenting or is it 'useless' without corresponding degrees of freedom > and P-value? If I present the mcmcsamp-CI, does it make sense to present > any of the three SE obtained in the output above? BTW, I have no idea > what Naive SE, Time-series SE means. Could not find much in help and > pdfs to coda or Matrix, or in Google.I would definitely report the REML value 10.4673 as the estimate. This is a reproducible estimate - the other two are stochastic in that you will get different values from different mcmcsamp runs. However, the reproducibility is high. Notice that they all round to 10.47 and two decimal places is quite enough precision for reporting this value when you consider that a 95% confidence interval is [7.02,13.96]. Another way of evaluating a confidence interval using the coda package is the HPDinterval function. (HPD stands for "Highest Posterior Density") It returns the shortest interval with a 95% probability content in the empirical distribution. Here are values from a sample that I evaluated> data(sleepstudy) > fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) > fs1 <- mcmcsamp(fm1, 10000) > library(coda) > summary(fs1)... 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE Days 10.4810 1.6910 0.016910 0.015762 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% Days 7.0975 9.3971 10.4851 11.5751 13.810> fixef(fm1)(Intercept) Days 251.40510 10.46729> HPDinterval(fs1)lower upper Days 7.038076 13.734493 attr(,"Probability") [1] 0.95 If you would like to use a t-distribution to calculate a confidence interval, I would argue that the degrees of freedom for the t should be somewhere between n - p = 178 in this case (n = number of observations, p = number of coefficients in the fixed effects or rank(X) where X is the model matrix for the fixed effects) and n - rank([Z:X]) = 144. (there are 36 random effects and 2 fixed effects but the rank of [Z:X] = 36) If we use the lower value of 144 we produce a confidence interval of> 10.4673 + c(-1,1) * qt(0.975, df = 144) * 1.5458[1] 7.41191 13.52269 Notice that this interval is contained in the HPD interval and in the interval obtained from the 2.5% and 97.5% quantiles of the empirical distribution. I attribute this to the fact that the standard errors are calculated conditional on the variance of the random effects. Thus the t-based confidence interval takes into account imprecision of the estimate of \sigma^2 but it assumes that the variance of the random effects is fixed at the estimated values. (That's not quite true - it is the relative variance that is assumed fixed - but the effect is the same.) The MCMC sample allows all the parameters to vary and I would claim is therefore a better measure of the marginal variability of this parameter. However, as stated above, the HPD interval is stochastic. I would create more than one sample to check the reproducibility of the intervals. In this case intervals based on a chain of length 10000 are not wonderfully consistent Days 6.9389037 13.851661 Days 6.9968306 13.768851 and I might go to chains of length 50000 to check further Days 7.0880422 13.903068 Days 7.0758948 13.965146 For the benefit of editors of a biological journal I think I would report both the interval derived from the t-distribution and a representative interval from the MCMC samples to show that the MCMC samples show variation in excess of that reported by the t interval. I hope this helps. --Doug