Dragonwalker
2012-Mar-27 13:50 UTC
[R] two lmer questions - formula with related variables and output interpretation
Hello,

I have been attempting to set up an LME and have looked at numerous posts, including 'R's lmer cheat-sheet', as well as a number of papers and other resources including R help, but I am still a little confused about how to write my model (I thought I had it). I have asked a number of questions on different forums, most of which have been resolved. My main concern right now is whether my model is correct.

I studied broods of precocial chicks and watched each chick for five minutes every other day where possible. As chicks from the same brood on the same day are completely non-independent, I took the mean for each brood on each day. The variables recorded were the behaviours during that time and the habitats used. There were seven broods: three at one site and four at the other. Only one site had a brood that consistently used mudflats (MF) rather than oceanfront (OF) habitats.

As none of the data within a brood are truly independent, and there are very few broods, conventional statistics could not be used to test the hypotheses. It was suggested that mixed-effects models would be the best option, as they would not only allow all the data to be used, with a random effect of Brood ID to account for the pseudo-replication, but also let me look at the partial use of mudflats by one of the other broods, which used them only periodically.

For this part of the analysis I would like to see which factors affect the amount of time spent feeding. I set up a global model with ten fixed variables plus (1|Brood): Site, tide.h.l, tide.inc.out, MF.vs.OF, human disturbance rate (HDr), proportion of time under human disturbance (HDp), the two equivalent non-human disturbance variables (NHDr, NHDp), Age and mean foraging rate (mean.for.rate).
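A minimal base-R sketch of the brood-per-day averaging step described above. It assumes a hypothetical data frame `chicks` with one row per chick watch and columns `Brood`, `Day` and `Feeding` (percentage of the watch spent feeding); these names and numbers are made up for illustration and are not from the original data.

```r
# Hypothetical per-chick observations (illustrative names and values only).
chicks <- data.frame(
  Brood   = c("A", "A", "A", "B", "B"),
  Day     = c(1, 1, 3, 1, 1),
  Feeding = c(60, 80, 50, 40, 70)
)

# Collapse the non-independent chicks to one mean per brood per day.
brood_days <- aggregate(Feeding ~ Brood + Day, data = chicks, FUN = mean)
print(brood_days)
```

Each row of `brood_days` is one brood-day. Rows from the same brood are still correlated across days, which is what the random intercept (1|Brood) in the mixed model is there to absorb.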
The global model was:

    gm1 <- lmer(Feeding ~ Site + tide.h.l + tide.inc.out + MF.vs.OF + HDr + HDp +
                  NHDr + NHDp + Age + mean.for.rate + (1|Brood),
                data = AllBrood, REML = TRUE)

I wanted to put all the factors together to explore which ones really did influence the time spent feeding, so I used dredge() from the MuMIn package to fit all possible combinations and then averaged the models with ΔAICc < 2. I was expecting the proportion of time being disturbed (HDp and NHDp) to be the most relevant, since the more time spent in other behaviours, the less time is left for feeding. However, MF.vs.OF had a larger effect than HDp and NHDp. This may be because MF observations experienced no human disturbance at all (HDp was always zero), which may exaggerate the apparent effect of this habitat. Surprisingly, the non-human disturbance rate had a greater effect than the proportion of time disturbed (but these are quite even among habitats). The results of model.avg are as follows:

                  Estimate Std. Error z value Pr(>|z|)
    (Intercept)   102.7190     5.5300  18.575  < 2e-16 ***
    HDr            -1.5495     0.3451   4.490 7.11e-06 ***
    MF.vs.OF2      -7.6780     3.7507   2.047  0.04065 *
    NHDp           -0.5145     0.2909   1.769  0.07695 .
    NHDr           -1.4164     0.4663   3.037  0.00239 **
    Site2           6.1477     2.7400   2.244  0.02485 *
    tide.h.l2      -7.2546     2.6914   2.695  0.00703 **
    tide.inc.out2  -5.8486     2.6187   2.233  0.02553 *
    HDp            -0.3773     0.2732   1.381  0.16731
    mean.for.rate  -0.3966     0.3220   1.232  0.21807
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Full model-averaged coefficients (with shrinkage):

    (Intercept)       HDr MF.vs.OF2      NHDp      NHDr
     102.718962 -1.549499 -5.734171 -0.239550 -1.416373
       Site2 tide.h.l2 tide.inc.out2       HDp mean.for.rate
    5.336532 -7.254627     -5.848553 -0.044795     -0.081734

Relative variable importance:

    (Intercept)  Age  HDp  HDr mean.for.rate MF.vs.OF NHDp NHDr
           1.00 0.00 0.12 1.00          0.21     0.75 0.47 1.00
    Site tide.h.l tide.inc.out
    0.87     1.00         1.00

Is there a better way to formulate the model to allow for this effect? Or could I keep it as it is and infer that the MF.vs.OF effect may be partly due to the amount of disturbance within these habitats but, because its effect is larger, other factors are also at play? That would lead me onto the next model, which will use only the observations that do not include disturbance, so that I can tease out the natural factors affecting feeding behaviour. I was going to run this second model with Site still as a fixed effect and then run it with (1|Site) to remove the site effect (if one is found). I would prefer to keep it simple, as I really want to use an LME, but I don't have the understanding for more complex interactions.

I have also asked a question on Stats Stack Exchange, which is yet to be answered, about the output of model.avg, as follows:

I have seen the estimates described as the effect of each variable, and they are discussed in results sections as important values to report, both for their size and their direction (+ve/-ve). The paper I was reading stated that the variables with the largest (positive or negative) estimates had the greatest effect, even quoting that one was 48% lower than another. However, if this is what is reported and discussed, why does the relative variable importance not follow the same pattern as the estimates? It seems that the importance should also be looked at, but I am not sure how the z and p values are calculated from a model. Therefore I would like to know which is more important when discussing the findings.
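The two sets of averaged coefficients above are linked through the relative variable importance. The full (shrinkage) average treats a coefficient as zero in every model that leaves the variable out, while the conditional average in the first table uses only the models that contain it. Since a variable's RVI is the summed Akaike weight of the models containing it, full average = RVI × conditional average. A minimal base-R check, using only numbers copied from the output above (the intercept and Age are left out):

```r
# Conditional averages (first table), full averages (with shrinkage) and RVI,
# copied from the model.avg output.
conditional <- c(HDr = -1.5495, MF.vs.OF2 = -7.6780, NHDp = -0.5145,
                 NHDr = -1.4164, Site2 = 6.1477, tide.h.l2 = -7.2546,
                 tide.inc.out2 = -5.8486, HDp = -0.3773,
                 mean.for.rate = -0.3966)
full <- c(HDr = -1.549499, MF.vs.OF2 = -5.734171, NHDp = -0.239550,
          NHDr = -1.416373, Site2 = 5.336532, tide.h.l2 = -7.254627,
          tide.inc.out2 = -5.848553, HDp = -0.044795,
          mean.for.rate = -0.081734)
rvi <- c(HDr = 1.00, MF.vs.OF2 = 0.75, NHDp = 0.47, NHDr = 1.00,
         Site2 = 0.87, tide.h.l2 = 1.00, tide.inc.out2 = 1.00,
         HDp = 0.12, mean.for.rate = 0.21)

# Full average = sum_i w_i * beta_i, with beta_i = 0 in models without the
# variable. The conditional average divides that sum by the total weight of
# models containing the variable, i.e. its RVI. So full / conditional = RVI.
implied_rvi <- full / conditional
print(round(cbind(conditional, full, rvi, implied_rvi), 3))
```

The implied and printed RVIs agree to within the rounding of the output. This is also why the estimates and RVI need not rank the variables in the same order: an estimate describes the size and direction of an effect when the variable is in the model, while RVI describes how consistently the variable appears among the top models. HDp, for example, has a conditional estimate of -0.377 but an RVI of 0.12, so its full average shrinks to about -0.045.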
I admit that my knowledge is limited, but I would like to grasp this in simple terms if I can. As an additional note, the paper I am referring to also has a table showing the estimates and their 95% CIs. However, the title of the table says "Model-averaged parameter estimates and relative importance values for variables affecting adult piping plover foraging rates in New Jersey, 2007–2009.", which does not seem to fit what was actually shown, unless the RIVs are somehow inferred from the CIs. The link can be found here:
It would allow you to look at the results and what I am talking about, but at the same time, if someone could look at that question too, I would appreciate it.

Thank you all in advance.

Rachel

P.S. I know that some may wonder why I am running models if I don't know the ins and outs. I really do understand what they represent; I just don't understand the intricacies of the relationships between variables, or whether the estimates or the relative variable importance are more important, as the study most similar to mine only used the former and I expected the two to be correlated.

--
View this message in context: http://r.789695.n4.nabble.com/two-lmer-questions-formula-with-related-variables-and-output-interpretation-tp4508876p4508876.html
Sent from the R help mailing list archive at Nabble.com.
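The z and p values in the model.avg table can be reproduced from the Estimate and Std. Error columns alone, and the same two columns give a normal-approximation 95% CI of the kind shown in the paper's table. A minimal base-R sketch using numbers copied from the output; that MuMIn uses exactly this Wald calculation internally is an assumption, but the printed z values do match |Estimate| / Std. Error.

```r
# Conditional model-averaged estimates and standard errors copied from the
# model.avg table (intercept omitted).
est <- c(HDr = -1.5495, MF.vs.OF2 = -7.6780, NHDp = -0.5145,
         NHDr = -1.4164, Site2 = 6.1477, tide.h.l2 = -7.2546,
         tide.inc.out2 = -5.8486, HDp = -0.3773, mean.for.rate = -0.3966)
se  <- c(HDr = 0.3451, MF.vs.OF2 = 3.7507, NHDp = 0.2909,
         NHDr = 0.4663, Site2 = 2.7400, tide.h.l2 = 2.6914,
         tide.inc.out2 = 2.6187, HDp = 0.2732, mean.for.rate = 0.3220)

# Wald statistic: size of the estimate relative to its uncertainty.
z <- abs(est) / se
# Two-sided p value from the standard normal distribution.
p <- 2 * pnorm(-z)

# Normal-approximation 95% confidence interval.
crit  <- qnorm(0.975)
lower <- est - crit * se
upper <- est + crit * se

# A 95% CI excludes zero exactly when the two-sided p value is below 0.05.
excludes_zero <- lower > 0 | upper < 0
print(data.frame(est, se, z = round(z, 3), p = signif(p, 3),
                 lower = round(lower, 3), upper = round(upper, 3),
                 excludes_zero))
```

Because both are built from the same estimate and standard error, "the 95% CI excludes zero" and "p < 0.05" carry the same information. The relative variable importance cannot be recovered from either: it is computed from the Akaike weights of the candidate models, not from the CIs.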
Dragonwalker
2012-Mar-27 16:14 UTC
[R] two lmer questions - formula with related variables and output interpretation
I realised that I removed the link to the question but forgot to remove the text referring to it. Sorry. I am not sure whether I am supposed to link to other forums, but I can add the links if needed (the formatting there is clearer). I have one more question, about which estimates to use. If it is better just to report the estimates and CIs, should I use the full model-averaged estimates (with shrinkage) instead? And if so, does anyone know how I can get CIs for these rather than just the regular CIs? I apologise if I am asking too many questions in one post.

Rachel

--
View this message in context: http://r.789695.n4.nabble.com/two-lmer-questions-formula-with-related-variables-and-output-interpretation-tp4508876p4509334.html
Sent from the R help mailing list archive at Nabble.com.