Murilo Peixoto
2012-Dec-13 03:06 UTC
[R] GLMM - lme4 - binomial family, quadrinomial data: Can one partition be response and another be dependent variable?
Hi there.

At first glance this sounded to me like an obvious "no-no" question, but for some reason I ran some trials and the results looked pretty intriguing.

I checked 14 genotypes (8 plants from each, randomly chosen in the field) on 4 different dates and measured them under 2 different temperatures. As a response, I have 4 different partitions of how light is absorbed in the leaf, and they all add up to 1 (part1 + part2 + part3 + part4 = 1). So I have a data frame with these columns:

    plant | genotype | date | temperature | part1 | part2 | part3 | part4

So logic tells me to keep it as simple as this:

    model01 <- lmer(part1,part2,part3,part4 ~ genotype:date:temperature + (1|plant), data=data, family="binomial")

However, I was wondering how these partitions correlate, so I ran a test for variance inflation factors on them.

    Correlations of the variables
               part1      part2      part3      part4
    part1  1.0000000 -0.1035692 -0.3913199  0.3611188
    part2 -0.1035692  1.0000000 -0.7542708  0.1309893
    part3 -0.3913199 -0.7542708  1.0000000 -0.6597187
    part4  0.3611188  0.1309893 -0.6597187  1.0000000

    Variance inflation factors
               GVIF
    part1  3.881838
    part2 16.648054
    part3 29.613167
    part4  7.335692

In general, the response variable is not included in this test. So let's pretend I want to use part2 as my response variable, and therefore exclude it from the analysis. I noticed that part2 and part3 are highly correlated (-0.75). In general, a high correlation between the response and a predictor is seen as a good thing, but this is not true of a high correlation between two predictors. So let me exclude part2, which I want to use as the response variable.
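A collinearity check of this kind can be sketched in base R as follows. This is only an illustration under stated assumptions: the post does not say which function produced the output, the data below are simulated (not the field measurements), and the names `raw`, `parts`, `X`, `R` and `vif` are introduced here for the example. It relies on the identity that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix.

```r
# Simulated stand-in data (hypothetical, NOT the measurements from the post):
# four positive quantities per row, rescaled so that each row sums to 1.
set.seed(1)
n <- 200
raw <- matrix(rgamma(n * 4, shape = 2), ncol = 4)
parts <- as.data.frame(raw / rowSums(raw))
names(parts) <- paste0("part", 1:4)

# Candidate predictors once part2 is set aside as the response
X <- parts[, c("part1", "part3", "part4")]

# Pairwise correlations among the remaining partitions
R <- cor(X)
print(round(R, 3))

# VIF_j is the j-th diagonal element of the inverse correlation matrix,
# which equals 1 / (1 - R_j^2) from regressing predictor j on the others.
vif <- diag(solve(R))
print(round(vif, 3))
```

The numbers printed by this sketch come from the simulated data and are not comparable to the values reported in this post.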
With part2 excluded, I get:

    Correlations of the variables
               part1      part3      part4
    part1  1.0000000 -0.3913199  0.3611188
    part3 -0.3913199  1.0000000 -0.6597187
    part4  0.3611188 -0.6597187  1.0000000

    Variance inflation factors
              GVIF
    part1 1.207584
    part3 1.859350
    part4 1.810761

So, apart from the fact that part2 is a variable dependent on part1, part3 and part4, it looks like there are no collinearity problems here. So, apart from this, there is no problem in doing this:

    model02 <- lmer(part2 ~ genotype:date:temperature:part1:part3:part4 + (1|plant), data=data, family="binomial")

In model01, only "temperature" was significant. In model02, on the other hand, only part3 (which is highly correlated with the response variable) was significant; temperature was not. It appears to me that the high correlation between part2 and part3 explains the variance in part3 much better than any other factor that is added.

If I now fit a model03 that does not include part3:

    model03 <- lmer(part1 ~ genotype:date:temperature:part1:part4 + (1|plant), data=data, family="binomial")

I get "temperature" as a significant factor, as well as part4 and the interaction part1*part4. In this analysis "date" is also marginally significant, and the P values are much better.

So, when we have partitions that add up to 1, can we use one as the response variable and the others as predictors?

-- 
Murilo de Melo Peixoto
PhD candidate - Botany
Department of Ecology and Evolutionary Biology
University of Toronto