Murilo Peixoto
2012-Dec-13 03:06 UTC
[R] GLMM - lme4 - binomial family, quadrinomial data: Can one partition be response and another be dependent variable?
Hi there. At first glance this sounded to me like an obvious "no-no"
question. But, for some reason, I ran some trials and the results looked
pretty intriguing.
So, I checked 14 genotypes (8 plants of each, randomly chosen in the
field) on 4 different dates and measured them at 2 different
temperatures. As a response, I have 4 different partitions of how light is
absorbed in the leaf, and they all add up to 1 (part1 + part2 + part3 +
part4 = 1).
So I have a data frame with these columns:
plant | genotype | date | temperature | part1 | part2 | part3 | part4
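A minimal sketch of that layout in base R, assuming a fully crossed design in which every plant is measured on every date at both temperatures (the post does not say so explicitly). The factor labels and the names `design` and `plant_in_genotype` are made up for illustration, and the part columns are left empty rather than filled with invented values.

```r
# Hypothetical reconstruction of the design described above:
# 14 genotypes x 8 plants per genotype x 4 dates x 2 temperatures.
design <- expand.grid(
  plant_in_genotype = 1:8,
  genotype          = paste0("G", 1:14),
  date              = paste0("date", 1:4),
  temperature       = c("T1", "T2"),
  stringsAsFactors  = TRUE
)

# One unique label per plant, so that (1 | plant) groups repeated measures
design$plant <- interaction(design$genotype, design$plant_in_genotype,
                            drop = TRUE)

# Placeholders for the four measured partitions (no data invented here)
design[paste0("part", 1:4)] <- NA_real_

n_plants <- nlevels(design$plant)  # number of distinct plants
n_rows   <- nrow(design)           # number of plant x date x temperature rows
```

Under that assumption the table has one row per plant, date and temperature combination.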
So the logic tells me to keep it as simple as this:
model01 <- lmer(part1,part2,part3,part4 ~ genotype:date:temperature +
    (1|plant), data=data, family="binomial")
However, I was just wondering how these partitions correlate. So I did a
test for "Variance inflation factors" on them.
Correlations of the variables
           part1      part2      part3      part4
part1  1.0000000 -0.1035692 -0.3913199  0.3611188
part2 -0.1035692  1.0000000 -0.7542708  0.1309893
part3 -0.3913199 -0.7542708  1.0000000 -0.6597187
part4  0.3611188  0.1309893 -0.6597187  1.0000000
Variance inflation factors
           GVIF
part1  3.881838
part2 16.648054
part3 29.613167
part4  7.335692
In general, the response variable is not included in this test. So, let's
pretend I want to use part2 as my response variable, so I exclude it from
the analysis. I noticed that part2 and part3 are strongly correlated
(-0.75). In general, a high correlation between the response and an
explanatory variable is seen as a good thing, but this is not true if the
high correlation is between two explanatory variables. Well, let me exclude
part2, which I am willing to use as the response variable.
Correlations of the variables
           part1      part3      part4
part1  1.0000000 -0.3913199  0.3611188
part3 -0.3913199  1.0000000 -0.6597187
part4  0.3611188 -0.6597187  1.0000000
Variance inflation factors
          GVIF
part1 1.207584
part3 1.859350
part4 1.810761
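For reference, these three GVIF values can be recovered from the printed correlation matrix alone: with only part1, part3 and part4 as variables, the variance inflation factors are the diagonal of the inverse correlation matrix. A small base-R check, using the rounded correlations exactly as printed (the object names `R3` and `vif3` are just for illustration):

```r
# Correlation matrix of part1, part3 and part4, copied from the output above
vars <- c("part1", "part3", "part4")
R3 <- matrix(c( 1.0000000, -0.3913199,  0.3611188,
               -0.3913199,  1.0000000, -0.6597187,
                0.3611188, -0.6597187,  1.0000000),
             nrow = 3, byrow = TRUE,
             dimnames = list(vars, vars))

# VIF_j is the j-th diagonal element of the inverse correlation matrix
vif3 <- diag(solve(R3))
print(round(vif3, 4))
```

The result should be close to the GVIF column above; small differences in the last digits would only reflect the rounding of the printed correlations.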
So, apart from the fact that part2 depends on part1, part3 and part4, it
looks like there are no collinearity problems here. So, apart from this,
there is no problem in doing this:
model02 <- lmer(part2 ~ genotype:date:temperature:part1:part3:part4 +
    (1|plant), data=data, family="binomial")
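The dependence of part2 on the other three partitions can be made concrete: if the four partitions sum to exactly 1, then part2 = 1 - part1 - part3 - part4 by construction. The sketch below uses simulated proportions, not the data from this study (the gamma draws and the names `parts` and `fit` are arbitrary choices), and an ordinary linear model rather than the binomial mixed model, purely to show the constraint.

```r
set.seed(42)
n <- 100

# Simulate four positive quantities and rescale each row to sum to 1
raw   <- matrix(rgamma(n * 4, shape = 2), ncol = 4)
parts <- as.data.frame(raw / rowSums(raw))
names(parts) <- paste0("part", 1:4)

row_totals <- rowSums(parts)  # every row sums to 1 (up to floating point)

# Regress one partition on the other three
fit       <- lm(part2 ~ part1 + part3 + part4, data = parts)
coefs     <- unname(coef(fit))         # intercept close to 1, slopes close to -1
max_resid <- max(abs(residuals(fit)))  # essentially zero

print(round(coefs, 6))
```

With real measurements the sums may only be approximately 1 because of rounding, in which case the fit would be near-perfect rather than exact.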
In model01 only "temperature" was significant. In model02, on the other
hand, only part3 (which is highly correlated with the response variable)
was significant; temperature was not. It appears to me that the high
correlation between part2 and part3 explains the variance in part2 much
better than any other factor that is added.
If I now fit a model03 in which I do not include part3:
model03 <- lmer(part2 ~ genotype:date:temperature:part1:part4 +
    (1|plant), data=data, family="binomial")
I get "temperature" as a significant factor as well as part4 and the
interaction part1*part4. In this analysis "date" is also marginally
significant, and the P values are much better.
So, when we have partitions that add up to 1, can we use one as the
response variable and the others as explanatory variables?
--
Murilo de Melo Peixoto
PhD candidate- Botany
Department of Ecology and Evolutionary Biology
University of Toronto
