Pedro Vaz
2018-Sep-03 13:21 UTC
[R] Account for a factor variability in a logistic GLMM in lme4
We did a field study to understand which factors significantly explain the probability of a group of animals (5 species in total) crossing through 30 wildlife road-crossing structures. The response variable is binomial (yes = crossed; no = did not cross) and was recorded by animal species. We did about 30 visits to each crossing structure (our random factor), in which we recorded the binomial response by each animal species and the values of a few predictors.

So, I have this mixed effects model (simplified for better understanding):

    library(lme4)
    Mymodel <- glmer(cross.01 ~ stream.01 + width.m + grass.per + (1 | structure.id),
                     data = Mydata, family = binomial)

stream.01 is a factor with 2 levels; width.m is continuous; grass.per is a percentage.

This is the model in which I assessed crossings by all species combined (i.e., cross.01 = 1 when an animal of any species crossed, cross.01 = 0 when no animal crossed). However, we also fitted one model per species, and those species-specific models show that different species exhibit different relationships between crossings and explanatory variables.

My problem: this means that the model above has an additional source of variation at the species level that it does not account for. However, I cannot refit the model with species as a random factor because, in my binomial response, a zero means that no species crossed (all zeros would have "NA" or, say, "none" for species), so that additional source of variation is only present when the response is 1. Just to confirm this, I did add species as a random factor:

    (1 | structure.id) + (1 | species)

As expected, the message is "Error: Response is constant".

How can I account for the species variability in my model in lme4?

A few more details:

- I had 5 mammal species crossing through the 30 road-crossing structures. On 134 occasions (i.e., 134 of my records on individual crossing structures), no animal crossed (so, @Dimitris Rizopoulos, no, I didn't have the species of the animals which did not cross; a "no cross" was a "zero" for that visit to the crossing structure). On 498 occasions, at least one animal of a given species crossed the structure (these were the "ones" in my logistic response).
- A side comment: this is to respond to a reviewer of a paper of mine. I fitted and presented species-specific and "all species combined" models in the reviewed draft, but now the reviewer is asking me to control for the species variability in the combined-species model. He asked me to include a random factor, but I realized that is not possible since all my zeros would have "none" for the species that crossed. So, is it possible to control for the species variability in my model in lme4 in another way? I know that in nlme, fitting variance structures is not that difficult...
- Every time an animal crossed, the binary response was "one" and I recorded the animal species as well. Thus, I have variability between species in the "ones" but not in the "zeros" of my logistic model.
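The "Response is constant" error described above can be reproduced in miniature. The following is only a sketch on a hypothetical toy data frame (`toy`, with invented species labels `sp1` to `sp3`), and it uses base R's `model.frame()` rather than `glmer()` itself. The assumption is that model fitting drops rows with missing values by default (`na.omit`), so once `species` enters the formula, every zero (where `species` is `NA`) is removed before fitting.

```r
# Hypothetical toy data: species is only known when a crossing occurred.
toy <- data.frame(
  structure.id = factor(rep(c("s1", "s2"), each = 6)),
  cross.01     = rep(c(1, 1, 0), times = 4)   # 8 ones, 4 zeros
)
toy$species <- ifelse(toy$cross.01 == 1,
                      rep_len(c("sp1", "sp2", "sp3"), nrow(toy)),
                      NA_character_)           # zeros have no species

# Build the model frame the way model-fitting functions typically do:
# rows with NA in any model variable are dropped (na.omit).
mf <- model.frame(cross.01 ~ structure.id + species,
                  data = toy, na.action = na.omit)

n_zeros_before <- sum(toy$cross.01 == 0)  # zeros in the raw data
remaining      <- unique(mf$cross.01)     # response values left after NA removal

print(n_zeros_before)
print(remaining)
```

After the rows with missing `species` are dropped, only ones are left in the response, which is presumably what triggers the error when `(1 | species)` is added.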
Bert Gunter
2018-Sep-03 14:46 UTC
[R] Account for a factor variability in a logistic GLMM in lme4
You should post this on the r-sig-mixed-models list, not here.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
Jim Lemon
2018-Sep-03 22:36 UTC
[R] Account for a factor variability in a logistic GLMM in lme4
Hi Pedro,

I have encountered similar situations in a number of areas. Great care is taken to record significant events of low probability, but not the non-occurrence of those events. Sometimes this is due to a problem with the definition of non-occurrence. To use your example, how close does an animal have to approach the crossing to be counted as not crossing? Perhaps it was just a failure to record the species of animals that didn't cross. In that case you have a problem, because the probability of crossing within species cannot be estimated from the data you describe.

Jim
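The point about non-occurrence can be made concrete with a sketch of the data layout that would allow a species-level random effect. It rests on an assumption the thread does not confirm: that for each visit one can tell, for every species, whether it crossed or not (as the species-specific models mentioned in the original post might imply). The data frame `visits`, the `crossed` column, and the species names are all hypothetical.

```r
# Hypothetical visit records: which species (if any) crossed on each visit.
visits <- data.frame(
  visit.id     = 1:4,
  structure.id = factor(c("A", "A", "B", "B")),
  crossed      = c("fox", "fox;badger", NA, "badger"),
  stringsAsFactors = FALSE
)
species <- c("fox", "badger", "genet")

# One row per visit x species, so every species has explicit zeros and ones.
long <- expand.grid(visit.id = visits$visit.id, species = species,
                    stringsAsFactors = FALSE)
long <- merge(long, visits, by = "visit.id")
long$cross.01 <- mapply(
  function(sp, cr) as.integer(!is.na(cr) && sp %in% strsplit(cr, ";")[[1]]),
  long$species, long$crossed, USE.NAMES = FALSE
)

per_species <- tapply(long$cross.01, long$species, sum)
print(per_species)

# With real data in this layout (and the predictors merged in), a call such as
# the following becomes possible; it is not run here:
# lme4::glmer(cross.01 ~ stream.01 + width.m + grass.per +
#               (1 | structure.id) + (1 | species),
#             data = long, family = binomial)
```

In this layout a zero belongs to a specific species on a specific visit, so `species` is never missing. Whether such zeros are meaningful depends on the field protocol, which is exactly the definitional question raised above.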
Martin Maechler
2018-Sep-04 09:28 UTC
[R] Account for a factor variability in a logistic GLMM in lme4
>>>>> Jim Lemon
>>>>>     on Tue, 4 Sep 2018 08:36:22 +1000 writes:

    > [...] In that case you have a problem, because
    > the probability of crossing within species cannot be estimated from
    > the data you describe.

Indeed!

For those among us too young to remember: The 1986 Space Shuttle Challenger catastrophe was co-caused by that mistake: only considering the '1's and not considering the '0's in the data (visualised and shown to the decision-making experts).

See, e.g.,
https://priceonomics.com/the-space-shuttle-challenger-explosion-and-the-o/
(couldn't easily find a more academic / reliable source which *does* include the graphics)

Martin Maechler
ETH Zurich