I have a few questions relating to overdispersion in a sex ratio data set that I am working with (note that I already have an analysis with GLMMs for fixed effects, this is just to estimate dispersion). The response variable is binomial because nestlings can only be male or female. I have samples of 1-5 nestlings from each nest (individuals within a nest are not independent, so the response variable is the ratio of sons to daughters) and some females have multiple nests in the data set (so I need to include female identity as a random effect). Here is an example of what the three vectors used in the model look like (the real data set is much bigger, just to illustrate what I?m talking about): male_chick_no=c(2,4,1,0,3,5,2) female_chick_no=c(1,0,3,3,1,0,2) FemaleID=c(A,A,B,B,C,D,E) My first question relates to coding the test in R. I received this suggested R syntax from a reviewer: SexRatio = cbind(male_chick_no, female_chick_no) Model <- lmer(SexRatio ~ 1 +(1|FemaleID), family = quasibinomial) But when I try to use this in R I get the error: ?Error in glmer(formula ratio ~ 1 + (1 | femid), family = quasibinomial) : "quasi" families cannot be used in glmer?. I?ve tried playing around with some other mixed model functions but can?t seem to find one that will provide an estimate of dispersion and allow me to include my random effect. Is there some other function I should be using? Or is there a different syntax that I should use for lmer? My second question is more general: I understand that with binomial data overdispersion suggests that the observed data have a greater variance than expected given binomial errors (in my case this means that more nests would be all male/all female than expected if sex is random). So with binomial errors the expected estimate of dispersion is 1, if I find that dispersion is > 1 it suggests that my data are overdispersed. My question is, how much greater than 1 should that number be to conclude that the data are overdispersed? Is there a rule of thumb or does it just depend on the dataset? I was thinking of doing a randomization test with the same structure (nest size and female id) as my real data set but with sex ratio of each nest randomized with a 50:50 chance of individuals being sons or daughters and comparing my observed dispersion to the distribution of dispersions from the randomization test. Would this be a valid way to ask whether my data are overdispersed? Is it even necessary? Any help/advice that you can provide would be greatly appreciated. I am relatively new to R so explicit instructions (i.e. easy to follow) would be wonderful. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/Question-on-overdispersion-tp3049898p3049898.html Sent from the R help mailing list archive at Nabble.com.
Dear Nameless, The quasi distribution can no longer be used in lme4 because a) the results were not very reliable b) there is an alternative to model overdispersion. The alternative is to expand your dataset to bernoulli trials. Then add a random effect with one level per observation. This random effect will model additive overdisperion. The quasi distributions model overdisperion multiplicative. In the example below, the random effect of RowID has 0 variance. Hence no overdispersion. dataset <- data.frame( male_chick_no = c(2,4,1,0,3,5,2), female_chick_no=c(1,0,3,3,1,0,2), FemaleID=c("A","A","B","B","C","D","E")) longFormat <- do.call(rbind, lapply(seq_len(nrow(dataset)), function(i){ with(dataset, data.frame(Sex = c(rep("M", male_chick_no[i]), rep("F", female_chick_no[i])), FemaleID = FemaleID[i])) })) longFormat$FemaleID <- factor(longFormat$FemaleID) longFormat$RowID <- factor(seq_len(nrow(longFormat))) longFormat$Male <- longFormat$Sex == "M" library(lme4) fit1 <- glmer(Male ~ (1|FemaleID), data = longFormat, family = binomial) fit2 <- glmer(Male ~ (1|FemaleID) + (1|RowID), data = longFormat, family = binomial) anova(fit1, fit2) Best regards, Thierry PS sig-mixed-models is a better mailinglist for this kind of questions. ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie & Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics & Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey> -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] Namens cct663 > Verzonden: vrijdag 19 november 2010 5:39 > Aan: r-help at r-project.org > Onderwerp: [R] Question on overdispersion > > > I have a few questions relating to overdispersion in a sex > ratio data set that I am working with (note that I already > have an analysis with GLMMs for fixed effects, this is just > to estimate dispersion). The response variable is binomial > because nestlings can only be male or female. I have samples of > 1-5 nestlings from each nest (individuals within a nest are > not independent, so the response variable is the ratio of > sons to daughters) and some females have multiple nests in > the data set (so I need to include female identity as a > random effect). > > Here is an example of what the three vectors used in the > model look like (the real data set is much bigger, just to > illustrate what I'm talking > about): > > male_chick_no=c(2,4,1,0,3,5,2) > female_chick_no=c(1,0,3,3,1,0,2) > FemaleID=c(A,A,B,B,C,D,E) > > My first question relates to coding the test in R. I received > this suggested R syntax from a reviewer: > > SexRatio = cbind(male_chick_no, female_chick_no) > > Model <- lmer(SexRatio ~ 1 +(1|FemaleID), family = quasibinomial) > > But when I try to use this in R I get the error: "Error in > glmer(formula = ratio ~ 1 + (1 | femid), family = > quasibinomial) : "quasi" families cannot be used in glmer". > > I've tried playing around with some other mixed model > functions but can't seem to find one that will provide an > estimate of dispersion and allow me to include my random effect. > > Is there some other function I should be using? Or is there a > different syntax that I should use for lmer? > > My second question is more general: I understand that with > binomial data overdispersion suggests that the observed data > have a greater variance than expected given binomial errors > (in my case this means that more nests would be all male/all > female than expected if sex is random). So with binomial > errors the expected estimate of dispersion is 1, if I find > that dispersion is > 1 it suggests that my data are > overdispersed. My question is, how much greater than 1 should > that number be to conclude that the data are overdispersed? > Is there a rule of thumb or does it just depend on the dataset? > > I was thinking of doing a randomization test with the same > structure (nest size and female id) as my real data set but > with sex ratio of each nest randomized with a 50:50 chance of > individuals being sons or daughters and comparing my observed > dispersion to the distribution of dispersions from the > randomization test. Would this be a valid way to ask whether > my data are overdispersed? Is it even necessary? > > Any help/advice that you can provide would be greatly > appreciated. I am relatively new to R so explicit > instructions (i.e. easy to follow) would be wonderful. > > Thanks. > > -- > View this message in context: > http://r.789695.n4.nabble.com/Question-on-overdispersion-tp304 > 9898p3049898.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
cct663 <cct663 <at> gmail.com> writes:> I have a few questions relating to overdispersion in a sex ratio data set > that I am working with (note that I already have an analysis with GLMMs for > fixed effects, this is just to estimate dispersion). The response variable > is binomial because nestlings can only be male or female. I have samples of > 1-5 nestlings from each nest (individuals within a nest are not independent, > so the response variable is the ratio of sons to daughters) and some females > have multiple nests in the data set (so I need to include female identity as > a random effect). > > Here is an example of what the three vectors used in the model look like > (the real data set is much bigger, just to illustrate what I?m talking > about): > > male_chick_no=c(2,4,1,0,3,5,2) > female_chick_no=c(1,0,3,3,1,0,2) > FemaleID=c(A,A,B,B,C,D,E) > > My first question relates to coding the test in R. I received this suggested > R syntax from a reviewer: > > SexRatio = cbind(male_chick_no, female_chick_no) > > Model <- lmer(SexRatio ~ 1 +(1|FemaleID), family = quasibinomial) > > But when I try to use this in R I get the error: ?Error in glmer(formula > ratio ~ 1 + (1 | femid), family = quasibinomial) : "quasi" families cannot > be used in glmer?.After many discussions of the unreliability of quasi-likelihood estimation in lme4, Doug Bates added this error/warning in a fairly recent version. The recommended approach now is to use individual-level random effects: to continue your example above, indiv <- 1:length(FemaleID) Model <- lmer(SexRatio ~ 1 + (1|indiv) + (1|FemaleID), family=binomial) By the way, it is generally better practice to put all of your data into a dataframe and use the data= argument to lmer.> My second question is more general: I understand that with binomial data > overdispersion suggests that the observed data have a greater variance than > expected given binomial errors (in my case this means that more nests would > be all male/all female than expected if sex is random). So with binomial > errors the expected estimate of dispersion is 1, if I find that dispersion > is > 1 it suggests that my data are overdispersed. My question is, how much > greater than 1 should that number be to conclude that the data are > overdispersed? Is there a rule of thumb or does it just depend on the > dataset?Very generally/crudely, the (squared) Pearson residuals are expected to be chi-square distributed. Specifically, if you know the residual degrees of freedom (which can be a bit tricky/poorly defined for mixed models, but is approximately (# data points)-(# estimated parameters), then sum(residuals^2) should be chi-squared distributed with df equal to the residual degrees of freedom. Venables and Ripley have a useful discussion.> I was thinking of doing a randomization test with the same structure (nest > size and female id) as my real data set but with sex ratio of each nest > randomized with a 50:50 chance of individuals being sons or daughters and > comparing my observed dispersion to the distribution of dispersions from the > randomization test. Would this be a valid way to ask whether my data are > overdispersed? Is it even necessary?It seems reasonable. You could go even farther and simulate data from your estimated model with binomial errors (i.e. use the estimated sex ratios rather than assuming a 50/50 sex ratio). Followups should probably be sent to the r-sig-mixed-models list.