Dear all,

I have a dataset where the interaction is more than obvious, but I was asked to give a p-value, so I ran a logistic regression using glm. Oddly, in the output the interaction term is NOT significant, although that's completely counterintuitive. There are 3 variables: spot (binary response), constr (gene construct) and vernalized (growth conditions). Only for the FLC construct after vernalization should the chance of spots be lower. So one would expect the interaction term in the model to be significant.

Yet, only the two main terms are significant here. Can it be my data is too sparse to use these models? Am I using the wrong method?

# data generation
testdata <- matrix(c(rep(0:1,times=4),rep(c("FLC","FLC","free","free"),times=2),
                     rep(c("no","yes"),each =4),3,42,1,44,27,20,3,42),ncol=4)
colnames(testdata) <-c("spot","constr","vernalized","Freq")
testdata <- as.data.frame(testdata)

# model
T0fit <- glm(spot~constr*vernalized, weights=Freq, data=testdata,
             family="binomial")
anova(T0fit)

Kind regards
Joris
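One practical point about the posted code: going through a character matrix turns every column, including Freq, into text (or factors in older R versions), and glm needs numeric weights. Below is a minimal sketch of the same data built directly as a data frame. The explicit test = "Chisq" argument is an assumption added here so that the analysis-of-deviance table reports p-values; the model formula itself is unchanged.

```r
# Same counts as in the post, built column by column so that
# spot and Freq stay numeric (a character matrix would coerce them to text).
testdata <- data.frame(
  spot       = rep(0:1, times = 4),
  constr     = factor(rep(c("FLC", "FLC", "free", "free"), times = 2)),
  vernalized = factor(rep(c("no", "yes"), each = 4)),
  Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
)

# Logistic regression with one row per outcome/cell and the counts as weights
T0fit <- glm(spot ~ constr * vernalized, weights = Freq,
             data = testdata, family = binomial)

# Sequential analysis of deviance; test = "Chisq" requests p-values
aod <- anova(T0fit, test = "Chisq")
print(aod)
```

The terms are tested sequentially in the order they appear in the formula, so the last row of the table is the one that addresses the interaction.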
I think the interaction is not so strong anymore if you do what glm does: use a logit transformation.

testdata <- matrix(c(rep(0:1,times=4),rep(c("FLC","FLC","free","free"),times=2),
                     rep(c("no","yes"),each =4),3,42,1,44,27,20,3,42),ncol=4)
colnames(testdata) <-c("spot","constr","vernalized","Freq")
testdata <- as.data.frame(testdata)
testdata$Freq <- as.numeric(as.character(testdata$Freq))
testdata$spot <- as.numeric(as.character(testdata$spot))
T2 <- reshape(testdata,v.names='Freq',timevar='spot',
              idvar=names(testdata)[c(2,3)],direction='wide')
T2$Prop <- T2$Freq.0/(T2$Freq.0+T2$Freq.1)
plot(log(T2$Prop/(1-T2$Prop)),x=interaction(T2$constr,T2$vernalized))

Kees
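To put numbers on the comparison between the two scales, here is a small sketch that computes the same "difference of differences" once on the probability scale and once on the logit scale. The data frame layout and the helper name did are illustrative only, and the proportions are for spot == 1 (the code above uses spot == 0, which only flips the sign of the logits).

```r
# Cell counts copied from the original post; column names are illustrative.
cells <- data.frame(
  constr     = c("FLC", "free", "FLC", "free"),
  vernalized = c("no", "no", "yes", "yes"),
  no_spot    = c(3, 1, 27, 3),     # Freq where spot == 0
  spot       = c(42, 44, 20, 42)   # Freq where spot == 1
)

cells$prop  <- cells$spot / (cells$spot + cells$no_spot)
cells$logit <- qlogis(cells$prop)  # log(p / (1 - p))

# (free - FLC) among vernalized plants minus (free - FLC) among the others;
# relies on the row order FLC/no, free/no, FLC/yes, free/yes used above.
did <- function(v) (v[4] - v[3]) - (v[2] - v[1])

did_prob  <- did(cells$prop)   # interaction contrast on the probability scale
did_logit <- did(cells$logit)  # the same contrast on the logit scale

print(cells)
print(c(probability = did_prob, logit = did_logit))
```

On the probability scale the FLC/yes cell stands far apart from the other three, while on the logit scale the free/no cell (44 of 45) is pushed away from the rest as well, so the contrast looks less clear-cut there.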
Thomas Lumley
2009-Mar-07 10:57 UTC
[R] Interaction term not significant when using glm???
On Fri, 6 Mar 2009, joris meys wrote:

> Yet, only the two main terms are significant here. Can it be my data is too
> sparse to use these models? Am I using the wrong method?

The point estimate for the interaction term is large: 1.79, or an odds ratio of nearly 6.

The data are very strongly overdispersed (variance is 45 times larger than it should be), so they don't fit a binomial model well. If you used a quasibinomial model you would get no statistical significance for any of the terms.

I would say the problem is partly a combination of the overdispersion and the sample size. It doesn't help that the situation appears to be a difference between the FLC:yes cell and the other three cells, a difference that is spread out over the three parameters.
-thomas

Thomas Lumley
Assoc. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle
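The two figures quoted in this reply (an interaction estimate of 1.79 and a dispersion of about 45) can be reproduced with a short sketch. It assumes the data are laid out as in the original post, with eight rows and the counts supplied as weights; the object names are illustrative. Note that summary() estimates the dispersion as the Pearson statistic divided by the residual degrees of freedom, which is 4 in this eight-row weighted layout.

```r
# Data as in the original post, with numeric spot and Freq columns.
long <- data.frame(
  spot       = rep(0:1, times = 4),
  constr     = factor(rep(c("FLC", "FLC", "free", "free"), times = 2)),
  vernalized = factor(rep(c("no", "yes"), each = 4)),
  Freq       = c(3, 42, 1, 44, 27, 20, 3, 42)
)

# Binomial fit: interaction coefficient on the log-odds scale
fit_bin <- glm(spot ~ constr * vernalized, weights = Freq,
               data = long, family = binomial)
b_int  <- coef(fit_bin)[["constrfree:vernalizedyes"]]  # about 1.79
or_int <- exp(b_int)                                   # odds ratio, about 6

# Quasibinomial fit: same coefficients, standard errors scaled by the
# estimated dispersion
fit_quasi <- glm(spot ~ constr * vernalized, weights = Freq,
                 data = long, family = quasibinomial)
disp <- summary(fit_quasi)$dispersion                  # about 45.5

print(c(estimate = b_int, odds_ratio = or_int, dispersion = disp))
print(coef(summary(fit_quasi)))
```

The coefficient table from the quasibinomial fit shows the same point estimates as the binomial fit with standard errors inflated by the square root of the dispersion, which is why none of the terms reach significance there.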