rnassar@duke.edu
2000-Jun-25 16:34 UTC
[Rd] possible bug, anova.glm(), family="gaussian" (PR#579)
Dear R team, I don't get what I think I should get when using anova.glm() with family="gaussian" -- please ignore this and forgive me if this turns out to be another example of a fundamental misunderstanding on my part (a highly likely event!) For example: S <- as.factor(rep(c(rep("m",2),rep("f",2)),2)) A <- as.factor(c(rep("a",4),rep("b",4))) y <- c(10,11,13,15,12,11,16,17) f.glm.1 <- glm(y~A*S) f.glm.2 <- glm(y~A+S) anova(f.glm.1,f.glm.2,test="F") # Analysis of Deviance Table # # Response: y # # # Resid. Df Resid. Dev Df Deviance F Pr(>F) # A + S + A:S 4 3.500 # A + S 5 4.625 -1 0.000 0.000 1 # compared with the same analysis using anova.lm() below: f.lm.1 <- aov(y~A*S) f.lm.2 <- aov(y~A+S) anova(f.lm.1,f.lm.2) # Analysis of Variance Table # # Model 1: y ~ A + S + A:S # Model 2: y ~ A + S # Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F) # 1 4 3.500 # 2 5 4.625 -1 -1.125 1.2857 0.3202 Thank you again for R and best regards, Rashid Nassar --please do not edit the information below-- Version: platform = i586-pc-linux-gnu arch = i586 os = linux-gnu system = i586, linux-gnu status = major = 1 minor = 1.0 year = 2000 month = June day = 15 language = R Search Path: .GlobalEnv, package:MASS, Autoloads, package:base -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Peter Dalgaard BSA
2000-Jun-25 17:42 UTC
[Rd] possible bug, anova.glm(), family="gaussian" (PR#579)
rnassar@duke.edu writes:> anova(f.glm.1,f.glm.2,test="F")> # Resid. Df Resid. Dev Df Deviance F Pr(>F) > # A + S + A:S 4 3.500 > # A + S 5 4.625 -1 0.000 0.000 1 > > anova(f.lm.1,f.lm.2)> # Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F) > # 1 4 3.500 > # 2 5 4.625 -1 -1.125 1.2857 0.3202Hum.. It is being done on purpose. anova.glmlist has this nugget inside: table <- data.frame(resdf, resdev, c(NA, -diff(resdf)), c(NA, pmax(0, -diff(resdev)))) This is obviously inconsistent with the lm version, but I wonder which one is wrong? There may be a point in requiring models to be in increasing order. However, something is clearly wrong with that too:> anova(f.glm.2,f.glm.1,test="F")... Resid. Df Resid. Dev Df Deviance F Pr(>F) A + S 5 4.6250 A + S + A:S 4 3.5000 1 1.1250 1.2857 0.3745 Notice the p value! This comes from an F with 2 df in the denominator instead of 4. Comes out of this bit: if (!is.null(test)) { bigmodel <- object[[(rdf <- order(resdf)[1])]] dispersion <- summary(bigmodel, dispersion dispersion)$dispersion df.dispersion <- if (dispersion == 1) Inf else rdf which is clearly wrong since rdf will be an integer between 1 and the number of models compared... -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._