rnassar@duke.edu
2000-Jun-25 16:34 UTC
[Rd] possible bug, anova.glm(), family="gaussian" (PR#579)
Dear R team,
I don't get what I think I should get when using anova.glm() with
family="gaussian" -- please ignore this and forgive me if this turns
out
to be another example of a fundamental misunderstanding on my part (a
highly likely event!) For example:
S <- as.factor(rep(c(rep("m",2),rep("f",2)),2))
A <- as.factor(c(rep("a",4),rep("b",4)))
y <- c(10,11,13,15,12,11,16,17)
f.glm.1 <- glm(y~A*S)
f.glm.2 <- glm(y~A+S)
anova(f.glm.1,f.glm.2,test="F")
# Analysis of Deviance Table
#
# Response: y
#
#
# Resid. Df Resid. Dev Df Deviance F Pr(>F)
# A + S + A:S 4 3.500
# A + S 5 4.625 -1 0.000 0.000 1
# compared with the same analysis using anova.lm() below:
f.lm.1 <- aov(y~A*S)
f.lm.2 <- aov(y~A+S)
anova(f.lm.1,f.lm.2)
# Analysis of Variance Table
#
# Model 1: y ~ A + S + A:S
# Model 2: y ~ A + S
# Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
# 1 4 3.500
# 2 5 4.625 -1 -1.125 1.2857 0.3202
Thank you again for R and best regards,
Rashid Nassar
--please do not edit the information below--
Version:
platform = i586-pc-linux-gnu
arch = i586
os = linux-gnu
system = i586, linux-gnu
status =
major = 1
minor = 1.0
year = 2000
month = June
day = 15
language = R
Search Path:
.GlobalEnv, package:MASS, Autoloads, package:base
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Peter Dalgaard BSA
2000-Jun-25 17:42 UTC
[Rd] possible bug, anova.glm(), family="gaussian" (PR#579)
rnassar@duke.edu writes:> anova(f.glm.1,f.glm.2,test="F")> # Resid. Df Resid. Dev Df Deviance F Pr(>F) > # A + S + A:S 4 3.500 > # A + S 5 4.625 -1 0.000 0.000 1 > > anova(f.lm.1,f.lm.2)> # Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F) > # 1 4 3.500 > # 2 5 4.625 -1 -1.125 1.2857 0.3202Hum.. It is being done on purpose. anova.glmlist has this nugget inside: table <- data.frame(resdf, resdev, c(NA, -diff(resdf)), c(NA, pmax(0, -diff(resdev)))) This is obviously inconsistent with the lm version, but I wonder which one is wrong? There may be a point in requiring models to be in increasing order. However, something is clearly wrong with that too:> anova(f.glm.2,f.glm.1,test="F")... Resid. Df Resid. Dev Df Deviance F Pr(>F) A + S 5 4.6250 A + S + A:S 4 3.5000 1 1.1250 1.2857 0.3745 Notice the p value! This comes from an F with 2 df in the denominator instead of 4. Comes out of this bit: if (!is.null(test)) { bigmodel <- object[[(rdf <- order(resdf)[1])]] dispersion <- summary(bigmodel, dispersion dispersion)$dispersion df.dispersion <- if (dispersion == 1) Inf else rdf which is clearly wrong since rdf will be an integer between 1 and the number of models compared... -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._