William Dunlap
2013-May-14 19:23 UTC
[Rd] problem in add1's F statistic when data contains NAs?
Shouldn't the F statistic (and p value) for the x2 term in the following calls to anova() and add1() be the same? I think anova() gets it right and add1() does not.> d <- data.frame(y=1:10, x1=log(1:10), x2=replace(1/(1:10), 2:3, NA)) > anova(lm(y ~ x1 + x2, data=d))Analysis of Variance Table Response: y Df Sum Sq Mean Sq F value Pr(>F) x1 1 52.905613 52.905613 1108.61455 4.5937e-07 *** x2 1 6.355775 6.355775 133.18256 8.5678e-05 *** Residuals 5 0.238611 0.047722 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1> add1(lm(y ~ x1, data=d), y ~ x1 + x2, test="F")Single term additions Model: y ~ x1 Df Sum of Sq RSS AIC F value Pr(>F) <none> 6.5943869 2.4542182 x2 1 6.3557755 0.2386114 -22.0988844 186.45559 2.6604e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Warning message: In add1.lm(lm(y ~ x1, data = d), y ~ x1 + x2, test = "F") : using the 8/10 rows from a combined fit It looks like add1 is using 7 instead of 5 for the denominator degrees of freedom, 7 being the value in the original fit, before the 2 rows containing NA's in x2 were omitted.> (6.355775/1) / (0.238611/5)[1] 133.1827745> (6.355775/1) / (0.238611/7)[1] 186.4558843 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
Paul Johnson
2013-May-17 17:48 UTC
[Rd] problem in add1's F statistic when data contains NAs?
Bill I think you are correct, there's something funny in add1, but is it just degrees of freedom? Example below.. On Tue, May 14, 2013 at 2:23 PM, William Dunlap <wdunlap@tibco.com> wrote:> Shouldn't the F statistic (and p value) for the x2 term in the following > calls > to anova() and add1() be the same? I think anova() gets it right and > add1() > does not. > > > d <- data.frame(y=1:10, x1=log(1:10), x2=replace(1/(1:10), 2:3, NA)) > > anova(lm(y ~ x1 + x2, data=d)) > Analysis of Variance Table > > Response: y > Df Sum Sq Mean Sq F value Pr(>F) > x1 1 52.905613 52.905613 1108.61455 4.5937e-07 *** > x2 1 6.355775 6.355775 133.18256 8.5678e-05 *** > Residuals 5 0.238611 0.047722 > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > add1(lm(y ~ x1, data=d), y ~ x1 + x2, test="F") > Single term additions > > Model: > y ~ x1 > Df Sum of Sq RSS AIC F value Pr(>F) > <none> 6.5943869 2.4542182 > x2 1 6.3557755 0.2386114 -22.0988844 186.45559 2.6604e-06 *** > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > Warning message: > In add1.lm(lm(y ~ x1, data = d), y ~ x1 + x2, test = "F") : > using the 8/10 rows from a combined fit > > It looks like add1 is using 7 instead of 5 for the denominator degrees of > freedom, > 7 being the value in the original fit, before the 2 rows containing NA's > in x2 > were omitted. > > > (6.355775/1) / (0.238611/5) > [1] 133.1827745 > > (6.355775/1) / (0.238611/7) > [1] 186.4558843 > >Here's another view of the same problem.> m1 <- lm(y ~ x1, data = d) > m2 <- lm(y ~ x1 + x2, data = d) > anova(m2, m1, test = "F")Error in anova.lmlist(object, ...) : models were not all fitted to the same size of dataset # It is necessary to refit m1 on the smaller dataset, add1 claims it is doing it.> m3 <- lm(y ~ x1, data = model.frame(m2))> anova(m2, m3, test = "F")Analysis of Variance Table Model 1: y ~ x1 + x2 Model 2: y ~ x1 Res.Df RSS Df Sum of Sq F Pr(>F) 1 5 0.239 2 6 6.594 -1 -6.356 133.2 8.57e-05 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 ## Now compare add1 against m1 and m3> add1(m1, y ~ x1 + x2, test = "F")Single term additions Model: y ~ x1 Df Sum of Sq RSS AIC F value Pr(>F) <none> 6.594 2.454 x2 1 6.356 0.239 -22.099 186.5 2.66e-06 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Warning message: In add1.lm(m1, y ~ x1 + x2, test = "F") : using the 8/10 rows from a combined fit ## That F value is just wrong, isn't it?> add1(m3, y ~ x1 + x2, test = "F")Single term additions Model: y ~ x1 Df Sum of Sq RSS AIC F value Pr(>F) <none> 6.594 2.454 x2 1 6.356 0.239 -22.099 133.2 8.57e-05 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 It seems to me that those last 2 uses of add1 ought to return the same thing, shouldn't they? pj> Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]]