thr3ads.net - R devel - [Rd] problem in add1's F statistic when data contains NAs? [May 2013]

If this information is useful, please help other people find it:
Share via:

William Dunlap

2013-May-14 19:23 UTC

[Rd] problem in add1's F statistic when data contains NAs?

Shouldn't the F statistic (and p value) for the x2 term in the following
calls
to anova() and add1() be the same?  I think anova() gets it right and add1()
does not.
> d <- data.frame(y=1:10, x1=log(1:10), x2=replace(1/(1:10), 2:3, NA))
> anova(lm(y ~ x1 + x2, data=d))Analysis of Variance Table

Response: y
          Df    Sum Sq   Mean Sq    F value     Pr(>F)    
x1         1 52.905613 52.905613 1108.61455 4.5937e-07 ***
x2         1  6.355775  6.355775  133.18256 8.5678e-05 ***
Residuals  5  0.238611  0.047722                          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1 > add1(lm(y ~ x1, data=d), y ~ x1 + x2, test="F")Single term additions

Model:
y ~ x1
       Df Sum of Sq       RSS         AIC   F value     Pr(>F)    
<none>              6.5943869   2.4542182                         
x2      1 6.3557755 0.2386114 -22.0988844 186.45559 2.6604e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
Warning message:
In add1.lm(lm(y ~ x1, data = d), y ~ x1 + x2, test = "F") :
  using the 8/10 rows from a combined fit

It looks like add1 is using 7 instead of 5 for the denominator degrees of
freedom,
7 being the value in the original fit, before the 2 rows containing NA's in
x2
were omitted.
> (6.355775/1) / (0.238611/5)
[1] 133.1827745> (6.355775/1) / (0.238611/7)[1] 186.4558843

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

Paul Johnson

2013-May-17 17:48 UTC

head link

[Rd] problem in add1's F statistic when data contains NAs?

Bill

I think you are correct, there's something funny in add1, but is it just
degrees of freedom? Example below..

On Tue, May 14, 2013 at 2:23 PM, William Dunlap <wdunlap@tibco.com> wrote:
> Shouldn't the F statistic (and p value) for the x2 term in the
following
> calls
> to anova() and add1() be the same?  I think anova() gets it right and
> add1()
> does not.
>
> > d <- data.frame(y=1:10, x1=log(1:10), x2=replace(1/(1:10), 2:3,
NA))
> > anova(lm(y ~ x1 + x2, data=d))
> Analysis of Variance Table
>
> Response: y
>           Df    Sum Sq   Mean Sq    F value     Pr(>F)
> x1         1 52.905613 52.905613 1108.61455 4.5937e-07 ***
> x2         1  6.355775  6.355775  133.18256 8.5678e-05 ***
> Residuals  5  0.238611  0.047722
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
> > add1(lm(y ~ x1, data=d), y ~ x1 + x2, test="F")
> Single term additions
>
> Model:
> y ~ x1
>        Df Sum of Sq       RSS         AIC   F value     Pr(>F)
> <none>              6.5943869   2.4542182
> x2      1 6.3557755 0.2386114 -22.0988844 186.45559 2.6604e-06 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
> Warning message:
> In add1.lm(lm(y ~ x1, data = d), y ~ x1 + x2, test = "F") :
>   using the 8/10 rows from a combined fit
>
> It looks like add1 is using 7 instead of 5 for the denominator degrees of
> freedom,
> 7 being the value in the original fit, before the 2 rows containing
NA's
> in x2
> were omitted.
>
> > (6.355775/1) / (0.238611/5)
> [1] 133.1827745
> > (6.355775/1) / (0.238611/7)
> [1] 186.4558843
>
>
Here's another view of the same problem.
> m1 <- lm(y ~ x1, data = d)
> m2 <- lm(y ~ x1 + x2, data = d)
> anova(m2, m1, test = "F")Error in anova.lmlist(object, ...) :
  models were not all fitted to the same size of dataset

# It is necessary to refit m1 on the smaller dataset, add1 claims it is
doing it.
> m3 <- lm(y ~ x1, data = model.frame(m2))
> anova(m2, m3, test = "F")Analysis of Variance Table

Model 1: y ~ x1 + x2
Model 2: y ~ x1
  Res.Df   RSS Df Sum of Sq     F   Pr(>F)
1      5 0.239
2      6 6.594 -1    -6.356 133.2 8.57e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## Now compare add1 against m1 and m3

> add1(m1, y ~ x1 + x2, test = "F")Single term additions

Model:
y ~ x1
       Df Sum of Sq   RSS     AIC F value   Pr(>F)
<none>              6.594   2.454
x2      1     6.356 0.239 -22.099   186.5 2.66e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In add1.lm(m1, y ~ x1 + x2, test = "F") :
  using the 8/10 rows from a combined fit

## That F value is just wrong, isn't it?

> add1(m3, y ~ x1 + x2, test = "F")Single term additions

Model:
y ~ x1
       Df Sum of Sq   RSS     AIC F value   Pr(>F)
<none>              6.594   2.454
x2      1     6.356 0.239 -22.099   133.2 8.57e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


It seems to me that those last 2 uses of add1 ought to return the same
thing, shouldn't they?

pj


> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Paul E. Johnson
Professor, Political Science      Assoc. Director
1541 Lilac Lane, Room 504      Center for Research Methods
University of Kansas                 University of Kansas
http://pj.freefaculty.org               http://quant.ku.edu

	[[alternative HTML version deleted]]

Reasonably Related Threads

Search for more apparently analagous threads

R devel - May 2013 - problem in add1's F statistic when data contains NAs?

[Rd] problem in add1's F statistic when data contains NAs?

[Rd] problem in add1's F statistic when data contains NAs?

Reasonably Related Threads