thr3ads.net - R help - [R] aggregate.formula implicitly removes rows containing NA [Jan 2011]

Dickison, Daniel

2011-Jan-11 22:41 UTC

[R] aggregate.formula implicitly removes rows containing NA

The documentation for `aggregate` makes it sound like aggregate.formula should
behave identically to aggregate.data.frame (apart from the way the parameters
are passed).  But it looks like aggregate.formula is quietly removing rows where
any of the "output" variables (those on the LHS of the formula) are
NA.  This differs from how aggregate.data.frame works.  Is this expected
behavior?

Here are a couple of examples:
> d <- data.frame(a=rep(1:2, each=2),
+                 b=c(1,2,NA,3))
> aggregate(d["b"], d["a"], mean)
  a   b
1 1 1.5
2 2  NA
> aggregate(b ~ a, d, mean)
  a   b
1 1 1.5
2 2 3.0

It's removing whole rows even if just one of the columns is NA, i.e.:
> d <- data.frame(a=rep(1:2, each=2),
+                 b=c(1,2,NA,3),
+                 c=c(NA,2,3,NA))
> aggregate(cbind(b,c) ~ a, d, mean)
  a b c
1 1 2 2

Daniel

David Winsemius

2011-Jan-11 23:56 UTC

On Jan 11, 2011, at 5:41 PM, Dickison, Daniel wrote:
> The documentation for `aggregate` makes it sound like  
> aggregate.formula should behave identically to aggregate.data.frame  
> (apart from the way the parameters are passed).  But it looks like  
> aggregate.formula is quietly removing rows where any of the "output"
> variables (those on the LHS of the formula) are NA.  This differs  
> from how aggregate.data.frame works.  Is this expected behavior?
>
> Here are a couple of examples:
>
>> d <- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3))
>> aggregate(d["b"], d["a"], mean)
>  a   b
> 1 1 1.5
> 2 2  NA
>> aggregate(b ~ a, d, mean)
>  a   b
> 1 1 1.5
> 2 2 3.0
>
> It's removing whole rows even if just one of the columns is NA, i.e.:
>
>> d <- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3),
> +                 c=c(NA,2,3,NA))
>> aggregate(cbind(b,c) ~ a, d, mean)
>  a b c
> 1 1 2 2
>
The help page for aggregate gives the calling defaults for
aggregate.formula as:

## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)

So the behavior you describe is what I would have expected (had I
read the help page first): with na.action = na.omit, any row with an
NA in a variable used in the formula is dropped before FUN is applied.
-- 
David Winsemius, MD
West Hartford, CT
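
For readers following along, the effect of that default can be reproduced by hand. This is only a minimal sketch, assuming the three-column data frame d from the original post. The names kept, by_formula and by_hand are introduced here for illustration. Because d contains only the variables used in the formula, filtering with na.omit(d) first should leave the same rows that the formula method keeps.

```r
# Data frame from the original post.
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# 'kept' (illustrative name): only the rows with no NA in any column.
kept <- na.omit(d)
print(kept)

# Formula method with its default na.action = na.omit ...
by_formula <- aggregate(cbind(b, c) ~ a, d, mean)

# ... compared with the data-frame method on the pre-filtered rows.
by_hand <- aggregate(kept[c("b", "c")], kept["a"], mean)

print(by_formula)
print(by_hand)
```

Note that this shortcut only matches when the data frame has no extra columns: na.omit(d) looks at every column of d, whereas the formula method only considers the variables named in the formula.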

Peter Ehlers

2011-Jan-12 01:13 UTC

On 2011-01-11 14:41, Dickison, Daniel wrote:
> The documentation for `aggregate` makes it sound like aggregate.formula
> should behave identically to aggregate.data.frame (apart from the way the
> parameters are passed).  But it looks like aggregate.formula is quietly
> removing rows where any of the "output" variables (those on the LHS of the
> formula) are NA.  This differs from how aggregate.data.frame works.  Is this
> expected behavior?
>
> Here are a couple of examples:
>
>> d<- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3))
>> aggregate(d["b"], d["a"], mean)
>    a   b
> 1 1 1.5
> 2 2  NA
>> aggregate(b ~ a, d, mean)
>    a   b
> 1 1 1.5
> 2 2 3.0
>
> It's removing whole rows even if just one of the columns is NA, i.e.:
>
>> d<- data.frame(a=rep(1:2, each=2),
> +                 b=c(1,2,NA,3),
> +                 c=c(NA,2,3,NA))
>> aggregate(cbind(b,c) ~ a, d, mean)
>    a b c
> 1 1 2 2
>
> Daniel
Try setting na.action = na.pass.

Peter Ehlers
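
A short sketch of that suggestion, assuming the data frame d from the original post. The result names res_pass and res_narm are made up for this example. With na.action = na.pass the rows are kept and the NA values reach FUN, so the formula method should behave like the data-frame method. If per-column NA removal is what is wanted, na.rm = TRUE can be forwarded to mean through the ... argument.

```r
# Data frame from the original post.
d <- data.frame(a = rep(1:2, each = 2),
                b = c(1, 2, NA, 3),
                c = c(NA, 2, 3, NA))

# Keep all rows; mean() now sees the NA in group a = 2,
# as it does with aggregate(d["b"], d["a"], mean).
res_pass <- aggregate(b ~ a, d, mean, na.action = na.pass)
print(res_pass)

# Keep all rows, but let mean() drop NAs column by column.
# na.rm = TRUE is passed on to FUN via '...'.
res_narm <- aggregate(cbind(b, c) ~ a, d, mean,
                      na.action = na.pass, na.rm = TRUE)
print(res_narm)
```

The second call uses every non-missing value in each column, rather than discarding a whole row because one of its columns is NA.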
