thr3ads.net - R help - [R] data.frame and formula classes of aggregate [Nov 2010]


David Freedman

2010-Nov-29 14:35 UTC

[R] data.frame and formula classes of aggregate

Hi - I apologize for the 2nd post, but I think my question from a few weeks
ago may have been overlooked on a Friday afternoon.

I might be missing something very obvious, but is it widely known that the
aggregate function handles missing values differently depending on whether
a data frame or a formula is the first argument?  For example,

(d <- data.frame(sex = rep(0:1, each = 3),
                 wt = c(100, 110, 120, 200, 210, NA),
                 ht = c(10, 20, NA, 30, 40, 50)))
# data.frame method
x1 <- aggregate(d, by = list(d$sex), FUN = mean)
names(x1)[3:4] <- c('mean.dfcl.wt', 'mean.dfcl.ht')
# formula method
x2 <- aggregate(cbind(wt, ht) ~ sex, FUN = mean, data = d)
names(x2)[2:3] <- c('mean.formcl.wt', 'mean.formcl.ht')
# compare the two sets of group means side by side
cbind(x1, x2)[, c(2, 3, 6, 4, 7)]

With the data.frame method, the output is NA for a variable in any group
where that variable has a missing value.  With the formula method, the
entire row (missing and non-missing values alike) seems to be deleted
before aggregation if it contains any NA.  Wouldn't one expect the 2 forms
of aggregate (data frame vs formula) to give the same result?
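
The row deletion described above can be reproduced directly. The following is only a sketch on the same example data; the object names d_complete and means_complete are made up for illustration. It shows that the formula result matches what one gets after na.omit() has removed every row containing an NA.

```r
# Same example data as in the post above
d <- data.frame(sex = rep(0:1, each = 3),
                wt = c(100, 110, 120, 200, 210, NA),
                ht = c(10, 20, NA, 30, 40, 50))

# na.omit() drops every row with at least one NA: here rows 3 and 6
d_complete <- na.omit(d)
print(d_complete)

# The formula method aggregates what is left, so the non-missing
# wt = 120 (row 3) and ht = 50 (row 6) do not contribute to the means
means_complete <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean)
print(means_complete)
```

Only rows 1, 2, 4 and 5 survive, so the wt means are 105 and 205 and the ht means are 15 and 35.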

thanks very much
david freedman, atlanta




-- 
View this message in context:
r.789695.n4.nabble.com/data-frame-and-formula-classes-of-aggregate-tp3063668p3063668.html
Sent from the R help mailing list archive at Nabble.com.

David Winsemius

2010-Nov-29 14:49 UTC

On Nov 29, 2010, at 9:35 AM, David Freedman wrote:
>
> Hi - I apologize for the 2nd post, but I think my question from a few weeks
> ago may have been overlooked on a Friday afternoon.
>
> I might be missing something very obvious, but is it widely known that the
> aggregate function handles missing values differently depending if a data
> frame or a formula is the first argument ?
I'm not sure if it is widely known, but it is certainly suggested by
the documentation for aggregate, since aggregate.formula has an
na.action argument (defaulting to na.omit) that aggregate.data.frame
does not have. See the Usage section at the very top of ?aggregate.
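
For anyone who wants to check this from the console rather than the help page, here is a minimal sketch that inspects the formal arguments of the two methods. The variable names are made up for illustration, and the result reflects the R version in use.

```r
# Look up the two S3 methods that aggregate() dispatches to
formula_method <- getS3method("aggregate", "formula")
df_method <- getS3method("aggregate", "data.frame")

# Default na.action of the formula method, as text
formula_default <- deparse(formals(formula_method)$na.action)
print(formula_default)

# Does the data.frame method have an na.action argument at all?
df_has_na_action <- "na.action" %in% names(formals(df_method))
print(df_has_na_action)
```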

>  For example,
>
> (d<- data.frame(sex=rep(0:1,each=3),
> wt=c(100,110,120,200,210,NA),ht=c(10,20,NA,30,40,50)))
> x1<- aggregate(d, by = list(d$sex), FUN = mean);
> 	names(x1)[3:4]<- c('mean.dfcl.wt','mean.dfcl.ht')
> x2<- aggregate(cbind(wt,ht)~sex,FUN=mean,data=d);
> 	names(x2)[2:3]<- c('mean.formcl.wt','mean.formcl.ht')
> cbind(x1,x2)[,c(2,3,6,4,7)]
>
> The output from the data.frame class has an NA if there are missing values
> in the group for the variable with missing values.  But, the formula class
> output seems to delete the entire row (missing and non-missing values) if
> there are any NAs.  Wouldn't one expect that the 2 forms (data frame vs
> formula) of aggregate would give the same result?
>
> thanks very much
> david freedman, atlanta
>
>-- 

David Winsemius, MD
West Hartford, CT

Peter Ehlers

2010-Nov-29 18:01 UTC

On 2010-11-29 06:35, David Freedman wrote:
>
> Hi - I apologize for the 2nd post, but I think my question from a few weeks
> ago may have been overlooked on a Friday afternoon.
>
> I might be missing something very obvious, but is it widely known that the
> aggregate function handles missing values differently depending if a data
> frame or a formula is the first argument ?  For example,
>
> (d<- data.frame(sex=rep(0:1,each=3),
> wt=c(100,110,120,200,210,NA),ht=c(10,20,NA,30,40,50)))
> x1<- aggregate(d, by = list(d$sex), FUN = mean);
> 	names(x1)[3:4]<- c('mean.dfcl.wt','mean.dfcl.ht')
> x2<- aggregate(cbind(wt,ht)~sex,FUN=mean,data=d);
> 	names(x2)[2:3]<- c('mean.formcl.wt','mean.formcl.ht')
> cbind(x1,x2)[,c(2,3,6,4,7)]
>
> The output from the data.frame class has an NA if there are missing values
> in the group for the variable with missing values.  But, the formula class
> output seems to delete the entire row (missing and non-missing values) if
> there are any NAs.  Wouldn't one expect that the 2 forms (data frame vs
> formula) of aggregate would give the same result?
>
Wasn't there some discussion of this not long ago? Maybe I'm getting
senile. Anyway, as David W. points out, the defaults differ. Here's
how you can get the same result from both methods:

1. use na.action = na.pass in aggregate.formula;
    this will duplicate your x1 result.

2. use d <- d[complete.cases(d), ] in your x1 calculation;
    this will duplicate your x2 result.
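
The two suggestions above can be sketched as follows, using the example data from the original post. The object names (by_df, by_formula_pass, and so on) are made up for illustration, and the grouping column is named sex in both methods so that the results line up.

```r
d <- data.frame(sex = rep(0:1, each = 3),
                wt = c(100, 110, 120, 200, 210, NA),
                ht = c(10, 20, NA, 30, 40, 50))

# Baseline results from the two methods with their defaults
by_df <- aggregate(d[c("wt", "ht")], by = list(sex = d$sex), FUN = mean)
by_formula <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean)

# 1. Keep the NA rows in the formula method: matches the data.frame result
by_formula_pass <- aggregate(cbind(wt, ht) ~ sex, data = d, FUN = mean,
                             na.action = na.pass)

# 2. Drop incomplete rows before the data.frame method: matches the
#    default formula result
d_cc <- d[complete.cases(d), ]
by_df_cc <- aggregate(d_cc[c("wt", "ht")], by = list(sex = d_cc$sex),
                      FUN = mean)

print(by_df); print(by_formula_pass)   # wt: 110, NA   ht: NA, 40
print(by_formula); print(by_df_cc)     # wt: 105, 205  ht: 15, 35
```

If the aim is instead a per-variable mean that ignores only that variable's own NAs, adding na.rm = TRUE (which is passed on to mean) along with na.action = na.pass in the formula call should do it.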

Peter Ehlers
> thanks very much
> david freedman, atlanta
>
>
>
>

David Freedman

2010-Nov-29 19:02 UTC

Thanks for the information.  

There was a discussion of different results obtained with the formula and
data.frame methods for a paired t-test -- there are many threads, but one is
at 
r.789695.n4.nabble.com/Paired-t-tests-td2325956.html#a2326291

david freedman
