thr3ads.net - R devel - [Rd] aggregate(empty data.frame) (PR#13167) [Oct 2008]

prokaj at cs.elte.hu

2008-Oct-15 07:05 UTC

[Rd] aggregate(empty data.frame) (PR#13167)

Full_Name: Vilmos Prokaj
Version: R-2.7.1
OS: Win XP
Submission from: (NULL) (157.181.227.218)


The 'aggregate' function on an empty data.frame generates an error; however,
according to the documentation it should return an empty data.frame.

e.g.
z<-data.frame(a=integer(0),b=numeric(0))
aggregate(z,by=z[1],FUN=sum)

In a more realistic situation, 'z' is of the form z<-zz[cond,], where cond is a
computed logical vector and zz is a non-empty data.frame.

Prof Brian Ripley

2008-Oct-15 12:15 UTC

[Rd] aggregate(empty data.frame) (PR#13167)

On Wed, 15 Oct 2008, prokaj at cs.elte.hu wrote:
> Full_Name: Vilmos Prokaj
> Version: R-2.7.1
> OS: Win XP
> Submission from: (NULL) (157.181.227.218)
>
>
> The 'aggregate' function on an empty data.frame generates an error; however,
> according to the documentation it should return an empty data.frame.
Please explain that to me: I don't see it says so.

What I see is

      'aggregate.data.frame' is the data frame method.  If 'x' is not a
      data frame, it is coerced to one.  Then, each of the variables
      (columns) in 'x' is split into subsets of cases (rows) of
      identical combinations of the components of 'by', and 'FUN' is
      applied to each such subset with further arguments in '...' passed
      to it. (I.e., 'tapply(VAR, by, FUN, ..., simplify = FALSE)' is
      done for each variable 'VAR' in 'x', conveniently wrapped into one
      call to 'lapply()'.) Empty subsets are removed, and the result is
      reformatted into a data frame containing the variables in 'by' and
      'x'.

Since all the subsets are empty, there is no result to be reformatted. 
In particular the second and third columns of your example have types that 
can only be determined by running sum() and since all groups are empty, 
sum() is never run.  We can't create a data frame that would be consistent 
with that returned for one or more groups via the documented algorithm.
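
To make the point about result types concrete, here is a small sketch. It uses split() and sapply() as stand-ins for the documented tapply() step, and the data frame 'zz', the filter 'zz$b > 10' and the counter 'n_calls' are made up for illustration. With at least one group, the type of the aggregated column is whatever FUN returns; with no rows there are no groups, so FUN is never called and there is nothing to infer a type from.

```r
# Illustrative data (an assumption, not taken from the report).
zz <- data.frame(a = c(1L, 1L, 2L), b = c(0.5, 1.5, 2.0))

# Non-empty case: FUN runs once per group, so the result type is known.
groups <- split(zz$b, zz$a)
sums   <- sapply(groups, sum)                                   # numeric
labels <- sapply(groups, function(v) paste(v, collapse = "+"))  # character

# Empty case: subsetting leaves zero rows, hence zero groups.
z <- zz[zz$b > 10, ]
empty_groups <- split(z$b, z$a)

# Count how often FUN would be invoked on the empty grouping.
n_calls <- 0L
invisible(lapply(empty_groups, function(v) {
  n_calls <<- n_calls + 1L
  sum(v)
}))
```

The same column 'b' aggregates to a numeric vector under sum() and to a character vector under the paste() wrapper, which is why the column types of an "empty result" cannot be fixed without running FUN at least once.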

The error message could definitely be clearer, but I don't see an 
alternative to giving an error.
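
Given that, a caller whose subset may turn out empty can guard the call. The following is only a sketch of one possible workaround: 'safe_aggregate' is a hypothetical helper, not part of R, and returning NULL for the empty case is an arbitrary choice made here for illustration.

```r
# Hypothetical wrapper: skip aggregation when there are no rows.
safe_aggregate <- function(x, by, FUN, ...) {
  if (nrow(x) == 0L) {
    return(NULL)  # assumption: the caller treats NULL as "nothing to aggregate"
  }
  aggregate(x, by = by, FUN = FUN, ...)
}

# Illustrative data and filter (assumptions, not taken from the report).
zz   <- data.frame(a = c(1L, 1L, 2L), b = c(0.5, 1.5, 2.0))
cond <- zz$b > 10          # computed logical vector; all FALSE here
z    <- zz[cond, ]         # empty subset, as in the report

res_empty <- safe_aggregate(z,  by = z[1],  FUN = sum)  # NULL, no error
res_full  <- safe_aggregate(zz, by = zz[1], FUN = sum)  # one row per group
```

Whether NULL, an error, or a hand-built zero-row data frame is the right outcome depends on what the calling code expects, so the guard is best kept at the call site.
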
> e.g.
> z<-data.frame(a=integer(0),b=numeric(0))
> aggregate(z,by=z[1],FUN=sum)
>
> In a more realistic situation, 'z' is of the form z<-zz[cond,], where cond is a
> computed logical vector and zz is a non-empty data.frame.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
