thr3ads.net - R help - [R] aggregate and names of factors [Dec 2003]

Christophe Pallier

2003-Dec-08 12:17 UTC

[R] aggregate and names of factors

Hello,

I use the function 'aggregate' a lot.

One small annoyance is that it is necessary to name the factors in the
'by' list to get the names in the resulting data.frame (else, they
appear as Group.1, Group.2...etc). For example, I am forced to
write:

aggregate(y,list(f1=f1,f2=f2),mean)

instead of aggregate(y,list(f1,f2),mean)

(for two factors with short names, it is not such a big deal, but I
usually have about 8 factors with long names...)
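
A minimal sketch of the difference, using made-up data (the vectors y, f1
and f2 here are illustrative only):

```r
# Illustrative data only: one response and two grouping factors
y  <- c(1, 2, 3, 4, 5, 6, 7, 8)
f1 <- factor(rep(c("a", "b"), each = 4))
f2 <- factor(rep(c("u", "v"), times = 4))

# Unnamed 'by' list: grouping columns get generic names
unnamed <- aggregate(y, list(f1, f2), mean)
names(unnamed)   # "Group.1" "Group.2" "x"

# Named 'by' list: grouping columns keep the names supplied
named <- aggregate(y, list(f1 = f1, f2 = f2), mean)
names(named)     # "f1" "f2" "x"
```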

I wrote a modified 'aggregate.data.frame' function (see the code
below) so that it parses the names of the factors and uses them in the
output data.frame. I can now type aggregate(y,list(f1,f2),mean) and the
resulting data.frame has variables with names 'f1' and 'f2'.

However, I have a few questions:

1. Is it a good idea at all? When expressions rather than variables are
   used as factors, this will probably result in a mess. Can one test
   whether an argument within a list is just a variable name or a more
   complex expression? Is there a better way?

2. I would also like to keep the name of the data when it is a
   vector, and not a data.frame. The current version renames it to 'x'.
   I have not managed to modify this behavior, so I am forced to use
    aggregate(data.frame(y),list(f1,f2),mean)

3. I would love to have yet another version that handles formulas so
   that I could type:

   aggregate(y~f1*f2)

   I have a provisional version (see below), but it does not work very
   well.  I would be grateful for any suggestions. In particular, I
   would love to have a 'subset' parameter, as in the lm
   function.
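
One way a 'subset' parameter could be supported, as a rough sketch only:
rebuild the call as a call to model.frame, the same idiom lm uses, so that
'subset' is evaluated inside 'data'. The name agg.formula, its arguments and
the example data are assumptions made for illustration.

```r
# Hypothetical helper: agg.formula is an illustrative name, not part of R
agg.formula <- function(formula, data, subset, FUN = mean) {
    # Turn our own call into model.frame(formula, data, subset),
    # keeping only the arguments that were actually supplied
    mf <- match.call(expand.dots = FALSE)
    keep <- match(c("formula", "data", "subset"), names(mf), nomatch = 0)
    mf <- mf[c(1, keep)]
    mf[[1]] <- quote(stats::model.frame)
    d <- eval(mf, parent.frame())

    # Column 1 is the response; the remaining columns are the groups
    aggregate(d[1], as.list(d[-1]), FUN)
}

# Made-up data for the usage example
dat <- data.frame(y  = 1:8,
                  f1 = factor(rep(c("a", "b"), each = 4)),
                  f2 = factor(rep(c("u", "v"), times = 4)))
res <- agg.formula(y ~ f1 * f2, data = dat, subset = y > 2)
names(res)   # "f1" "f2" "y"
```

Because the grouping columns come straight from the model frame, they keep
their names without any deparsing.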

Here is the small piece of code for the embryo of aggregate.formula:

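my.aggregate.formula <- function(formula, FUN = mean) {
    # Build the model frame; its first column is the response
    d <- model.frame(formula)

    # Use every factor column of the model frame as a grouping variable.
    # A data.frame is already a named list, so the names come for free.
    factor.list <- as.list(d[sapply(d, is.factor)])
    aggregate(d[1], factor.list, FUN)
}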



Christophe Pallier
http://www.pallier.org

---------------

Here is the code for aggregate.data.frame that recovers the names of the
factors:

my.aggregate.data.frame <- function (x, by, FUN, ...)
{
 
   if (!is.data.frame(x)) {
        x <- as.data.frame(x)
      }
        
    if (!is.list(by))
        stop("`by' must be a list")

    if (is.null(names(by))) {
      #  names(by) <- paste("Group", seq(along = by), sep = ".")
        names(by)=lapply(substitute(by)[-1],deparse)
    }
    else {
        nam <- names(by)
        ind <- which(nchar(nam) == 0)
        if (length(ind) > 0) {
          # element i of 'by' is element i + 1 of the call list(...)
          names(by)[ind] <- lapply(as.list(substitute(by))[ind + 1], deparse)
        }
    }
    y <- lapply(x, tapply, by, FUN, ..., simplify = FALSE)
    if (any(sapply(unlist(y, recursive = FALSE), length) > 1))
        stop("`FUN' must always return a scalar")
    z <- y[[1]]
    d <- dim(z)
    w <- NULL
    for (i in seq(along = d)) {
        j <- rep(rep(seq(1:d[i]), prod(d[seq(length = i - 1)]) *
            rep(1, d[i])), prod(d[seq(from = i + 1, length = length(d) -
            i)]))
        w <- cbind(w, dimnames(z)[[i]][j])
    }
    w <- w[which(!unlist(lapply(z, is.null))), ]
    y <- data.frame(w, lapply(y, unlist, use.names = FALSE))
    names(y) <- c(names(by), names(x))
    y
}

Peter Dalgaard

2003-Dec-08 13:13 UTC

[R] aggregate and names of factors

Christophe Pallier <pallier at lscp.ehess.fr> writes:
> Hello,
> 
> I use the function 'aggregate' a lot.
> 
> One small annoyance is that it is necessary to name the factors in the
> 'by' list to get the names in the resulting data.frame (else, they
> appear as Group.1, Group.2...etc). For example, I am forced to
> write:
> 
> aggregate(y,list(f1=f1,f2=f2),mean)
> 
> instead of aggregate(y,list(f1,f2),mean)
> 
> (for two factors with short names, it is not such a big deal, but I
> usually have about 8 factors with long names...)
> 
> I wrote a modified 'aggregate.data.frame' function (see the code
> below) so that it parses the names of the factors and uses them in the
> output data.frame. I can now type aggregate(y,list(f1,f2),mean) and the
> resulting data.frame has variables with names 'f1' and 'f2'.
> 
> However, I have a few questions:
> 
> 1. Is it a good idea at all? When expressions rather than variables are
>    used as factors, this will probably result in a mess. Can one test
>    whether an argument within a list is just a variable name or a more
>    complex expression? Is there a better way?
This issue is not just relevant for aggregate. There are a couple of
other places where you want a named list to get names on output -
lapply(list(foo,bar,baz), function(x) lm(x~age)), say. One option that
I've been toying around with is to clone the code from data.frame and
have a function namedList() or nlist() which automagically supplies
names by deparsing the call. Now where did I put that code sketch...
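
One possible shape for such a function, as a rough sketch only (the body
below is an assumption for illustration, not the code sketch referred to
above): explicit names are kept, and missing ones are filled in by deparsing
the unevaluated arguments.

```r
# Hypothetical helper: an illustrative nlist(), not an existing R function
nlist <- function(...) {
    args  <- list(...)
    # Unevaluated argument expressions, in the same order as 'args'
    exprs <- as.list(substitute(list(...)))[-1]
    nms   <- names(args)
    if (is.null(nms)) nms <- rep("", length(args))

    # Fill in only the names the caller did not supply
    for (i in which(nms == "")) {
        nms[i] <- paste(deparse(exprs[[i]]), collapse = " ")
    }
    names(args) <- nms
    args
}

# Made-up factors for the usage example
f1 <- factor(c("a", "b"))
f2 <- factor(c("u", "v"))
l <- nlist(f1, g = f2, interaction(f1, f2))
names(l)   # "f1" "g" "interaction(f1, f2)"

# A bare variable can be told apart from a compound expression:
is.name(quote(f1))                    # TRUE
is.name(quote(interaction(f1, f2)))   # FALSE
```

With something like this, aggregate(y, nlist(f1, f2), mean) would give
columns named f1 and f2 without touching aggregate itself.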

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
