kavaumail-r at yahoo.com
2006-Apr-28 22:19 UTC
[R] aggregating columns in a data frame in different ways
I would like to use aggregate() to combine statistics for several days in a data frame. My data frame looks similar to this:

        date type count value
1 2006-04-01    A    10  99.6
2 2006-04-01    B     4  33.2
3 2006-04-02    A    22  43.2
4 2006-04-02    B     8  44.9
5 2006-04-03    A    12  12.4
6 2006-04-03    B    14  18.5

('date' is a factor, and my actual data frame has about 100 different 'types', not just two)

I would like to sum up the 'counts' per 'type', and get an average of the 'values' per 'type'. In other words, I would like my results to look like this:

  type count    value
1    A    44 51.73333
2    B    26 32.2

The way I'm doing this now is to tear the table apart into its individual columns, then apply aggregate() to each column individually (using the 'type' column for the 'by' parameter), and finally putting everything back together, like this:

> A.count = aggregate(A$count, list(type=A$type), sum)
> A.value = aggregate(A$value, list(type=A$type), mean)
> B = data.frame(type=A.count$type, count=A.count$x, value=A.value$x)

My actual table is a bit more involved than in this simple example, however, so this becomes quite tedious.

I am hoping that there is a simpler way for doing this, for example by providing different FUN parameters for each column to the aggregate() function.

I would appreciate any suggestions.

Thanks
Klaus
jim holtman
2006-Apr-28 22:29 UTC
[R] aggregating columns in a data frame in different ways
Does this do what you want?

> x
        date type count value
1 2006-04-01    A    10  99.6
2 2006-04-01    B     4  33.2
3 2006-04-02    A    22  43.2
4 2006-04-02    B     8  44.9
5 2006-04-03    A    12  12.4
6 2006-04-03    B    14  18.5
> y <- lapply(split(1:nrow(x), x$type), function(.ind){
+     c(count=sum(x$count[.ind]), value=mean(x$value[.ind]))
+ })
> do.call('rbind', y)
  count    value
A    44 51.73333
B    26 32.20000

--
Jim Holtman
Cincinnati, OH

What is the problem you are trying to solve?
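The split-and-combine idea above can be extended so that each column gets its own summary function, which is close to what the original question asks for. The following is only a sketch under stated assumptions: the data frame is rebuilt by hand from the posted table, and the lookup list `funs` is a hypothetical name introduced for illustration, not something aggregate() provides.

```r
# Rebuild the example data from the original post (column types assumed).
x <- data.frame(
  date  = factor(rep(c("2006-04-01", "2006-04-02", "2006-04-03"), each = 2)),
  type  = factor(rep(c("A", "B"), times = 3)),
  count = c(10, 4, 22, 8, 12, 14),
  value = c(99.6, 33.2, 43.2, 44.9, 12.4, 18.5)
)

# Hypothetical lookup: one summary function per column to aggregate.
funs <- list(count = sum, value = mean)

# Split the rows by type, then apply each column's own function.
per_type <- lapply(split(x, x$type), function(d) {
  sapply(names(funs), function(col) funs[[col]](d[[col]]))
})

# Stack the per-type vectors into a matrix with one row per type.
result <- do.call(rbind, per_type)
result
```

With more columns, only `funs` should need to change; the result is a numeric matrix with the types as row names, so wrap it in data.frame() if a 'type' column is needed.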
Gabor Grothendieck
2006-Apr-28 22:57 UTC
[R] aggregating columns in a data frame in different ways
Here are three possibilities:

1. aggregate on the columns that you want to sum and aggregate on the columns that you want to average and then merge them:

   By <- A[, 2, drop = FALSE]
   merge(aggregate(A[, 3, drop = FALSE], By, sum),
         aggregate(A[, 4, drop = FALSE], By, mean))

2. use by:

   f <- function(x) with(x, c(count = sum(count), value = mean(value)))
   do.call("rbind", by(A[, 3:4], A[, 2, drop = FALSE], f))

3. use summaryBy in the doBy package picking off the appropriate columns in the output:

   library(doBy)
   summaryBy(. ~ type, A[, -1], FUN = c(sum, mean))[, c(1, 2, 5)]
I must be making a ridiculously silly omission. When I run the following I get the cloud plot OK, but I can't figure out what I am missing when I call wireframe. Any help would be appreciated.

library(lattice)
x <- runif(100)
y <- rnorm(100)
z <- runif(100)
temp <- data.frame(x, y, z)
wireframe(x ~ y * z, temp)
cloud(x ~ y * z, temp)
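One likely explanation, offered as an assumption rather than a confirmed diagnosis: wireframe() draws a surface and expects the two conditioning variables to lie on a regular grid, whereas cloud() happily plots scattered points. A minimal sketch with gridded data follows; the surface formula is hypothetical and chosen only for illustration.

```r
library(lattice)

# Regular grid of (y, z) locations, one row per grid cell.
grid <- expand.grid(y = seq(-2, 2, length.out = 20),
                    z = seq(0, 1, length.out = 20))

# Hypothetical response surface, used only to have something to draw.
grid$x <- with(grid, exp(-y^2) * z)

# Same formula shape as in the question, now on gridded data.
p <- wireframe(x ~ y * z, data = grid)
print(p)
```

For scattered observations such as the runif()/rnorm() draws in the question, the values would first have to be interpolated onto a grid before wireframe() can be used.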