Hi,
I am trying to compute aggregate sums over my test data.frame as follows:
testDF <- data.frame(v1 = c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b",
                            "c", "c", "c", "c", "c", "d", "d", "d", "d", "d"),
                     v2 = c("a1", "a1", "a1", "a2", "a3", "b1", "b1", "b2", "b2", "b2",
                            "c1", "c2", "c2", "c2", "c3", "d1", "d2", "d3", "d4", "d4"),
                     n1 = 1:20,
                     n2 = 21:40)
library(doBy)   # orderBy() below comes from the doBy package
testDF <- orderBy(~ v1 + v2, data = testDF)
rownames(testDF) <- NULL
> testDF
v1 v2 n1 n2
1 a a1 1 21
2 a a1 2 22
3 a a1 3 23
4 a a2 4 24
5 a a3 5 25
6 b b1 6 26
7 b b1 7 27
8 b b2 8 28
9 b b2 9 29
10 b b2 10 30
11 c c1 11 31
12 c c2 12 32
13 c c2 13 33
14 c c2 14 34
15 c c3 15 35
16 d d1 16 36
17 d d2 17 37
18 d d3 18 38
19 d d4 19 39
20  d d4 20 40
testDF.result <- aggregate(list(testDF$n1, testDF$n2),
                           by = list(testDF$v1, testDF$v2), FUN = sum)
> testDF.result
Group.1 Group.2 X1.20 X21.40
1 a a1 6 66
2 a a2 4 24
3 a a3 5 25
4 b b1 13 53
5 b b2 27 87
6 c c1 11 31
7 c c2 39 99
8 c c3 15 35
9 d d1 16 36
10 d d2 17 37
11 d d3 18 38
12       d      d4    39     79
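As a side note, the X1.20 and X21.40 column names appear because the lists passed to aggregate() were unnamed; naming the list elements keeps the output readable. A minimal sketch of the same call with names added (the names are an editing suggestion, not part of the original post):

# naming the list elements gives n1/n2 and v1/v2 column names in the result
testDF.result <- aggregate(list(n1 = testDF$n1, n2 = testDF$n2),
                           by = list(v1 = testDF$v1, v2 = testDF$v2),
                           FUN = sum)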
However, when I applied this to my real data, it failed: aggregate() seems to
require more memory than I currently have (I am using WinXP, R 2.8.0, 2GB RAM).
Basically I want to compute group-wise sums of my numeric fields (n1 and n2 in
the example above), grouped on the values of v1 and v2.
The problem is that my real data has many more numeric fields and grouping
fields than just two of each.
In SQL, I would do:
select v1, v2, sum(n1), sum(n2) from myData
group by v1, v2;
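For the many-column case, the same group-by can be written in base R by passing whole column subsets to aggregate() instead of listing each vector by hand. A minimal sketch, assuming v1 and v2 are the only grouping fields and every remaining field is numeric (this sketch is an editing addition, not from the original post):

# select grouping and numeric columns programmatically
grp.cols <- c("v1", "v2")                              # assumed grouping fields
num.cols <- names(testDF)[sapply(testDF, is.numeric)]  # all numeric fields
sums <- aggregate(testDF[num.cols], by = testDF[grp.cols], FUN = sum)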
Am I using the wrong function / library (or even the wrong approach)? If so,
can you suggest an alternative?
Any pointers are really appreciated.
Thanks,
Ferry
Hi,

I think I found the solution. Using the doBy library, I got:

testDF.result2 <- summaryBy(n1 + n2 ~ v1 + v2, data = testDF, FUN = sum)

> testDF.result2
   v1 v2 n1.sum n2.sum
1   a a1      6     66
2   a a2      4     24
3   a a3      5     25
4   b b1     13     53
5   b b2     27     87
6   c c1     11     31
7   c c2     39     99
8   c c3     15     35
9   d d1     16     36
10  d d2     17     37
11  d d3     18     38
12  d d4     39     79

In any case, did I do something wrong with the aggregate function?

Thanks,

Ferry

On Thu, Mar 19, 2009 at 6:09 PM, Ferry <fmi.mlist@gmail.com> wrote:
> [original message quoted in full -- trimmed]
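If the real data has many numeric columns, then, if I recall the doBy documentation correctly (treat this as an assumption), a dot on the left-hand side of the summaryBy formula means "all remaining numeric variables", which avoids listing the columns one by one:

# "." on the LHS: summarise every numeric variable not used for grouping
testDF.allsums <- summaryBy(. ~ v1 + v2, data = testDF, FUN = sum)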
Here are two solutions:

> aggregate(testDF[c("n1", "n2")], testDF[c("v1", "v2")], sum)
   v1 v2 n1 n2
1   a a1  6 66
2   a a2  4 24
3   a a3  5 25
4   b b1 13 53
5   b b2 27 87
6   c c1 11 31
7   c c2 39 99
8   c c3 15 35
9   d d1 16 36
10  d d2 17 37
11  d d3 18 38
12  d d4 39 79

> library(sqldf)
> sqldf("select v1, v2, sum(n1), sum(n2) from testDF group by v1, v2")
   v1 v2 sum(n1) sum(n2)
1   a a1       6      66
2   a a2       4      24
3   a a3       5      25
4   b b1      13      53
5   b b2      27      87
6   c c1      11      31
7   c c2      39      99
8   c c3      15      35
9   d d1      16      36
10  d d2      17      37
11  d d3      18      38
12  d d4      39      79

On Thu, Mar 19, 2009 at 9:18 PM, Ferry <fmi.mlist at gmail.com> wrote:
> [earlier messages quoted in full -- trimmed]
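Since the original complaint was memory, it may also help that sqldf accepts a dbname argument to build its temporary SQLite database on disk rather than in memory; a minimal sketch under that assumption:

# dbname = tempfile() asks sqldf to use an on-disk SQLite file,
# keeping the grouping work outside R's limited address space
library(sqldf)
res <- sqldf("select v1, v2, sum(n1), sum(n2) from testDF group by v1, v2",
             dbname = tempfile())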
Try this technique. I use it with large data objects since working through
indices is sometimes faster and uses less memory:
x <- read.table(textConnection(" v1 v2 n1 n2
1 a a1 1 21
2 a a1 2 22
3 a a1 3 23
4 a a2 4 24
5 a a3 5 25
6 b b1 6 26
7 b b1 7 27
8 b b2 8 28
9 b b2 9 29
10 b b2 10 30
11 c c1 11 31
12 c c2 12 32
13 c c2 13 33
14 c c2 14 34
15 c c3 15 35
16 d d1 16 36
17 d d2 17 37
18 d d3 18 38
19 d d4 19 39
20 d d4 20 40"), header=TRUE)
closeAllConnections()
# use indices to reduce memory
x.ind <- split(seq(nrow(x)), list(x$v1, x$v2), drop=TRUE)
# now aggregate using the indices
x.agg <- do.call(rbind, lapply(x.ind, function(.seg) {
    data.frame(v1 = x$v1[.seg[1]], v2 = x$v2[.seg[1]],
               n1 = sum(x$n1[.seg]), n2 = sum(x$n2[.seg]))
}))
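The same index-based idea should extend to any number of numeric columns; a sketch under the assumption that every non-grouping column is numeric (the grp.cols / num.cols helpers are an editing addition, not part of the original reply):

# reuse the x.ind indices, but sum every numeric column at once
grp.cols <- c("v1", "v2")
num.cols <- setdiff(names(x), grp.cols)
x.agg.all <- do.call(rbind, lapply(x.ind, function(.seg) {
    cbind(x[.seg[1], grp.cols, drop = FALSE],
          as.data.frame(as.list(colSums(x[.seg, num.cols, drop = FALSE]))))
}))
rownames(x.agg.all) <- NULL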
On Thu, Mar 19, 2009 at 9:09 PM, Ferry <fmi.mlist at gmail.com> wrote:
> [original message quoted in full -- trimmed]
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
On Thu, Mar 19, 2009 at 8:40 PM, jim holtman <jholtman at gmail.com> wrote:
> [index/split technique quoted in full -- trimmed]

This is basically the approach that the plyr package,
http://had.co.nz/plyr, uses behind a user-friendly interface.

Hadley

--
http://had.co.nz/
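For reference, a minimal plyr sketch of the same aggregation (an editing addition, not Hadley's code; numcolwise() is how I recall plyr summarising every numeric column per group):

# sum every numeric column within each v1/v2 group
library(plyr)
testDF.plyr <- ddply(testDF, .(v1, v2), numcolwise(sum))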