Tobin, Jared
2007-Sep-13 20:20 UTC
[R] Collapsing data frame; aggregate() or better function?
Hello r-help,

I am trying to collapse or aggregate 'some' of a data frame. A very simplified version of my data frame looks like:

> tester
  trip set num sex lfs1 lfs2
1  313  15   5   M    2    3
2  313  15   3   F    1    2
3  313  17   1   M    0    1
4  313  17   2   F    1    1
5  313  17   1   U    1    0

And I want to omit sex from the picture and just get an addition of num, lfs1, and lfs2 for each unique trip/set combination. Using aggregate() works fine here,

> test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum)
> test
  trip set num lfs1 lfs2
1  313  15   8    3    5
2  313  17   4    2    2

But I'm having trouble getting the same function to work on my actual data frame, which is considerably larger.

> dim(lf1.turbot)
[1] 16468   217
> test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8], sum)
Error in vector("list", prod(extent)) : vector size specified is too large
In addition: Warning messages:
1: NAs produced by integer overflow in: ngroup * (as.integer(index) - one)
2: NAs produced by integer overflow in: group + ngroup * (as.integer(index) - one)
3: NAs produced by integer overflow in: ngroup * nlevels(index)

I'm guessing that either aggregate() can't handle a data frame of this size OR that there is an issue with 'omitting' more than one variable (in the same way I've omitted sex in the above example). Can anyone clarify and/or recommend any relatively simple alternative procedure to accomplish this?

I plan on trying variants of by() and tapply() tomorrow morning, but I'm about to head home for the day.

Thanks,

--
jared tobin, student research assistant
fisheries and oceans canada
tobinjr at dfo-mpo.gc.ca
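For anyone wanting to reproduce the small example, here is a minimal sketch that rebuilds `tester` from the printed values and repeats the aggregate() call using column names instead of positions. The helper `grid_cells()` is hypothetical (it is not part of R or of the original post); it counts the cells in the full cross of the grouping columns, on the assumption, suggested by the `prod(extent)` in the error message, that the failure comes from building one cell per combination of grouping levels rather than from the number of rows.

```r
# Rebuild the small example from the values printed in the post.
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# Sum num, lfs1 and lfs2 within each trip/set combination, dropping sex.
test <- aggregate(tester[, c("num", "lfs1", "lfs2")],
                  by  = tester[, c("trip", "set")],
                  FUN = sum)
print(test)

# Hypothetical helper: number of cells in the full cross of the grouping
# columns. prod() returns a double, so the count itself cannot overflow.
grid_cells <- function(df, keys) {
  prod(sapply(df[keys], function(x) length(unique(x))))
}

cells <- grid_cells(tester, c("trip", "set"))
print(cells)                         # cells in the trip x set grid
print(cells > .Machine$integer.max)  # TRUE would point to integer overflow
```

Running `grid_cells()` on the eight grouping columns of the real data frame would show whether the cross of their levels exceeds `.Machine$integer.max`, which would be consistent with the overflow warnings.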
jim holtman
2007-Sep-13 21:18 UTC
[R] Collapsing data frame; aggregate() or better function?
The second argument for aggregate is supposed to be a list, so try (notice the missing comma before "1:8"):

test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[1:8], sum)

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
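In case the error persists with the list form, one possible workaround (a sketch, not something proposed in the thread) is to collapse the grouping columns into a single key and sum with rowsum(), so that only combinations that actually occur in the data are ever materialised. The example uses the small `tester` data from the question; the `"|"` separator is an assumption and must not occur inside any key value.

```r
# Small example data, as printed in the original question.
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

keys <- c("trip", "set")          # grouping columns
vals <- c("num", "lfs1", "lfs2")  # numeric columns to sum

# One character key per row, e.g. "313|15"; only observed combinations exist.
key <- do.call(paste, c(tester[keys], sep = "|"))

# Column sums within each key; reorder = FALSE keeps first-appearance order.
sums <- rowsum(tester[vals], key, reorder = FALSE)

# Attach the original key columns, taken from the first row of each group.
first <- !duplicated(key)
out <- cbind(tester[first, keys], sums[key[first], , drop = FALSE])
rownames(out) <- NULL
print(out)
```

For the real data the same steps would apply with `keys <- names(lf1.turbot)[1:8]` and `vals <- names(lf1.turbot)[c(11, 12, 17:217)]`, assuming all of the summed columns are numeric.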