thr3ads.net - R help - [R] Collapsing data frame; aggregate() or better function? [Sep 2007]


Tobin, Jared

2007-Sep-13 20:20 UTC

[R] Collapsing data frame; aggregate() or better function?

Hello r-help,

I am trying to collapse or aggregate 'some' of a data frame.  A very
simplified version of my data frame looks like:
> tester
  trip set num sex lfs1 lfs2
1  313  15   5   M    2    3
2  313  15   3   F    1    2
3  313  17   1   M    0    1
4  313  17   2   F    1    1
5  313  17   1   U    1    0

I want to leave sex out of the picture and just get the sum of num,
lfs1, and lfs2 for each unique trip/set combination.  Using aggregate()
works fine here:
> test <- aggregate(tester[,c(3,5:6)], tester[,1:2], sum)
> test
  trip set num lfs1 lfs2
1  313  15   8    3    5
2  313  17   4    2    2
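
For anyone who wants to reproduce the small example, here is a minimal sketch. The values are copied from the printed data frame above; the column types (numeric, with sex as character) are an assumption, since the post does not show str(tester). Columns are selected by name rather than by position, which is only a readability choice.

```r
# Rebuild the small example (values taken from the printed data frame;
# column types are assumed).
tester <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  sex  = c("M", "F", "M", "F", "U"),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)

# Sum num, lfs1 and lfs2 within each trip/set pair.
# sex drops out simply because it is not among the selected columns.
test <- aggregate(tester[, c("num", "lfs1", "lfs2")],
                  by  = tester[, c("trip", "set")],
                  FUN = sum)
print(test)
```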

But I'm having trouble getting the same function to work on my actual
data frame which is considerably larger.
> dim(lf1.turbot)
[1] 16468   217
> test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[,1:8], sum)
Error in vector("list", prod(extent)) : vector size specified is too large
In addition: Warning messages:
1: NAs produced by integer overflow in: ngroup * (as.integer(index) - one)
2: NAs produced by integer overflow in: group + ngroup * (as.integer(index) - one)
3: NAs produced by integer overflow in: ngroup * nlevels(index)

I'm guessing that either aggregate() can't handle a data frame of this
size OR that there is an issue with 'omitting' more than one variable
(in the same way I've omitted sex in the above example).  Can anyone
clarify and/or recommend any relatively simple alternative procedure to
accomplish this?

I plan on trying variants of by() and tapply() tomorrow morning, but I'm
about to head home for the day.

Thanks,

--

jared tobin, student research assistant
fisheries and oceans canada
tobinjr at dfo-mpo.gc.ca

jim holtman

2007-Sep-13 21:18 UTC

The second argument for aggregate is supposed to be a list, so try
(notice the missing comma before "1:8"):

test <- aggregate(lf1.turbot[,c(11, 12, 17:217)], lf1.turbot[1:8],sum)
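
If the error persists with the list form, it may be worth noting that both df[, 1:8] and df[1:8] return a data frame when several columns are selected. The warnings about ngroup * nlevels(index) suggest the product of the level counts of the eight grouping columns overflowed, so one possible workaround is to group on a single pasted key that only covers combinations actually present in the data. The following is a rough sketch of that idea using rowsum(), not a tested fix for lf1.turbot: the small df, key_cols and val_cols are hypothetical stand-ins, and the "|" separator assumes no grouping value contains that character.

```r
# Hypothetical stand-in for the real data: two grouping columns,
# three numeric columns to be summed.
df <- data.frame(
  trip = c(313, 313, 313, 313, 313),
  set  = c(15, 15, 17, 17, 17),
  num  = c(5, 3, 1, 2, 1),
  lfs1 = c(2, 1, 0, 1, 1),
  lfs2 = c(3, 2, 1, 1, 0)
)
key_cols <- c("trip", "set")          # would be columns 1:8 in the real data
val_cols <- c("num", "lfs1", "lfs2")  # would be c(11, 12, 17:217)

# One key per row, built only from combinations that actually occur.
key <- do.call(paste, c(df[key_cols], sep = "|"))

# rowsum() sums every numeric column within each key;
# reorder = FALSE keeps groups in order of first appearance.
sums <- rowsum(df[val_cols], group = key, reorder = FALSE)
rownames(sums) <- NULL

# Recover the grouping columns from the first row of each key
# (same first-appearance order as sums).
groups <- df[!duplicated(key), key_cols, drop = FALSE]
rownames(groups) <- NULL

result <- cbind(groups, sums)
print(result)
```

All of the summed columns need to be numeric for rowsum() to work, so any non-numeric column among the value columns would have to be dropped or converted first.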


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
