I have> dfquantity branch client date name 1 10 1 1 2010-01-01 one 2 20 2 1 2010-01-01 one 3 30 3 2 2010-01-01 two 4 15 4 1 2010-01-01 one 5 10 5 2 2010-01-01 two 6 20 6 3 2010-01-01 three 7 1000 1 1 2011-01-01 one 8 2000 2 1 2011-01-01 one 9 3000 3 2 2011-01-01 two 10 1500 4 1 2011-01-01 one 11 1000 5 2 2011-01-01 two 12 2000 6 3 2011-01-01 three I want to aggregate away the branch. I followed a suggestion by Gabor (thanks) and did> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum)client date quantity 1 1 2010-01-01 45 2 2 2010-01-01 40 3 3 2010-01-01 20 4 1 2011-01-01 4500 5 2 2011-01-01 4000 6 3 2011-01-01 2000 I want df$name also in the output and did what looked obvious:> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum)client date name quantity 1 1 2010-01-01 one 45 2 1 2011-01-01 one 4500 3 3 2010-01-01 three 20 4 3 2011-01-01 three 2000 5 2 2010-01-01 two 40 6 2 2011-01-01 two 4000 It seems to work, but slows down tremendously for a dataframe with around a 1000 rows. Could anyone explain what is going on and suggest a way out? Thanks.
If you have a large data frame, one option is package data.table. Try the
following:
library(data.table)
dt <- data.table(df)
dt[, list(qsum = sum(quantity)), by ='client, date, name']
client date name qsum
[1,] 1 2010-01-01 one 45
[2,] 1 2011-01-01 one 4500
[3,] 2 2010-01-01 two 40
[4,] 2 2011-01-01 two 4000
[5,] 3 2010-01-01 three 20
[6,] 3 2011-01-01 three 2000
BTW, the leading comma after the opening bracket is not a typo :)
For R versions >= 2.11.x, aggregate() has a formula interface that saves a
fair bit of typing:
aggregate(quantity ~ client + date + name, data = df, FUN = sum)
client date name quantity
1 1 2010-01-01 one 45
2 1 2011-01-01 one 4500
3 3 2010-01-01 three 20
4 3 2011-01-01 three 2000
5 2 2010-01-01 two 40
6 2 2011-01-01 two 4000
A third option is package plyr and function ddply():
library(plyr)
ddply(df, .(client, date, name), summarise, qsum = sum(quantity))
# same output as data.table
Hopefully one or more of these will improve your processing time.
Dennis
On Wed, Jan 26, 2011 at 2:39 AM, analyst41@hotmail.com <
analyst41@hotmail.com> wrote:
> I have
>
> > df
> quantity branch client date name
> 1 10 1 1 2010-01-01 one
> 2 20 2 1 2010-01-01 one
> 3 30 3 2 2010-01-01 two
> 4 15 4 1 2010-01-01 one
> 5 10 5 2 2010-01-01 two
> 6 20 6 3 2010-01-01 three
> 7 1000 1 1 2011-01-01 one
> 8 2000 2 1 2011-01-01 one
> 9 3000 3 2 2011-01-01 two
> 10 1500 4 1 2011-01-01 one
> 11 1000 5 2 2011-01-01 two
> 12 2000 6 3 2011-01-01 three
>
> I want to aggregate away the branch. I followed a suggestion by Gabor
> (thanks) and did
>
> >
>
aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum)
> client date quantity
> 1 1 2010-01-01 45
> 2 2 2010-01-01 40
> 3 3 2010-01-01 20
> 4 1 2011-01-01 4500
> 5 2 2011-01-01 4000
> 6 3 2011-01-01 2000
>
> I want df$name also in the output and did what looked obvious:
>
> >
>
aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum)
> client date name quantity
> 1 1 2010-01-01 one 45
> 2 1 2011-01-01 one 4500
> 3 3 2010-01-01 three 20
> 4 3 2011-01-01 three 2000
> 5 2 2010-01-01 two 40
> 6 2 2011-01-01 two 4000
>
> It seems to work, but slows down tremendously for a dataframe with
> around a 1000 rows.
>
> Could anyone explain what is going on and suggest a way out?
>
> Thanks.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
Try this: unique(transform(df, quantity = ave(quantity, client, date, name, FUN sum), branch = NULL)) On Wed, Jan 26, 2011 at 8:39 AM, analyst41@hotmail.com < analyst41@hotmail.com> wrote:> I have > > > df > quantity branch client date name > 1 10 1 1 2010-01-01 one > 2 20 2 1 2010-01-01 one > 3 30 3 2 2010-01-01 two > 4 15 4 1 2010-01-01 one > 5 10 5 2 2010-01-01 two > 6 20 6 3 2010-01-01 three > 7 1000 1 1 2011-01-01 one > 8 2000 2 1 2011-01-01 one > 9 3000 3 2 2011-01-01 two > 10 1500 4 1 2011-01-01 one > 11 1000 5 2 2011-01-01 two > 12 2000 6 3 2011-01-01 three > > I want to aggregate away the branch. I followed a suggestion by Gabor > (thanks) and did > > > > aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum) > client date quantity > 1 1 2010-01-01 45 > 2 2 2010-01-01 40 > 3 3 2010-01-01 20 > 4 1 2011-01-01 4500 > 5 2 2011-01-01 4000 > 6 3 2011-01-01 2000 > > I want df$name also in the output and did what looked obvious: > > > > aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum) > client date name quantity > 1 1 2010-01-01 one 45 > 2 1 2011-01-01 one 4500 > 3 3 2010-01-01 three 20 > 4 3 2011-01-01 three 2000 > 5 2 2010-01-01 two 40 > 6 2 2011-01-01 two 4000 > > It seems to work, but slows down tremendously for a dataframe with > around a 1000 rows. > > Could anyone explain what is going on and suggest a way out? > > Thanks. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O [[alternative HTML version deleted]]