I have> dfquantity branch client date name 1 10 1 1 2010-01-01 one 2 20 2 1 2010-01-01 one 3 30 3 2 2010-01-01 two 4 15 4 1 2010-01-01 one 5 10 5 2 2010-01-01 two 6 20 6 3 2010-01-01 three 7 1000 1 1 2011-01-01 one 8 2000 2 1 2011-01-01 one 9 3000 3 2 2011-01-01 two 10 1500 4 1 2011-01-01 one 11 1000 5 2 2011-01-01 two 12 2000 6 3 2011-01-01 three I want to aggregate away the branch. I followed a suggestion by Gabor (thanks) and did> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum)client date quantity 1 1 2010-01-01 45 2 2 2010-01-01 40 3 3 2010-01-01 20 4 1 2011-01-01 4500 5 2 2011-01-01 4000 6 3 2011-01-01 2000 I want df$name also in the output and did what looked obvious:> aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum)client date name quantity 1 1 2010-01-01 one 45 2 1 2011-01-01 one 4500 3 3 2010-01-01 three 20 4 3 2011-01-01 three 2000 5 2 2010-01-01 two 40 6 2 2011-01-01 two 4000 It seems to work, but slows down tremendously for a dataframe with around a 1000 rows. Could anyone explain what is going on and suggest a way out? Thanks.
If you have a large data frame, one option is package data.table. Try the following: library(data.table) dt <- data.table(df) dt[, list(qsum = sum(quantity)), by ='client, date, name'] client date name qsum [1,] 1 2010-01-01 one 45 [2,] 1 2011-01-01 one 4500 [3,] 2 2010-01-01 two 40 [4,] 2 2011-01-01 two 4000 [5,] 3 2010-01-01 three 20 [6,] 3 2011-01-01 three 2000 BTW, the leading comma after the opening bracket is not a typo :) For R versions >= 2.11.x, aggregate() has a formula interface that saves a fair bit of typing: aggregate(quantity ~ client + date + name, data = df, FUN = sum) client date name quantity 1 1 2010-01-01 one 45 2 1 2011-01-01 one 4500 3 3 2010-01-01 three 20 4 3 2011-01-01 three 2000 5 2 2010-01-01 two 40 6 2 2011-01-01 two 4000 A third option is package plyr and function ddply(): library(plyr) ddply(df, .(client, date, name), summarise, qsum = sum(quantity)) # same output as data.table Hopefully one or more of these will improve your processing time. Dennis On Wed, Jan 26, 2011 at 2:39 AM, analyst41@hotmail.com < analyst41@hotmail.com> wrote:> I have > > > df > quantity branch client date name > 1 10 1 1 2010-01-01 one > 2 20 2 1 2010-01-01 one > 3 30 3 2 2010-01-01 two > 4 15 4 1 2010-01-01 one > 5 10 5 2 2010-01-01 two > 6 20 6 3 2010-01-01 three > 7 1000 1 1 2011-01-01 one > 8 2000 2 1 2011-01-01 one > 9 3000 3 2 2011-01-01 two > 10 1500 4 1 2011-01-01 one > 11 1000 5 2 2011-01-01 two > 12 2000 6 3 2011-01-01 three > > I want to aggregate away the branch. I followed a suggestion by Gabor > (thanks) and did > > > > aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum) > client date quantity > 1 1 2010-01-01 45 > 2 2 2010-01-01 40 > 3 3 2010-01-01 20 > 4 1 2011-01-01 4500 > 5 2 2011-01-01 4000 > 6 3 2011-01-01 2000 > > I want df$name also in the output and did what looked obvious: > > > > aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum) > client date name quantity > 1 1 2010-01-01 one 45 > 2 1 2011-01-01 one 4500 > 3 3 2010-01-01 three 20 > 4 3 2011-01-01 three 2000 > 5 2 2010-01-01 two 40 > 6 2 2011-01-01 two 4000 > > It seems to work, but slows down tremendously for a dataframe with > around a 1000 rows. > > Could anyone explain what is going on and suggest a way out? > > Thanks. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Try this: unique(transform(df, quantity = ave(quantity, client, date, name, FUN sum), branch = NULL)) On Wed, Jan 26, 2011 at 8:39 AM, analyst41@hotmail.com < analyst41@hotmail.com> wrote:> I have > > > df > quantity branch client date name > 1 10 1 1 2010-01-01 one > 2 20 2 1 2010-01-01 one > 3 30 3 2 2010-01-01 two > 4 15 4 1 2010-01-01 one > 5 10 5 2 2010-01-01 two > 6 20 6 3 2010-01-01 three > 7 1000 1 1 2011-01-01 one > 8 2000 2 1 2011-01-01 one > 9 3000 3 2 2011-01-01 two > 10 1500 4 1 2011-01-01 one > 11 1000 5 2 2011-01-01 two > 12 2000 6 3 2011-01-01 three > > I want to aggregate away the branch. I followed a suggestion by Gabor > (thanks) and did > > > > aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date),sum) > client date quantity > 1 1 2010-01-01 45 > 2 2 2010-01-01 40 > 3 3 2010-01-01 20 > 4 1 2011-01-01 4500 > 5 2 2011-01-01 4000 > 6 3 2011-01-01 2000 > > I want df$name also in the output and did what looked obvious: > > > > aggregate(list(quantity=df$quantity),list(client=df$client,date=df$date,name=df$name),sum) > client date name quantity > 1 1 2010-01-01 one 45 > 2 1 2011-01-01 one 4500 > 3 3 2010-01-01 three 20 > 4 3 2011-01-01 three 2000 > 5 2 2010-01-01 two 40 > 6 2 2011-01-01 two 4000 > > It seems to work, but slows down tremendously for a dataframe with > around a 1000 rows. > > Could anyone explain what is going on and suggest a way out? > > Thanks. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O [[alternative HTML version deleted]]