Chris Conner wrote on 09/26/2011 11:33:05 AM:
> Help-Rs
>
> As someone who is newer to R and trying to make the transition from
> Access into R, there is a frequently used function that I'd like to
> try and duplicate in the R world. It involves creating an aggregate
> table of the top (n) orders for an item by sum of cost over a
> select period of time.
>
> So, take the following example :
>
> group <- c(rep(1,10), rep(2,10), rep(3,10))
> product <- c(rep("itema", 4), rep("itemb", 2), rep("itemc", 1),
>              rep("itemc", 3), rep("itema", 4), rep("itemb", 2),
>              rep("itemc", 1), rep("itemc", 3), rep("itema", 4),
>              rep("itemb", 2), rep("itemc", 1), rep("itemc", 3))
> cost <- round (rnorm(30, mean = 100, sd = 30), 2)
> DF <- data.frame(group, product, cost)
> agglist <- list(DF$product, DF$group)
> col1<- aggregate(DF [,3], by = agglist, sum)
> col2<- aggregate(DF [,3], by = agglist, length)
> (table <- cbind(col1, col2))
>
> My question would be, how about if you wanted a table that retained
> only the top 1 product (e.g., item c for group 2) by group... or for
> that matter the top n=2 or n=3 or n=5? While with this example DF
> the answer would be easy to find, I'm dealing with millions of orders.
> THX!
> Chris
This may not be the most elegant way, but it works.
group <- rep(1:3, rep(10, 3))
product <- c("itema", "itemb", "itemc")[rep(rep(1:3, c(4, 2, 4)), 3)]
cost <- round(rnorm(30, mean=100, sd=30), 2)
DF <- data.frame(group, product, cost)
# create a data frame with total cost
totcost <- aggregate(data.frame(cost.sum=DF$cost),
                     by=DF[, c("product", "group")], sum)
totcost$n <- aggregate(DF$cost, by=DF[, c("product", "group")], length)$x
# rank the products by total cost within a group (rank 1 = cheapest)
totcost$rank <- unlist(tapply(totcost$cost.sum, totcost$group, rank))
# order the rows by group, then by rank within group
totcost.ordered <- totcost[order(totcost$group, totcost$rank), ]
# to see the top 2 ranked products by cost (the two cheapest)
totcost.ordered[totcost.ordered$rank <= 2, ]
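One caveat on the ranking step: unlist(tapply(...)) hands the ranks back
group by group, so as far as I can tell it lines up with the rows of totcost
only because aggregate() happens to return them sorted by group. If the totals
might come in some other order, ave() is a possible alternative, since it
returns its result in the original row order. A small sketch (the toy totals
below are made up and deliberately not sorted by group):

```r
# Made-up totals, deliberately NOT sorted by group
totcost <- data.frame(
  product  = c("itema", "itemb", "itemc", "itema", "itemb", "itemc"),
  group    = c(1, 2, 1, 2, 1, 2),
  cost.sum = c(400, 150, 390, 420, 210, 380)
)
# ave() applies rank() within each group and keeps the original row order;
# FUN must be passed by name because the grouping variables go through '...'
totcost$rank <- ave(totcost$cost.sum, totcost$group, FUN = rank)
# order by group, then by rank within group
totcost[order(totcost$group, totcost$rank), ]
```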
If you want the most expensive (instead of the cheapest), redefine rank
as:
totcost$rank <- unlist(tapply(-totcost$cost.sum, totcost$group, rank))
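For the original question (the top n products per group, most expensive
first), the steps could be wrapped in a small function. This is only a sketch:
the name top_n_by_group and its argument top are made up for illustration, it
uses ave() in place of unlist(tapply()), and it assumes DF has the columns
group, product and cost as above.

```r
# Hypothetical helper: keep the 'top' highest-cost products within each group
top_n_by_group <- function(DF, top = 1) {
  # total cost per product within group
  tot <- aggregate(cost ~ product + group, data = DF, FUN = sum)
  names(tot)[names(tot) == "cost"] <- "cost.sum"
  # number of orders per product within group (same row order as above)
  tot$n <- aggregate(cost ~ product + group, data = DF, FUN = length)$cost
  # rank within group; negating makes rank 1 the most expensive, and
  # ties.method = "min" keeps tied products together at the better rank
  tot$rank <- ave(-tot$cost.sum, tot$group,
                  FUN = function(v) rank(v, ties.method = "min"))
  tot <- tot[order(tot$group, tot$rank), ]
  tot[tot$rank <= top, ]
}

set.seed(1)  # arbitrary seed, only so the example is reproducible
group <- rep(1:3, each = 10)
product <- rep(rep(c("itema", "itemb", "itemc"), c(4, 2, 4)), 3)
cost <- round(rnorm(30, mean = 100, sd = 30), 2)
DF <- data.frame(group, product, cost)
top_n_by_group(DF, top = 2)
```

Calling it with top = 1 should return one row per group unless two products
tie exactly on total cost.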
Jean