Hi All,

I'm looking for ways to compute aggregate statistics (with the aggregate function) but with an option for sorting and selecting a subset of the data frame. For example, I would like to turn this:

aggregate(myDataframe$TargetValue, list(SomeFactor = myDataframe$SomeFactor), mean)

into something like

aggregate(myDataframe$TargetValue, list(SomeFactor = myDataframe$SomeFactor), mean, sort=DESCENDING, subset=0.33)

where sort would sort TargetValue within each factor level and subset would be (for example) a value between 0 and 1. The example above would give me the mean of the top third of TargetValue per factor level.

Is there any way of doing this without having to use temporary variables to stuff my vectors, use length(), etc.?

--
View this message in context: http://www.nabble.com/Partial-aggregate-on-sorted-data-tf4683988.html#a13384556
Sent from the R help mailing list archive at Nabble.com.
Is this something like what you want?

> set.seed(1)
> test <- data.frame(value=runif(100), fact=sample(LETTERS[1:5], 100, TRUE))
> result <- tapply(test$value, test$fact, function(x, sort, subset){
+     x <- x[order(x, decreasing=(sort == "DESCENDING"))]
+     mean(head(x, length(x) * subset))
+ }, sort="DESCENDING", subset=.33)
> result
        A         B         C         D         E
0.8302502 0.8583468 0.7461504 0.7594074 0.9143997

--
Jim Holtman
Cincinnati, OH
Yves, why not:

aggregate(myDataframe$TargetValue, list(SomeFactor = myDataframe$SomeFactor), function(x) mean(x[x > quantile(x, .66)]))
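For reference, the two suggestions in this thread do not select exactly the same values: the sorted-head version keeps a fixed number of values per level, while the quantile cutoff keeps the values strictly above the 66th percentile, so the results can differ slightly with ties or small groups. Below is a minimal sketch that wraps the sorted-head idea in a reusable function. The names top_mean and frac are hypothetical (they are not from the thread and not part of base R). The argument is called frac rather than subset because the formula method of aggregate already has its own subset argument.

```r
# Hypothetical helper (not from the thread): mean of the first `frac`
# share of x after sorting. decreasing = TRUE takes the largest values.
top_mean <- function(x, frac = 1/3, decreasing = TRUE) {
  x <- sort(x, decreasing = decreasing)   # sort() drops NA values by default
  n <- max(1, floor(length(x) * frac))    # keep at least one value per group
  mean(x[seq_len(n)])
}

# Same simulated data as in the tapply example earlier in the thread
set.seed(1)
test <- data.frame(value = runif(100),
                   fact  = sample(LETTERS[1:5], 100, TRUE))

# Extra arguments after FUN are passed on to top_mean()
res <- aggregate(value ~ fact, data = test, FUN = top_mean, frac = 1/3)
print(res)
```

Because floor() is used for the count, a group whose size is not a multiple of 3 keeps slightly less than a third of its values; swapping in ceiling() or round() would be an equally reasonable choice.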
> why not:
> aggregate(myDataframe$TargetValue, list(SomeFactor = myDataframe$SomeFactor), function(x) mean(x[x > quantile(x, .66)]))

Great stuff. Just what I was looking for! Thanks a lot!