Hi R-help,

I am trying to find a way to select the five highest values in a data frame within each level of a grouping variable. To demonstrate:

    > c
       X1 X2
    1   1  1
    2   1  2
    3   1  3
    4   1  4
    5   1  5
    6   1  6
    7   1  7
    8   1  8
    9   1  9
    10  1 10
    11  2 11
    12  2 12
    13  2 13
    14  2 14
    15  2 15
    16  2 16
    17  2 17
    18  2 18
    19  2 19
    20  2 20
    21  2 21
    22  2 22
    23  2 23
    24  2 24
    25  2 25

So I would like to select the rows with the highest values of X2 within each value of X1. The expected result is:

    X1 X2
     1 10
     1  9
     1  8
     1  7
     1  6
     2 25
     2 24
     2 23
     2 22
     2 21

I first ordered the data frame using

    c=c[with(c,order(X1,-X2)),]

but I need help selecting the highest five. That is easy when X1 has just 2 unique values, but what if X1 has 500 unique values?

Thanks
Andrija
try this (x is your data frame, called c in your post):

    > do.call(rbind, lapply(split(x, x$X1), function(.grp){
    +     .ord <- .grp[order(.grp$X2, decreasing = TRUE),]
    +     .ord[seq_len(min(5, nrow(.grp))),]
    + }))
         X1 X2
    1.10  1 10
    1.9   1  9
    1.8   1  8
    1.7   1  7
    1.6   1  6
    2.25  2 25
    2.24  2 24
    2.23  2 23
    2.22  2 22
    2.21  2 21

On Fri, Dec 10, 2010 at 9:18 AM, andrija djurovic <djandrija at gmail.com> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Andrija,

You should be able to extract the data that you want using a call like this (AD substituted for your c):

    with(AD, tapply(X2, X1, function(x) sort(x, decreasing = TRUE)[1:5]))

That returns a list like this:

    $`1`
    [1] 10  9  8  7  6

    $`2`
    [1] 25 24 23 22 21

Just package it the way that you want.

Dave

From: andrija djurovic <djandrija@gmail.com>
To: r-help@r-project.org
Date: 12/10/2010 08:21 AM
Subject: [R] (no subject)
Sent by: r-help-bounces@r-project.org

[...]
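
One way to do the "package it" step from the tapply() reply above, offered only as a rough sketch: rebuild a two-column data frame from the list. The names AD, top5, sizes and res are chosen here for illustration. head() replaces the [1:5] indexing so that groups with fewer than five rows are not padded with NA. X1 comes back as character because it is taken from the list names.

```r
# Example data matching the thread (AD stands in for the poster's c)
AD <- data.frame(X1 = rep(1:2, c(10, 15)), X2 = 1:25)

# Top five X2 values per X1 group; head() never pads short groups with NA
top5 <- with(AD, tapply(X2, X1, function(x) head(sort(x, decreasing = TRUE), 5)))

# Number of values kept in each group
sizes <- vapply(top5, length, integer(1))

# Repeat each group label once per kept value and flatten the values
res <- data.frame(
  X1 = rep(names(top5), times = sizes),
  X2 = unlist(top5, use.names = FALSE),
  stringsAsFactors = FALSE
)
res
```

If X1 is numeric in the original data, as.integer(res$X1) or as.numeric(res$X1) would convert it back.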
Hi:

Here's a plyr solution:

    library(plyr)
    dg <- data.frame(x1 = rep(1:2, c(10, 15)), x2 = 1:25)
    f <- function(x) head(rev(sort(x)), 5)

    > ddply(dg, 'x1', summarise, x2 = f(x2))
       x1 x2
    1   1 10
    2   1  9
    3   1  8
    4   1  7
    5   1  6
    6   2 25
    7   2 24
    8   2 23
    9   2 22
    10  2 21

HTH,
Dennis

On Fri, Dec 10, 2010 at 6:18 AM, andrija djurovic <djandrija@gmail.com> wrote:
> [...]
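
For comparison, a base-R sketch that needs no extra package and does not depend on the number of unique X1 values (2 or 500). This is an alternative technique, not something posted in the thread: it ranks the rows within each group with ave() and keeps ranks 1 to 5. The names dg, rnk and top5 are illustrative, and ties are broken by row order, which is an assumption about what is wanted.

```r
# Example data matching the original post
dg <- data.frame(X1 = rep(1:2, c(10, 15)), X2 = 1:25)

# Within each X1 group, rank rows so that the largest X2 gets rank 1
rnk <- ave(-dg$X2, dg$X1, FUN = function(v) rank(v, ties.method = "first"))

# Keep the five top-ranked rows per group, then sort as in the expected result
top5 <- dg[rnk <= 5, ]
top5 <- top5[order(top5$X1, -top5$X2), ]
top5
```

Unlike the summarise-based call, this keeps every column of the original rows, which may matter if the data frame has more than two columns.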