Hi, is there an R function like sql's TOP key word? I have a dataframe that has 3 columns: company, person, salary How do I get top 5 highest paid person for each company, and if I have fewer than 5 people for a company, just return all of them? Thanks, Richard [[alternative HTML version deleted]]
RICHARD M. HEIBERGER
2010-Sep-16 15:53 UTC
[R] get top n rows group by a column from a dataframe
> tmp <- data.frame(matrix(rnorm(30), 10, 3,dimnames=list(letters[1:10], c("company", "person", "salary"))))> tmpcompany person salary a -1.04590176 -0.7841855 1.07150503 b -1.06643101 0.6545647 0.43920454 c 0.72894531 -1.3812867 0.41313659 d -0.39265263 -0.3871271 0.69404325 e 0.54028124 0.7124772 0.66630904 f -1.46931714 -0.3823353 0.03069797 g -0.33283666 -0.6351862 0.37920017 h -0.79977129 0.2605315 0.92373900 i 0.80614119 0.3727227 -1.16560563 j 0.03165012 0.4690400 -0.81966285> order(tmp$person, decreasing=TRUE)[1:min(5, length(tmp$person))][1] 5 2 10 9 8> tmp[order(tmp$person, decreasing=TRUE)[1:min(5, length(tmp$person))],]company person salary e 0.54028124 0.7124772 0.6663090 b -1.06643101 0.6545647 0.4392045 j 0.03165012 0.4690400 -0.8196628 i 0.80614119 0.3727227 -1.1656056 h -0.79977129 0.2605315 0.9237390 You can easily write a function for that. top <- function(DF, varname, howmany) {} On Thu, Sep 16, 2010 at 11:39 AM, Tan, Richard <RTan@panagora.com> wrote:> Hi, is there an R function like sql's TOP key word? > > I have a dataframe that has 3 columns: company, person, salary > > How do I get top 5 highest paid person for each company, and if I have > fewer than 5 people for a company, just return all of them? > > Thanks, > > Richard >[[alternative HTML version deleted]]
William Dunlap
2010-Sep-16 18:18 UTC
[R] get top n rows group by a column from a dataframe
> -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Tan, Richard > Sent: Thursday, September 16, 2010 8:39 AM > To: r-help at r-project.org > Subject: [R] get top n rows group by a column from a dataframe > > Hi, is there an R function like sql's TOP key word? > > > > I have a dataframe that has 3 columns: company, person, salary > > > > How do I get top 5 highest paid person for each company, and if I have > fewer than 5 people for a company, just return all of them?You could use ave() to assign a within-company ranking to each person and later extract the rows with the ranks you want (or sort by rank, etc.):> set.seed(1) > DF <-data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE), Person=LETTERS[1:18],Salary=runif(18)*1e5)> DF <- within(DF, rank <- ave(Salary, Company,FUN=function(x)rev(order(x))))> DFCompany Person Salary rank 1 Ford A 38003.518 4 2 Toyota B 77744.522 2 3 Toyota C 93470.523 1 4 GM D 21214.252 6 5 Ford E 65167.377 2 6 GM F 12555.510 7 7 GM G 26722.067 5 8 Toyota H 38611.409 5 9 Toyota I 1339.033 3 10 Ford J 38238.796 3 11 Ford K 86969.085 1 12 Ford L 34034.900 5 13 GM M 48208.012 4 14 Toyota N 59956.583 6 15 GM O 49354.131 3 16 Toyota P 18621.760 4 17 GM Q 82737.332 1 18 GM R 66846.674 2> subset(DF, rank==1)Company Person Salary rank 3 Toyota C 93470.52 1 11 Ford K 86969.08 1 17 GM Q 82737.33 1> subset(DF, rank<=5)Company Person Salary rank 1 Ford A 38003.518 4 2 Toyota B 77744.522 2 3 Toyota C 93470.523 1 5 Ford E 65167.377 2 7 GM G 26722.067 5 8 Toyota H 38611.409 5 9 Toyota I 1339.033 3 10 Ford J 38238.796 3 11 Ford K 86969.085 1 12 Ford L 34034.900 5 13 GM M 48208.012 4 15 GM O 49354.131 3 16 Toyota P 18621.760 4 17 GM Q 82737.332 1 18 GM R 66846.674 2 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> > > > Thanks, > > Richard > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Hi: You already have some good solutions; here is another using the plyr package, based on a slight modification in salary from Phil Spector's test data: sdata = data.frame(company=sample(LETTERS[1:8],1000,replace=TRUE), person=1:1000, salary= trunc(rnorm(1000, mean = 50000, sd = 10000))) library(plyr) # Function to pick out the top salaries f <- function(df) { if(nrow(df) == 0L) return(NULL) d <- df[order(df['salary']), ] tail(d, min(nrow(d), 5)) } topguys <- ddply(sdata, 'company', f) This gives you both the top five salaries and the people who earned them, which might be helpful (or not, if confidentiality is a concern, in which case you put -2 in the column entry of d in the function f. HTH, Dennis On Thu, Sep 16, 2010 at 8:39 AM, Tan, Richard <RTan@panagora.com> wrote:> Hi, is there an R function like sql's TOP key word? > > > > I have a dataframe that has 3 columns: company, person, salary > > > > How do I get top 5 highest paid person for each company, and if I have > fewer than 5 people for a company, just return all of them? > > > > Thanks, > > Richard > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]