thr3ads.net - R help - [R] get top n rows group by a column from a dataframe [Sep 2010]

If this information is useful, please help other people find it:
Share via:

Tan, Richard

2010-Sep-16 15:39 UTC

[R] get top n rows group by a column from a dataframe

Hi, is there an R function like sql's TOP key word?

 

I have a dataframe that has 3 columns: company, person, salary

 

How do I get top 5 highest paid person for each company, and if I have
fewer than 5 people for a company, just return all of them?

 

Thanks,

Richard

 


	[[alternative HTML version deleted]]

RICHARD M. HEIBERGER

2010-Sep-16 15:53 UTC

head link

[R] get top n rows group by a column from a dataframe

> tmp <- data.frame(matrix(rnorm(30), 10, 3,                           dimnames=list(letters[1:10],
                                         c("company",
"person", "salary"))))> tmp      company     person      salary
a -1.04590176 -0.7841855  1.07150503
b -1.06643101  0.6545647  0.43920454
c  0.72894531 -1.3812867  0.41313659
d -0.39265263 -0.3871271  0.69404325
e  0.54028124  0.7124772  0.66630904
f -1.46931714 -0.3823353  0.03069797
g -0.33283666 -0.6351862  0.37920017
h -0.79977129  0.2605315  0.92373900
i  0.80614119  0.3727227 -1.16560563
j  0.03165012  0.4690400 -0.81966285> order(tmp$person, decreasing=TRUE)[1:min(5, length(tmp$person))]
[1]  5  2 10  9  8> tmp[order(tmp$person, decreasing=TRUE)[1:min(5, length(tmp$person))],]      company    person     salary
e  0.54028124 0.7124772  0.6663090
b -1.06643101 0.6545647  0.4392045
j  0.03165012 0.4690400 -0.8196628
i  0.80614119 0.3727227 -1.1656056
h -0.79977129 0.2605315  0.9237390

You can easily write a function for that.
top <- function(DF, varname, howmany) {}


On Thu, Sep 16, 2010 at 11:39 AM, Tan, Richard <RTan@panagora.com> wrote:
> Hi, is there an R function like sql's TOP key word?
>
> I have a dataframe that has 3 columns: company, person, salary
>
> How do I get top 5 highest paid person for each company, and if I have
> fewer than 5 people for a company, just return all of them?
>
> Thanks,
>
> Richard
>
	[[alternative HTML version deleted]]

William Dunlap

2010-Sep-16 18:18 UTC

head link

[R] get top n rows group by a column from a dataframe

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tan, Richard
> Sent: Thursday, September 16, 2010 8:39 AM
> To: r-help at r-project.org
> Subject: [R] get top n rows group by a column from a dataframe
> 
> Hi, is there an R function like sql's TOP key word?
> 
>  
> 
> I have a dataframe that has 3 columns: company, person, salary
> 
>  
> 
> How do I get top 5 highest paid person for each company, and if I have
> fewer than 5 people for a company, just return all of them?
You could use ave() to assign a within-company ranking
to each person and later extract the rows with the ranks
you want (or sort by rank, etc.):
> set.seed(1)
> DF <-data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
Person=LETTERS[1:18],Salary=runif(18)*1e5)> DF <- within(DF, rank <- ave(Salary, Company,
FUN=function(x)rev(order(x))))> DF   Company Person    Salary rank
1     Ford      A 38003.518    4
2   Toyota      B 77744.522    2
3   Toyota      C 93470.523    1
4       GM      D 21214.252    6
5     Ford      E 65167.377    2
6       GM      F 12555.510    7
7       GM      G 26722.067    5
8   Toyota      H 38611.409    5
9   Toyota      I  1339.033    3
10    Ford      J 38238.796    3
11    Ford      K 86969.085    1
12    Ford      L 34034.900    5
13      GM      M 48208.012    4
14  Toyota      N 59956.583    6
15      GM      O 49354.131    3
16  Toyota      P 18621.760    4
17      GM      Q 82737.332    1
18      GM      R 66846.674    2> subset(DF, rank==1)   Company Person   Salary rank
3   Toyota      C 93470.52    1
11    Ford      K 86969.08    1
17      GM      Q 82737.33    1> subset(DF, rank<=5)   Company Person    Salary rank
1     Ford      A 38003.518    4
2   Toyota      B 77744.522    2
3   Toyota      C 93470.523    1
5     Ford      E 65167.377    2
7       GM      G 26722.067    5
8   Toyota      H 38611.409    5
9   Toyota      I  1339.033    3
10    Ford      J 38238.796    3
11    Ford      K 86969.085    1
12    Ford      L 34034.900    5
13      GM      M 48208.012    4
15      GM      O 49354.131    3
16  Toyota      P 18621.760    4
17      GM      Q 82737.332    1
18      GM      R 66846.674    2

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com > 
>  
> 
> Thanks,
> 
> Richard
> 
>  
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Dennis Murphy

2010-Sep-16 19:27 UTC

head link

[R] get top n rows group by a column from a dataframe

Hi:

You already have some good solutions; here is another using the plyr
package, based on a slight modification in salary from Phil Spector's test
data:

sdata = data.frame(company=sample(LETTERS[1:8],1000,replace=TRUE),
                   person=1:1000,
                   salary= trunc(rnorm(1000, mean = 50000, sd = 10000)))
library(plyr)

# Function to pick out the top salaries
f <- function(df) {
     if(nrow(df) == 0L) return(NULL)
     d <- df[order(df['salary']), ]
     tail(d, min(nrow(d), 5))
    }
topguys <- ddply(sdata, 'company', f)

This gives you both the top five salaries and the people who earned them,
which might be helpful (or not, if confidentiality is a concern, in which
case you put -2 in the column entry of d in the function f.

HTH,
Dennis


On Thu, Sep 16, 2010 at 8:39 AM, Tan, Richard <RTan@panagora.com> wrote:
> Hi, is there an R function like sql's TOP key word?
>
>
>
> I have a dataframe that has 3 columns: company, person, salary
>
>
>
> How do I get top 5 highest paid person for each company, and if I have
> fewer than 5 people for a company, just return all of them?
>
>
>
> Thanks,
>
> Richard
>
>
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Maybe Matching Threads

Search for more possibly parallel threads

R help - Sep 2010 - get top n rows group by a column from a dataframe

[R] get top n rows group by a column from a dataframe

[R] get top n rows group by a column from a dataframe

[R] get top n rows group by a column from a dataframe

[R] get top n rows group by a column from a dataframe

Maybe Matching Threads