Hopefully this is an easy problem...
I'm trying to add a partitioned rank column to a data frame where the
rank is calculated separately across a partition by categories, the
way you could easily do in SQL. I found this solution in the archives
that looked like it might work:
http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html
The example has a data frame with several car companies, and employee
salaries within them. A column is then added to the data.frame which
should give the descending rank for each employee, partitioned by
company.
But when I implemented it, the results weren't the expected rankings.
What am I doing wrong?
set.seed(1)
DF <-
data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
Person=LETTERS[1:18],Salary=runif(18)*1e5)
DF <- within(DF, rank <- ave(Salary, Company,
FUN=function(x)rev(order(x))))
# Then checking each category manually
DF[DF$Company == "Ford",]
DF[DF$Company == "GM",]
DF[DF$Company == "Toyota",]
# My results show that it works for Ford and GM, but not
Toyota> DF[DF$Company == "Ford",]
Company Person Salary rank
1 Ford A 38003.52 4
5 Ford E 65167.38 2
10 Ford J 38238.80 3
11 Ford K 86969.08 1
12 Ford L 34034.90 5> DF[DF$Company == "GM",]
Company Person Salary rank
4 GM D 21214.25 6
6 GM F 12555.51 7
7 GM G 26722.07 5
13 GM M 48208.01 4
15 GM O 49354.13 3
17 GM Q 82737.33 1
18 GM R 66846.67 2> DF[DF$Company == "Toyota",]
Company Person Salary rank
2 Toyota B 77744.522 2
3 Toyota C 93470.523 1
8 Toyota H 38611.409 5
9 Toyota I 1339.033 3
14 Toyota N 59956.583 6
16 Toyota P 18621.760 4
For reference, I'm using R 2.11.1 on a Windows 7 machine.
Can anyone provide insight into how I am implementing this
incorrectly, or give an alternate way to add such a partitioned rank
column to a data frame?
Thanks in advance,
Brigid
Hi, Couldn't find any problems if I understand what you mean. A.K.> DF <-within(DF,rank<-ave(Salary,Company,FUN=function(x)rev(order(x)))) > DF?? Company Person??? Salary rank 1?????? GM????? A? 7067.905??? 4 2???? Ford????? B? 9946.616??? 3 3?????? GM????? C 31627.171??? 8 4?? Toyota????? D 51863.426??? 1 5?????? GM????? E 66200.508??? 7 6?? Toyota????? F 40683.019??? 4 7?????? GM????? G 91287.592??? 3 8?? Toyota????? H 29360.337??? 2 9?? Toyota????? I 45906.573??? 7 10????? GM????? J 33239.467??? 6 11??? Ford????? K 65087.047??? 2 12? Toyota????? L 25801.678??? 3 13????? GM????? M 47854.525??? 5 14????? GM????? N 76631.067??? 2 15? Toyota????? O? 8424.691??? 5 16????? GM????? P 87532.133??? 1 17? Toyota????? Q 33907.294??? 6 18??? Ford????? R 83944.035??? 1> DF[DF$Company=="Ford",]?? Company Person??? Salary rank 2???? Ford????? B? 9946.616??? 3 11??? Ford????? K 65087.047??? 2 18??? Ford????? R 83944.035??? 1> DF[DF$Company=="Toyota",]?? Company Person??? Salary rank 4?? Toyota????? D 51863.426??? 1 6?? Toyota????? F 40683.019??? 4 8?? Toyota????? H 29360.337??? 2 9?? Toyota????? I 45906.573??? 7 12? Toyota????? L 25801.678??? 3 15? Toyota????? O? 8424.691??? 5 17? Toyota????? Q 33907.294??? 6> DF[DF$Company=="GM",]?? Company Person??? Salary rank 1?????? GM????? A? 7067.905??? 4 3?????? GM????? C 31627.171??? 8 5?????? GM????? E 66200.508??? 7 7?????? GM????? G 91287.592??? 3 10????? GM????? J 33239.467??? 6 13????? GM????? M 47854.525??? 5 14????? GM????? N 76631.067??? 2 16????? GM????? P 87532.133??? 1 ----- Original Message ----- From: Brigid Mooney <bkmooney at gmail.com> To: r-help at r-project.org Cc: Sent: Friday, June 1, 2012 11:23 AM Subject: [R] Add rank column to data frame as in SQL... Hopefully this is an easy problem... I'm trying to add a partitioned rank column to a data frame where the rank is calculated separately across a partition by categories, the way you could easily do in SQL.? I found this solution in the archives that looked like it might work: http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html The example has a data frame with several car companies, and employee salaries within them.? A column is then added to the data.frame which should give the descending rank for each employee, partitioned by company. But when I implemented it, the results weren't the expected rankings. What am I doing wrong? set.seed(1) DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE), Person=LETTERS[1:18],Salary=runif(18)*1e5) DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x)))) # Then checking each category manually DF[DF$Company == "Ford",] DF[DF$Company == "GM",] DF[DF$Company == "Toyota",] # My results show that it works for Ford and GM, but not Toyota> DF[DF$Company == "Ford",]? Company Person? Salary rank 1? ? Ford? ? ? A 38003.52? ? 4 5? ? Ford? ? ? E 65167.38? ? 2 10? ? Ford? ? ? J 38238.80? ? 3 11? ? Ford? ? ? K 86969.08? ? 1 12? ? Ford? ? ? L 34034.90? ? 5> DF[DF$Company == "GM",]? Company Person? Salary rank 4? ? ? GM? ? ? D 21214.25? ? 6 6? ? ? GM? ? ? F 12555.51? ? 7 7? ? ? GM? ? ? G 26722.07? ? 5 13? ? ? GM? ? ? M 48208.01? ? 4 15? ? ? GM? ? ? O 49354.13? ? 3 17? ? ? GM? ? ? Q 82737.33? ? 1 18? ? ? GM? ? ? R 66846.67? ? 2> DF[DF$Company == "Toyota",]? Company Person? ? Salary rank 2? Toyota? ? ? B 77744.522? ? 2 3? Toyota? ? ? C 93470.523? ? 1 8? Toyota? ? ? H 38611.409? ? 5 9? Toyota? ? ? I? 1339.033? ? 3 14? Toyota? ? ? N 59956.583? ? 6 16? Toyota? ? ? P 18621.760? ? 4 For reference, I'm using R 2.11.1 on a Windows 7 machine. Can anyone provide insight into how I am implementing this incorrectly, or give an alternate way to add such a partitioned rank column to a data frame? Thanks in advance, Brigid ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Try using the rank function instead of the order function.
rank(x) is order(order(x)) if there are no ties.
Since you want reverse ranks do either rank(-x) or length(x)+1-rank(x).
E.g.,
> DF <- within(DF, rank2 <- ave(Salary, Company,
FUN=function(x)rank(-x)))
> DF[DF$Company=="Toyota",]
Company Person Salary rank rank2
2 Toyota B 77744.522 2 2
3 Toyota C 93470.523 1 1
8 Toyota H 38611.409 5 4
9 Toyota I 1339.033 3 6
14 Toyota N 59956.583 6 3
16 Toyota P 18621.760 4 5
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at
r-project.org] On Behalf
> Of Brigid Mooney
> Sent: Friday, June 01, 2012 5:23 PM
> To: r-help at r-project.org
> Subject: [R] Add rank column to data frame as in SQL...
>
> Hopefully this is an easy problem...
>
> I'm trying to add a partitioned rank column to a data frame where the
> rank is calculated separately across a partition by categories, the
> way you could easily do in SQL. I found this solution in the archives
> that looked like it might work:
>
> http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html
>
> The example has a data frame with several car companies, and employee
> salaries within them. A column is then added to the data.frame which
> should give the descending rank for each employee, partitioned by
> company.
>
>
> But when I implemented it, the results weren't the expected rankings.
> What am I doing wrong?
>
> set.seed(1)
> DF <-
data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
> Person=LETTERS[1:18],Salary=runif(18)*1e5)
> DF <- within(DF, rank <- ave(Salary, Company,
FUN=function(x)rev(order(x))))
>
> # Then checking each category manually
> DF[DF$Company == "Ford",]
> DF[DF$Company == "GM",]
> DF[DF$Company == "Toyota",]
>
> # My results show that it works for Ford and GM, but not Toyota
> > DF[DF$Company == "Ford",]
> Company Person Salary rank
> 1 Ford A 38003.52 4
> 5 Ford E 65167.38 2
> 10 Ford J 38238.80 3
> 11 Ford K 86969.08 1
> 12 Ford L 34034.90 5
> > DF[DF$Company == "GM",]
> Company Person Salary rank
> 4 GM D 21214.25 6
> 6 GM F 12555.51 7
> 7 GM G 26722.07 5
> 13 GM M 48208.01 4
> 15 GM O 49354.13 3
> 17 GM Q 82737.33 1
> 18 GM R 66846.67 2
> > DF[DF$Company == "Toyota",]
> Company Person Salary rank
> 2 Toyota B 77744.522 2
> 3 Toyota C 93470.523 1
> 8 Toyota H 38611.409 5
> 9 Toyota I 1339.033 3
> 14 Toyota N 59956.583 6
> 16 Toyota P 18621.760 4
>
>
> For reference, I'm using R 2.11.1 on a Windows 7 machine.
>
> Can anyone provide insight into how I am implementing this
> incorrectly, or give an alternate way to add such a partitioned rank
> column to a data frame?
>
> Thanks in advance,
> Brigid
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
On Fri, Jun 1, 2012 at 11:23 AM, Brigid Mooney <bkmooney at gmail.com> wrote:> Hopefully this is an easy problem... > > I'm trying to add a partitioned rank column to a data frame where the > rank is calculated separately across a partition by categories, the > way you could easily do in SQL. ?I found this solution in the archives > that looked like it might work: > and provide commented, minimal, self-contained, reproducible code.See the sqldf home page (http://sqldf.googlecode.com) and in particular this part: https://code.google.com/p/sqldf/#12._How_does_one_use_sqldf_with_PostgreSQL? -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com