thr3ads.net - R help - [R] Add rank column to data frame as in SQL... [Jun 2012]

If this information is useful, please help other people find it:
Share via:

Brigid Mooney

2012-Jun-01 15:23 UTC

[R] Add rank column to data frame as in SQL...

Hopefully this is an easy problem...

I'm trying to add a partitioned rank column to a data frame where the
rank is calculated separately across a partition by categories, the
way you could easily do in SQL.  I found this solution in the archives
that looked like it might work:

http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html

The example has a data frame with several car companies, and employee
salaries within them.  A column is then added to the data.frame which
should give the descending rank for each employee, partitioned by
company.


But when I implemented it, the results weren't the expected rankings.
 What am I doing wrong?

set.seed(1)
DF <-
data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
Person=LETTERS[1:18],Salary=runif(18)*1e5)
DF <- within(DF, rank <- ave(Salary, Company,
FUN=function(x)rev(order(x))))

# Then checking each category manually
DF[DF$Company == "Ford",]
DF[DF$Company == "GM",]
DF[DF$Company == "Toyota",]

# My results show that it works for Ford and GM, but not
Toyota> DF[DF$Company == "Ford",]   Company Person   Salary rank
1     Ford      A 38003.52    4
5     Ford      E 65167.38    2
10    Ford      J 38238.80    3
11    Ford      K 86969.08    1
12    Ford      L 34034.90    5> DF[DF$Company == "GM",]   Company Person   Salary rank
4       GM      D 21214.25    6
6       GM      F 12555.51    7
7       GM      G 26722.07    5
13      GM      M 48208.01    4
15      GM      O 49354.13    3
17      GM      Q 82737.33    1
18      GM      R 66846.67    2> DF[DF$Company == "Toyota",]   Company Person    Salary rank
2   Toyota      B 77744.522    2
3   Toyota      C 93470.523    1
8   Toyota      H 38611.409    5
9   Toyota      I  1339.033    3
14  Toyota      N 59956.583    6
16  Toyota      P 18621.760    4


For reference, I'm using R 2.11.1 on a Windows 7 machine.

Can anyone provide insight into how I am implementing this
incorrectly, or give an alternate way to add such a partitioned rank
column to a data frame?

Thanks in advance,
Brigid

arun

2012-Jun-01 17:48 UTC

head link

[R] Add rank column to data frame as in SQL...

Hi,

Couldn't find any problems if I understand what you mean.

A.K.


> DF
<-within(DF,rank<-ave(Salary,Company,FUN=function(x)rev(order(x))))
> DF?? Company Person??? Salary rank
1?????? GM????? A? 7067.905??? 4
2???? Ford????? B? 9946.616??? 3
3?????? GM????? C 31627.171??? 8
4?? Toyota????? D 51863.426??? 1
5?????? GM????? E 66200.508??? 7
6?? Toyota????? F 40683.019??? 4
7?????? GM????? G 91287.592??? 3
8?? Toyota????? H 29360.337??? 2
9?? Toyota????? I 45906.573??? 7
10????? GM????? J 33239.467??? 6
11??? Ford????? K 65087.047??? 2
12? Toyota????? L 25801.678??? 3
13????? GM????? M 47854.525??? 5
14????? GM????? N 76631.067??? 2
15? Toyota????? O? 8424.691??? 5
16????? GM????? P 87532.133??? 1
17? Toyota????? Q 33907.294??? 6
18??? Ford????? R 83944.035??? 1> DF[DF$Company=="Ford",]?? Company Person??? Salary rank
2???? Ford????? B? 9946.616??? 3
11??? Ford????? K 65087.047??? 2
18??? Ford????? R 83944.035??? 1> DF[DF$Company=="Toyota",]?? Company Person??? Salary rank
4?? Toyota????? D 51863.426??? 1
6?? Toyota????? F 40683.019??? 4
8?? Toyota????? H 29360.337??? 2
9?? Toyota????? I 45906.573??? 7
12? Toyota????? L 25801.678??? 3
15? Toyota????? O? 8424.691??? 5
17? Toyota????? Q 33907.294??? 6> DF[DF$Company=="GM",]?? Company Person??? Salary rank
1?????? GM????? A? 7067.905??? 4
3?????? GM????? C 31627.171??? 8
5?????? GM????? E 66200.508??? 7
7?????? GM????? G 91287.592??? 3
10????? GM????? J 33239.467??? 6
13????? GM????? M 47854.525??? 5
14????? GM????? N 76631.067??? 2
16????? GM????? P 87532.133??? 1




----- Original Message -----
From: Brigid Mooney <bkmooney at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Friday, June 1, 2012 11:23 AM
Subject: [R] Add rank column to data frame as in SQL...

Hopefully this is an easy problem...

I'm trying to add a partitioned rank column to a data frame where the
rank is calculated separately across a partition by categories, the
way you could easily do in SQL.? I found this solution in the archives
that looked like it might work:

http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html

The example has a data frame with several car companies, and employee
salaries within them.? A column is then added to the data.frame which
should give the descending rank for each employee, partitioned by
company.


But when I implemented it, the results weren't the expected rankings.
What am I doing wrong?

set.seed(1)
DF <-
data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
Person=LETTERS[1:18],Salary=runif(18)*1e5)
DF <- within(DF, rank <- ave(Salary, Company,
FUN=function(x)rev(order(x))))

# Then checking each category manually
DF[DF$Company == "Ford",]
DF[DF$Company == "GM",]
DF[DF$Company == "Toyota",]

# My results show that it works for Ford and GM, but not
Toyota> DF[DF$Company == "Ford",]?  Company Person?  Salary rank
1? ?  Ford? ? ? A 38003.52? ? 4
5? ?  Ford? ? ? E 65167.38? ? 2
10? ? Ford? ? ? J 38238.80? ? 3
11? ? Ford? ? ? K 86969.08? ? 1
12? ? Ford? ? ? L 34034.90? ? 5> DF[DF$Company == "GM",]?  Company Person?  Salary rank
4? ? ?  GM? ? ? D 21214.25? ? 6
6? ? ?  GM? ? ? F 12555.51? ? 7
7? ? ?  GM? ? ? G 26722.07? ? 5
13? ? ? GM? ? ? M 48208.01? ? 4
15? ? ? GM? ? ? O 49354.13? ? 3
17? ? ? GM? ? ? Q 82737.33? ? 1
18? ? ? GM? ? ? R 66846.67? ? 2> DF[DF$Company == "Toyota",]?  Company Person? ? Salary rank
2?  Toyota? ? ? B 77744.522? ? 2
3?  Toyota? ? ? C 93470.523? ? 1
8?  Toyota? ? ? H 38611.409? ? 5
9?  Toyota? ? ? I? 1339.033? ? 3
14? Toyota? ? ? N 59956.583? ? 6
16? Toyota? ? ? P 18621.760? ? 4


For reference, I'm using R 2.11.1 on a Windows 7 machine.

Can anyone provide insight into how I am implementing this
incorrectly, or give an alternate way to add such a partitioned rank
column to a data frame?

Thanks in advance,
Brigid

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

William Dunlap

2012-Jun-01 20:59 UTC

head link

[R] Add rank column to data frame as in SQL...

Try using the rank function instead of the order function.
rank(x) is order(order(x)) if there are no ties.
Since you want reverse ranks do either rank(-x) or length(x)+1-rank(x).
E.g.,
  > DF <- within(DF, rank2 <- ave(Salary, Company,
FUN=function(x)rank(-x)))
  > DF[DF$Company=="Toyota",]
     Company Person    Salary rank rank2
  2   Toyota      B 77744.522    2     2
  3   Toyota      C 93470.523    1     1
  8   Toyota      H 38611.409    5     4
  9   Toyota      I  1339.033    3     6
  14  Toyota      N 59956.583    6     3
  16  Toyota      P 18621.760    4     5

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at
r-project.org] On Behalf
> Of Brigid Mooney
> Sent: Friday, June 01, 2012 5:23 PM
> To: r-help at r-project.org
> Subject: [R] Add rank column to data frame as in SQL...
> 
> Hopefully this is an easy problem...
> 
> I'm trying to add a partitioned rank column to a data frame where the
> rank is calculated separately across a partition by categories, the
> way you could easily do in SQL.  I found this solution in the archives
> that looked like it might work:
> 
> http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html
> 
> The example has a data frame with several car companies, and employee
> salaries within them.  A column is then added to the data.frame which
> should give the descending rank for each employee, partitioned by
> company.
> 
> 
> But when I implemented it, the results weren't the expected rankings.
>  What am I doing wrong?
> 
> set.seed(1)
> DF <-
data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE),
> Person=LETTERS[1:18],Salary=runif(18)*1e5)
> DF <- within(DF, rank <- ave(Salary, Company,
FUN=function(x)rev(order(x))))
> 
> # Then checking each category manually
> DF[DF$Company == "Ford",]
> DF[DF$Company == "GM",]
> DF[DF$Company == "Toyota",]
> 
> # My results show that it works for Ford and GM, but not Toyota
> > DF[DF$Company == "Ford",]
>    Company Person   Salary rank
> 1     Ford      A 38003.52    4
> 5     Ford      E 65167.38    2
> 10    Ford      J 38238.80    3
> 11    Ford      K 86969.08    1
> 12    Ford      L 34034.90    5
> > DF[DF$Company == "GM",]
>    Company Person   Salary rank
> 4       GM      D 21214.25    6
> 6       GM      F 12555.51    7
> 7       GM      G 26722.07    5
> 13      GM      M 48208.01    4
> 15      GM      O 49354.13    3
> 17      GM      Q 82737.33    1
> 18      GM      R 66846.67    2
> > DF[DF$Company == "Toyota",]
>    Company Person    Salary rank
> 2   Toyota      B 77744.522    2
> 3   Toyota      C 93470.523    1
> 8   Toyota      H 38611.409    5
> 9   Toyota      I  1339.033    3
> 14  Toyota      N 59956.583    6
> 16  Toyota      P 18621.760    4
> 
> 
> For reference, I'm using R 2.11.1 on a Windows 7 machine.
> 
> Can anyone provide insight into how I am implementing this
> incorrectly, or give an alternate way to add such a partitioned rank
> column to a data frame?
> 
> Thanks in advance,
> Brigid
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Gabor Grothendieck

2012-Jun-02 11:17 UTC

head link

[R] Add rank column to data frame as in SQL...

On Fri, Jun 1, 2012 at 11:23 AM, Brigid Mooney <bkmooney at gmail.com>
wrote:> Hopefully this is an easy problem...
>
> I'm trying to add a partitioned rank column to a data frame where the
> rank is calculated separately across a partition by categories, the
> way you could easily do in SQL. ?I found this solution in the archives
> that looked like it might work:
> and provide commented, minimal, self-contained, reproducible code.
See the sqldf home page (http://sqldf.googlecode.com) and in
particular this part:

https://code.google.com/p/sqldf/#12._How_does_one_use_sqldf_with_PostgreSQL?


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

Apparently Analagous Threads

Search for more possibly parallel threads

R help - Jun 2012 - Add rank column to data frame as in SQL...

[R] Add rank column to data frame as in SQL...

[R] Add rank column to data frame as in SQL...

[R] Add rank column to data frame as in SQL...

[R] Add rank column to data frame as in SQL...

Apparently Analagous Threads