Hopefully this is an easy problem... I'm trying to add a partitioned rank column to a data frame where the rank is calculated separately across a partition by categories, the way you could easily do in SQL. I found this solution in the archives that looked like it might work: http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html The example has a data frame with several car companies, and employee salaries within them. A column is then added to the data.frame which should give the descending rank for each employee, partitioned by company. But when I implemented it, the results weren't the expected rankings. What am I doing wrong? set.seed(1) DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE), Person=LETTERS[1:18],Salary=runif(18)*1e5) DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x)))) # Then checking each category manually DF[DF$Company == "Ford",] DF[DF$Company == "GM",] DF[DF$Company == "Toyota",] # My results show that it works for Ford and GM, but not Toyota> DF[DF$Company == "Ford",]Company Person Salary rank 1 Ford A 38003.52 4 5 Ford E 65167.38 2 10 Ford J 38238.80 3 11 Ford K 86969.08 1 12 Ford L 34034.90 5> DF[DF$Company == "GM",]Company Person Salary rank 4 GM D 21214.25 6 6 GM F 12555.51 7 7 GM G 26722.07 5 13 GM M 48208.01 4 15 GM O 49354.13 3 17 GM Q 82737.33 1 18 GM R 66846.67 2> DF[DF$Company == "Toyota",]Company Person Salary rank 2 Toyota B 77744.522 2 3 Toyota C 93470.523 1 8 Toyota H 38611.409 5 9 Toyota I 1339.033 3 14 Toyota N 59956.583 6 16 Toyota P 18621.760 4 For reference, I'm using R 2.11.1 on a Windows 7 machine. Can anyone provide insight into how I am implementing this incorrectly, or give an alternate way to add such a partitioned rank column to a data frame? Thanks in advance, Brigid
Hi, Couldn't find any problems if I understand what you mean. A.K.> DF <-within(DF,rank<-ave(Salary,Company,FUN=function(x)rev(order(x)))) > DF?? Company Person??? Salary rank 1?????? GM????? A? 7067.905??? 4 2???? Ford????? B? 9946.616??? 3 3?????? GM????? C 31627.171??? 8 4?? Toyota????? D 51863.426??? 1 5?????? GM????? E 66200.508??? 7 6?? Toyota????? F 40683.019??? 4 7?????? GM????? G 91287.592??? 3 8?? Toyota????? H 29360.337??? 2 9?? Toyota????? I 45906.573??? 7 10????? GM????? J 33239.467??? 6 11??? Ford????? K 65087.047??? 2 12? Toyota????? L 25801.678??? 3 13????? GM????? M 47854.525??? 5 14????? GM????? N 76631.067??? 2 15? Toyota????? O? 8424.691??? 5 16????? GM????? P 87532.133??? 1 17? Toyota????? Q 33907.294??? 6 18??? Ford????? R 83944.035??? 1> DF[DF$Company=="Ford",]?? Company Person??? Salary rank 2???? Ford????? B? 9946.616??? 3 11??? Ford????? K 65087.047??? 2 18??? Ford????? R 83944.035??? 1> DF[DF$Company=="Toyota",]?? Company Person??? Salary rank 4?? Toyota????? D 51863.426??? 1 6?? Toyota????? F 40683.019??? 4 8?? Toyota????? H 29360.337??? 2 9?? Toyota????? I 45906.573??? 7 12? Toyota????? L 25801.678??? 3 15? Toyota????? O? 8424.691??? 5 17? Toyota????? Q 33907.294??? 6> DF[DF$Company=="GM",]?? Company Person??? Salary rank 1?????? GM????? A? 7067.905??? 4 3?????? GM????? C 31627.171??? 8 5?????? GM????? E 66200.508??? 7 7?????? GM????? G 91287.592??? 3 10????? GM????? J 33239.467??? 6 13????? GM????? M 47854.525??? 5 14????? GM????? N 76631.067??? 2 16????? GM????? P 87532.133??? 1 ----- Original Message ----- From: Brigid Mooney <bkmooney at gmail.com> To: r-help at r-project.org Cc: Sent: Friday, June 1, 2012 11:23 AM Subject: [R] Add rank column to data frame as in SQL... Hopefully this is an easy problem... I'm trying to add a partitioned rank column to a data frame where the rank is calculated separately across a partition by categories, the way you could easily do in SQL.? I found this solution in the archives that looked like it might work: http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html The example has a data frame with several car companies, and employee salaries within them.? A column is then added to the data.frame which should give the descending rank for each employee, partitioned by company. But when I implemented it, the results weren't the expected rankings. What am I doing wrong? set.seed(1) DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE), Person=LETTERS[1:18],Salary=runif(18)*1e5) DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x)))) # Then checking each category manually DF[DF$Company == "Ford",] DF[DF$Company == "GM",] DF[DF$Company == "Toyota",] # My results show that it works for Ford and GM, but not Toyota> DF[DF$Company == "Ford",]? Company Person? Salary rank 1? ? Ford? ? ? A 38003.52? ? 4 5? ? Ford? ? ? E 65167.38? ? 2 10? ? Ford? ? ? J 38238.80? ? 3 11? ? Ford? ? ? K 86969.08? ? 1 12? ? Ford? ? ? L 34034.90? ? 5> DF[DF$Company == "GM",]? Company Person? Salary rank 4? ? ? GM? ? ? D 21214.25? ? 6 6? ? ? GM? ? ? F 12555.51? ? 7 7? ? ? GM? ? ? G 26722.07? ? 5 13? ? ? GM? ? ? M 48208.01? ? 4 15? ? ? GM? ? ? O 49354.13? ? 3 17? ? ? GM? ? ? Q 82737.33? ? 1 18? ? ? GM? ? ? R 66846.67? ? 2> DF[DF$Company == "Toyota",]? Company Person? ? Salary rank 2? Toyota? ? ? B 77744.522? ? 2 3? Toyota? ? ? C 93470.523? ? 1 8? Toyota? ? ? H 38611.409? ? 5 9? Toyota? ? ? I? 1339.033? ? 3 14? Toyota? ? ? N 59956.583? ? 6 16? Toyota? ? ? P 18621.760? ? 4 For reference, I'm using R 2.11.1 on a Windows 7 machine. Can anyone provide insight into how I am implementing this incorrectly, or give an alternate way to add such a partitioned rank column to a data frame? Thanks in advance, Brigid ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Try using the rank function instead of the order function. rank(x) is order(order(x)) if there are no ties. Since you want reverse ranks do either rank(-x) or length(x)+1-rank(x). E.g., > DF <- within(DF, rank2 <- ave(Salary, Company, FUN=function(x)rank(-x))) > DF[DF$Company=="Toyota",] Company Person Salary rank rank2 2 Toyota B 77744.522 2 2 3 Toyota C 93470.523 1 1 8 Toyota H 38611.409 5 4 9 Toyota I 1339.033 3 6 14 Toyota N 59956.583 6 3 16 Toyota P 18621.760 4 5 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf > Of Brigid Mooney > Sent: Friday, June 01, 2012 5:23 PM > To: r-help at r-project.org > Subject: [R] Add rank column to data frame as in SQL... > > Hopefully this is an easy problem... > > I'm trying to add a partitioned rank column to a data frame where the > rank is calculated separately across a partition by categories, the > way you could easily do in SQL. I found this solution in the archives > that looked like it might work: > > http://tolstoy.newcastle.edu.au/R/e11/help/10/09/8675.html > > The example has a data frame with several car companies, and employee > salaries within them. A column is then added to the data.frame which > should give the descending rank for each employee, partitioned by > company. > > > But when I implemented it, the results weren't the expected rankings. > What am I doing wrong? > > set.seed(1) > DF <- data.frame(Company=sample(c("Ford","Toyota","GM"),size=18,replace=TRUE), > Person=LETTERS[1:18],Salary=runif(18)*1e5) > DF <- within(DF, rank <- ave(Salary, Company, FUN=function(x)rev(order(x)))) > > # Then checking each category manually > DF[DF$Company == "Ford",] > DF[DF$Company == "GM",] > DF[DF$Company == "Toyota",] > > # My results show that it works for Ford and GM, but not Toyota > > DF[DF$Company == "Ford",] > Company Person Salary rank > 1 Ford A 38003.52 4 > 5 Ford E 65167.38 2 > 10 Ford J 38238.80 3 > 11 Ford K 86969.08 1 > 12 Ford L 34034.90 5 > > DF[DF$Company == "GM",] > Company Person Salary rank > 4 GM D 21214.25 6 > 6 GM F 12555.51 7 > 7 GM G 26722.07 5 > 13 GM M 48208.01 4 > 15 GM O 49354.13 3 > 17 GM Q 82737.33 1 > 18 GM R 66846.67 2 > > DF[DF$Company == "Toyota",] > Company Person Salary rank > 2 Toyota B 77744.522 2 > 3 Toyota C 93470.523 1 > 8 Toyota H 38611.409 5 > 9 Toyota I 1339.033 3 > 14 Toyota N 59956.583 6 > 16 Toyota P 18621.760 4 > > > For reference, I'm using R 2.11.1 on a Windows 7 machine. > > Can anyone provide insight into how I am implementing this > incorrectly, or give an alternate way to add such a partitioned rank > column to a data frame? > > Thanks in advance, > Brigid > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
On Fri, Jun 1, 2012 at 11:23 AM, Brigid Mooney <bkmooney at gmail.com> wrote:> Hopefully this is an easy problem... > > I'm trying to add a partitioned rank column to a data frame where the > rank is calculated separately across a partition by categories, the > way you could easily do in SQL. ?I found this solution in the archives > that looked like it might work: > and provide commented, minimal, self-contained, reproducible code.See the sqldf home page (http://sqldf.googlecode.com) and in particular this part: https://code.google.com/p/sqldf/#12._How_does_one_use_sqldf_with_PostgreSQL? -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com