Hi,

I have a data.frame which is ordered by score, and has a factor column:

Browse[1]> wc[c("report","score")]
        report score
9         ADEA  0.96
8         ADEA  0.90
11 Asylum_FED9  0.86
3         ADEA  0.75
14 Asylum_FED9  0.60
5         ADEA  0.56
13 Asylum_FED9  0.51
16 Asylum_FED9  0.51
2         ADEA  0.42
7         ADEA  0.31
17 Asylum_FED9  0.27
1         ADEA  0.17
4         ADEA  0.17
6         ADEA  0.12
10        ADEA  0.11
12 Asylum_FED9  0.10
15 Asylum_FED9  0.09
18 Asylum_FED9  0.07
Browse[1]>

I need to add a column indicating rank within each factor group, which I
currently accomplish like so:

    wc$rank <- 0
    for(report in as.character(unique(wc$report))) {
      wc[wc$report==report,]$rank <- 1:sum(wc$report==report)
    }

I have to wonder whether there's a better way, something that gets rid of
the for() loop using tapply() or by() or similar. But I haven't come up
with anything.

I've tried these:

    by(wc, wc$report, FUN=function(pr){pr$rank <- 1:nrow(pr)})

    by(wc, wc$report, FUN=function(pr){wc[wc$report %in% pr$report,]$rank <-
      1:nrow(pr)})

But in both cases the effect of the assignment is lost, there's no $rank
column generated for wc.

Any suggestions?

 -Ken
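A note on why the by() attempts above have no visible effect: R functions work on copies, so `pr` inside FUN is a local copy of the subset, and an assignment to `wc` inside FUN creates a local `wc` rather than changing the one in the calling environment. The following is only a sketch of one loop-free alternative, under the assumption that the rows are already sorted by decreasing score; the toy data frame and the `pieces` name are made up for illustration and are not the data from the post.

```r
# Toy data, already sorted by decreasing score (illustrative values only).
wc <- data.frame(
  report = factor(c("A", "A", "B", "A", "B")),
  score  = c(0.9, 0.8, 0.7, 0.5, 0.2)
)

# Split into per-group data frames, add the column to each piece,
# and RETURN the modified piece so the change is not lost.
pieces <- lapply(split(wc, wc$report), function(pr) {
  pr$rank <- seq_len(nrow(pr))
  pr
})

# unsplit() reassembles the pieces in the original row order.
wc <- unsplit(pieces, wc$report)
print(wc)
```

The key difference from the by() attempts is that the function returns the modified piece and the result is assigned back to `wc` at top level.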
Look at ?ave and try something like:

> wc$rank <- ave( wc$score, wc$report, FUN=rank )

This works even if the data frame is not presorted.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ken Williams
> Sent: Thursday, July 12, 2007 12:09 PM
> To: R-help at stat.math.ethz.ch
> Subject: [R] Compute rank within factor groups
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
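For readers trying the ave() suggestion: by default rank() assigns 1 to the smallest value and averages ties, so the result can contain values such as 5.5. The sketch below, on made-up toy data, shows the default next to one possible variant that ranks the largest score first and breaks ties by order of appearance. The column names `rank_avg` and `rank_desc` are illustrative assumptions, and the as.integer() call is there because ave() returns a vector of the same type as its first argument (double here).

```r
# Toy data: deliberately unsorted, with ties inside each group.
wc <- data.frame(
  report = factor(c("A", "B", "A", "B", "A")),
  score  = c(0.17, 0.51, 0.96, 0.51, 0.17)
)

# Default behaviour: rank 1 = smallest score in the group, ties averaged.
wc$rank_avg <- ave(wc$score, wc$report, FUN = rank)

# Variant: rank 1 = largest score in the group (negate the scores),
# ties broken by first appearance, result converted to integer.
wc$rank_desc <- as.integer(
  ave(-wc$score, wc$report,
      FUN = function(s) rank(s, ties.method = "first"))
)

print(wc)
```

Negating the scores is just one way to reverse the direction; it only makes sense for numeric scores.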
Is this what you are looking for:

> x
        report score
9         ADEA  0.96
8         ADEA  0.90
11 Asylum_FED9  0.86
3         ADEA  0.75
14 Asylum_FED9  0.60
5         ADEA  0.56
13 Asylum_FED9  0.51
16 Asylum_FED9  0.51
2         ADEA  0.42
7         ADEA  0.31
17 Asylum_FED9  0.27
1         ADEA  0.17
4         ADEA  0.17
6         ADEA  0.12
10        ADEA  0.11
12 Asylum_FED9  0.10
15 Asylum_FED9  0.09
18 Asylum_FED9  0.07
> x$rank <- ave(x$score, x$report, FUN=rank)
> x
        report score rank
9         ADEA  0.96 10.0
8         ADEA  0.90  9.0
11 Asylum_FED9  0.86  8.0
3         ADEA  0.75  8.0
14 Asylum_FED9  0.60  7.0
5         ADEA  0.56  7.0
13 Asylum_FED9  0.51  5.5
16 Asylum_FED9  0.51  5.5
2         ADEA  0.42  6.0
7         ADEA  0.31  5.0
17 Asylum_FED9  0.27  4.0
1         ADEA  0.17  3.5
4         ADEA  0.17  3.5
6         ADEA  0.12  2.0
10        ADEA  0.11  1.0
12 Asylum_FED9  0.10  3.0
15 Asylum_FED9  0.09  2.0
18 Asylum_FED9  0.07  1.0
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Why are you using order instead of rank?

If the data is presorted then they tend to give the same result (unless
there are ties), but if your data is not presorted, then the results will
be different.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: Ken Williams [mailto:ken.williams at thomson.com]
> Sent: Thursday, July 12, 2007 2:50 PM
> To: Greg Snow; R-help at stat.math.ethz.ch
> Subject: Re: [R] Compute rank within factor groups
>
> On 7/12/07 3:42 PM, "Ken Williams" <ken.williams at thomson.com> wrote:
>
> > I ended up using:
> >
> >   wc$rank <- ave( wc$score, wc$report,
> >                   FUN=function(x) order(x, decreasing=TRUE) )
> >
> > Which gives me the 1-based rank integers I was looking for.
>
> Of course, immediately after sending I realized a simpler way:
>
>   wc$rank <- ave( -wc$score, wc$report, FUN=order )
>
> And as a newbie I think I get to be blissfully ignorant of
> which one is faster. =)
>
> --
> Ken Williams
> Research Scientist
> The Thomson Corporation
> Eagan, MN
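The difference between order and rank on unsorted data can be checked with a tiny vector. This is only an illustrative sketch with made-up values (no ties); the variable names are assumptions introduced for the example.

```r
# Unsorted input, no ties (illustrative values).
x <- c(0.3, 0.1, 0.2)

ord <- order(x)  # positions of the elements in increasing order: 2 3 1
rnk <- rank(x)   # rank of each element, 1 = smallest:            3 1 2
print(ord)
print(rnk)

# After sorting, both reduce to 1, 2, 3, which is why the
# difference goes unnoticed on presorted data.
y <- sort(x)
ord_sorted <- order(y)
rnk_sorted <- rank(y)
print(ord_sorted)
print(rnk_sorted)
```

So using FUN=order inside ave() only yields within-group ranks when each group is already sorted in the direction being ranked.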
Ken Williams wrote:
> I have to wonder whether there's a better way, something that gets rid of
> the for() loop using tapply() or by() or similar. But I haven't come up
> with anything.
> [...]
> Any suggestions?

There's a little known and somewhat unfortunately named function called
ave() which does just that sort of thing.

> ave(wc$score, wc$report, FUN=rank)
 [1] 10.0  9.0  8.0  8.0  7.0  7.0  5.5  5.5  6.0  5.0  4.0  3.5  3.5  2.0  1.0
[16]  3.0  2.0  1.0
The order and rank functions do something of an inverse of each other.
The rank function tells you the rank of each element of the vector (if
the first element is the 2nd smallest, then the first element of the
return is 2). The order function tells you what order to put the vector
in to sort it (if the smallest element of the vector is in position 3,
then the first element of the returned vector will be 3).

Doing something like:

> order(order(x))

is similar to rank, except in how it deals with ties.

When the data is presorted (in increasing order), then rank and order
both give you the same as seq(along=x), which is just the set of integers
from 1 to the length of the vector.

Most computations in R will switch between integer and double
automatically, but if you really need a vector to be integer, then use
the as.integer function.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

> -----Original Message-----
> From: Ken Williams [mailto:ken.williams at thomson.com]
> Sent: Thursday, July 12, 2007 3:50 PM
> To: Greg Snow; R-help at stat.math.ethz.ch
> Subject: Re: [R] Compute rank within factor groups
>
> On 7/12/07 4:28 PM, "Greg Snow" <Greg.Snow at intermountainmail.org> wrote:
>
> > Why are you using order instead of rank?
> >
> > If the data is pre sorted then they tend to give the same result
> > (unless there are ties), but if your data is not presorted, then the
> > results will be different.
>
> Indeed, thanks for the catch. I switched to order because rank was
> giving me floats instead of integers, which I now see was probably
> because it defaults to ties.method=average and I wanted
> ties.method=first.
>
> My data was indeed pre-sorted so I didn't notice the difference (are
> they giving inverse permutations or something? Can't quite follow...),
> but perhaps it won't always be.
>
> --
> Ken Williams
> Research Scientist
> The Thomson Corporation
> Eagan, MN
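The inverse-permutation relationship described above can be sketched on a small vector with ties. The values and variable names below are made up for illustration; the comparison with ties.method="first" reflects how order() breaks ties by position.

```r
# Illustrative vector with ties.
x <- c(0.5, 0.2, 0.5, 0.9, 0.2)

o  <- order(x)                        # permutation that sorts x
oo <- order(o)                        # inverse of that permutation
rf <- rank(x, ties.method = "first")  # integer ranks, ties by position
ra <- rank(x)                         # default: ties averaged (doubles)

print(oo)
print(rf)
print(ra)

# Indexing the sorting permutation by its inverse gives 1..n back,
# which is what "inverse permutations" means here.
identity_check <- o[oo]
print(identity_check)

# If an integer vector is needed, convert explicitly.
ranks_int <- as.integer(rf)
```

With the default ties.method the tied elements share an averaged rank, which is where the non-integer values in the earlier output come from.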