thr3ads.net - R help - [R] Compute rank within factor groups [Jul 2007]

Ken Williams

2007-Jul-12 18:08 UTC

[R] Compute rank within factor groups

Hi,

I have a data.frame which is ordered by score, and has a factor column:

  Browse[1]> wc[c("report","score")]
          report score
  9         ADEA  0.96
  8         ADEA  0.90
  11 Asylum_FED9  0.86
  3         ADEA  0.75
  14 Asylum_FED9  0.60
  5         ADEA  0.56
  13 Asylum_FED9  0.51
  16 Asylum_FED9  0.51
  2         ADEA  0.42
  7         ADEA  0.31
  17 Asylum_FED9  0.27
  1         ADEA  0.17
  4         ADEA  0.17
  6         ADEA  0.12
  10        ADEA  0.11
  12 Asylum_FED9  0.10
  15 Asylum_FED9  0.09
  18 Asylum_FED9  0.07
  Browse[1]> 

I need to add a column indicating rank within each factor group, which I
currently accomplish like so:

  wc$rank <- 0
  for(report in as.character(unique(wc$report))) {
    wc[wc$report==report,]$rank <- 1:sum(wc$report==report)
  }

I have to wonder whether there's a better way, something that gets rid of
the for() loop using tapply() or by() or similar.  But I haven't come up
with anything.

I've tried these:

  by(wc, wc$report, FUN=function(pr){pr$rank <- 1:nrow(pr)})

  by(wc, wc$report, FUN=function(pr){
    wc[wc$report %in% pr$report,]$rank <- 1:nrow(pr)})

But in both cases the effect of the assignment is lost, there's no $rank
column generated for wc.
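
The assignment is lost because an R function works on a copy of its arguments: pr$rank <- ... changes only the local pr, and assigning to wc inside the function creates a local wc instead of touching the one outside. A minimal sketch of one way around this, assuming a small made-up data frame (add_rank and wc2 are hypothetical names, not from the original post), is to return the modified piece and reassemble the pieces:

```r
# Small stand-in for the poster's data, already sorted by descending score.
wc <- data.frame(
  report = factor(c("A", "A", "B", "A", "B", "B")),
  score  = c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4)
)

# The by() attempt: the assignment only affects the function's local copy.
invisible(by(wc, wc$report, function(pr) { pr$rank <- seq_len(nrow(pr)) }))
"rank" %in% names(wc)   # still FALSE

# Return the modified copy explicitly instead.
add_rank <- function(pr) {
  pr$rank <- seq_len(nrow(pr))
  pr
}

pieces <- lapply(split(wc, wc$report), add_rank)
wc2 <- unsplit(pieces, wc$report)  # reassemble in the original row order
wc2
```

Like the for() loop, this only yields a real rank when the rows are already sorted by score within each group.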

Any suggestions?

 -Ken

Greg Snow

2007-Jul-12 18:58 UTC

Look at ?ave and try something like:
> wc$rank <- ave( wc$score, wc$report, FUN=rank )
This works even if the data frame is not presorted.
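
A small sketch of that call on deliberately unsorted data (the data frame here is made up for illustration, not the poster's) might look like this. Note that rank() counts upward, so the smallest score in each group gets rank 1:

```r
# Made-up, unsorted stand-in data.
wc <- data.frame(
  report = c("A", "B", "A", "B", "A"),
  score  = c(0.2, 0.9, 0.7, 0.1, 0.5)
)

# ave() splits score by report, applies rank() to each piece,
# and puts the results back in the original row positions.
wc$rank <- ave(wc$score, wc$report, FUN = rank)
wc
```

To make the highest score rank 1 instead, one option is to rank the negated scores.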

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 

jim holtman

2007-Jul-12 19:34 UTC

Is this what you are looking for:
> x
        report score
9         ADEA  0.96
8         ADEA  0.90
11 Asylum_FED9  0.86
3         ADEA  0.75
14 Asylum_FED9  0.60
5         ADEA  0.56
13 Asylum_FED9  0.51
16 Asylum_FED9  0.51
2         ADEA  0.42
7         ADEA  0.31
17 Asylum_FED9  0.27
1         ADEA  0.17
4         ADEA  0.17
6         ADEA  0.12
10        ADEA  0.11
12 Asylum_FED9  0.10
15 Asylum_FED9  0.09
18 Asylum_FED9  0.07
> x$rank <- ave(x$score, x$report, FUN=rank)
> x
        report score rank
9         ADEA  0.96 10.0
8         ADEA  0.90  9.0
11 Asylum_FED9  0.86  8.0
3         ADEA  0.75  8.0
14 Asylum_FED9  0.60  7.0
5         ADEA  0.56  7.0
13 Asylum_FED9  0.51  5.5
16 Asylum_FED9  0.51  5.5
2         ADEA  0.42  6.0
7         ADEA  0.31  5.0
17 Asylum_FED9  0.27  4.0
1         ADEA  0.17  3.5
4         ADEA  0.17  3.5
6         ADEA  0.12  2.0
10        ADEA  0.11  1.0
12 Asylum_FED9  0.10  3.0
15 Asylum_FED9  0.09  2.0
18 Asylum_FED9  0.07  1.0


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Greg Snow

2007-Jul-12 21:28 UTC

Why are you using order instead of rank?

If the data is presorted then they tend to give the same result (unless
there are ties), but if your data is not presorted, then the results
will be different.
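
To illustrate the difference, here is a small sketch on made-up, unsorted data (not the poster's), comparing order and rank applied to the negated scores within each group:

```r
# Made-up, unsorted stand-in data.
wc <- data.frame(
  report = c("A", "B", "A", "B", "A"),
  score  = c(0.2, 0.9, 0.7, 0.1, 0.5)
)

# order() returns positions that would sort each group, not ranks.
by_order <- ave(-wc$score, wc$report, FUN = order)

# rank() returns each element's rank; negating makes the top score rank 1.
by_rank <- ave(-wc$score, wc$report, FUN = rank)

cbind(wc, by_order, by_rank)
```

In group A the scores 0.2, 0.7, 0.5 should get descending ranks 3, 1, 2, which is what by_rank holds; by_order holds 2, 3, 1 for those rows.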

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 
> -----Original Message-----
> From: Ken Williams [mailto:ken.williams at thomson.com] 
> Sent: Thursday, July 12, 2007 2:50 PM
> To: Greg Snow; R-help at stat.math.ethz.ch
> Subject: Re: [R] Compute rank within factor groups
> 
> 
> 
> 
> On 7/12/07 3:42 PM, "Ken Williams" <ken.williams at thomson.com> wrote:
> 
> > I ended up using:
> > 
> >  wc$rank <- ave( wc$score, wc$report,
> >                  FUN=function(x) order(x, decreasing=TRUE) )
> > 
> > Which gives me the 1-based rank integers I was looking for.
> 
> Of course, immediately after sending I realized a simpler way:
> 
>   wc$rank <- ave( -wc$score, wc$report, FUN=order )
> 
> And as a newbie I think I get to be blissfully ignorant of 
> which one is faster. =)
> 
> 
> --
> Ken Williams
> Research Scientist
> The Thomson Corporation
> Eagan, MN
> 
>

Peter Dalgaard

2007-Jul-12 23:10 UTC

Ken Williams wrote:
>   There's a little known and somewhat unfortunately named function called 
ave() which does just that sort of thing.

 > ave(wc$score, wc$report, FUN=rank)
 [1] 10.0  9.0  8.0  8.0  7.0  7.0  5.5  5.5  6.0  5.0  4.0  3.5  3.5  
2.0  1.0
[16]  3.0  2.0  1.0

Greg Snow

2007-Jul-16 16:35 UTC

The order and rank functions do something of an inverse of each other.
The rank function tells you the rank of each element of the vector (if
the first element is the 2nd smallest, then the first element of the
returned vector is 2).  The order function tells you what order to put the vector
in to sort it (if the smallest element of the vector is in position 3,
then the first element of the returned vector will be 3).  Doing
something like:
> order(order(x))
Is similar to rank, except in how it deals with ties.
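
A short sketch of that relationship, using a made-up vector with one tie:

```r
x <- c(0.51, 0.17, 0.51, 0.09)

o <- order(x)          # positions that sort x: 4 2 1 3
oo <- order(order(x))  # back to ranks, ties broken by position: 3 2 4 1

r_avg   <- rank(x)                         # ties averaged: 3.5 2.0 3.5 1.0
r_first <- rank(x, ties.method = "first")  # matches order(order(x)): 3 2 4 1
```

So order(order(x)) agrees with rank only when ties are broken by first appearance, which is what ties.method="first" does.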

When the data is presorted, then rank and order both give you the same
as seq(along=x) which is just the set of integers from 1 to the length
of the vector.  

Most computations in R will switch between integer and double
automatically, but if you really need a vector to be integer, then use
the as.integer function.
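
Putting the pieces together, one possible sketch (on made-up data with ties, not the poster's) of an integer, 1-based, highest-score-first rank within groups that does not depend on presorting:

```r
# Made-up, unsorted stand-in data containing ties.
wc <- data.frame(
  report = c("A", "B", "A", "B", "A"),
  score  = c(0.17, 0.51, 0.96, 0.51, 0.17)
)

# Negate so the highest score ranks first; break ties by order of
# appearance; ave() returns doubles here, so convert explicitly.
wc$rank <- as.integer(
  ave(-wc$score, wc$report,
      FUN = function(x) rank(x, ties.method = "first"))
)
wc
```

The as.integer call is needed because ave() writes the results back into the (double) vector it was given.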

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 
> -----Original Message-----
> From: Ken Williams [mailto:ken.williams at thomson.com] 
> Sent: Thursday, July 12, 2007 3:50 PM
> To: Greg Snow; R-help at stat.math.ethz.ch
> Subject: Re: [R] Compute rank within factor groups
> 
> 
> 
> 
> On 7/12/07 4:28 PM, "Greg Snow" 
> <Greg.Snow at intermountainmail.org> wrote:
> 
> > Why are you using order instead of rank?
> > 
> > If the data is pre sorted then they tend to give the same result
> > (unless there are ties), but if your data is not presorted, then the
> > results will be different.
> 
> Indeed, thanks for the catch.  I switched to order because 
> rank was giving me floats instead of integers, which I now 
> see was probably because it defaults to ties.method=average 
> and I wanted ties.method=first.
> 
> My data was indeed pre-sorted so I didn't notice the 
> difference (are they giving inverse permutations or 
> something? Can't quite follow...), but perhaps it won't always be.
> 
> 
> --
> Ken Williams
> Research Scientist
> The Thomson Corporation
> Eagan, MN
> 
>
