thr3ads.net - R help - [R] percent rank by an index key? [Nov 2005]

t c

2005-Nov-01 17:09 UTC

[R] percent rank by an index key?

What is the easiest way to calculate a percent rank “by” an index key?

For example, I have a dataset with 3 fields:

Year, State, Income

I wish to calculate the rank of Income by year, by state.

I also wish to calculate the “percent rank”, where I define percent rank as
rank/n.

(n is the number of numeric data points within each year-state grouping.)
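For a single group, the definition above can be sketched in R as follows. This is a minimal sketch that assumes missing or non-numeric incomes are stored as NA; rank(..., na.last = "keep") leaves those rows unranked, and n counts only the non-missing values. The helper name pct_rank is hypothetical.

```r
# Percent rank = rank / n for one group.
# Assumption: missing/non-numeric incomes are coded as NA.
pct_rank <- function(v) {
  r <- rank(v, na.last = "keep")  # NAs get rank NA instead of being ranked last
  r / sum(!is.na(v))              # n = number of non-NA values
}

pct_rank(c(30, NA, 10, 20))  # returns 1, NA, 1/3, 2/3
```

Ties get the average rank (the default ties.method), so tied incomes share the same percent rank.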

This is what I am currently doing:

1. I create a “group by” field by using the paste function to combine Year
and State into a field called date_state. I then use the rank function to
calculate the rank within each date_state group.

2. I then add a field called “one” that I set to 1 if the value in Income is
numeric and to 0 if it is not.

3. I then take an aggregate sum of “one”. This gives me a count (n) for each
year-state grouping.

4. I next use merge to add this count to the table.

5. Finally, I calculate the percent rank:

   Pr <- rank/n
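The five steps above might look roughly like this in R. This is a hypothetical reconstruction, not the poster's actual code: the toy data, the NA in row 3, and the use of ave() for the grouped rank in step 1 are all assumptions made so the sketch runs on its own.

```r
# Hypothetical reconstruction of steps 1-5; data and column names are illustrative.
set.seed(1)
dat <- expand.grid(Year = 2000:2001, State = c("TX", "AL"), Subject = 1:5)
dat$Income <- round(runif(nrow(dat)) * 100000)
dat$Income[3] <- NA  # one missing income, for illustration

# 1. Paste Year and State into a group key, then rank within each group
#    (ave() is one assumed way to do the grouped rank; NAs keep rank NA).
dat$date_state <- paste(dat$Year, dat$State, sep = "_")
dat$rank <- ave(dat$Income, dat$date_state,
                FUN = function(v) rank(v, na.last = "keep"))

# 2. Indicator column: 1 if Income is present, 0 otherwise.
dat$one <- as.integer(!is.na(dat$Income))

# 3. Sum the indicator within each group to get n.
counts <- aggregate(one ~ date_state, data = dat, FUN = sum)
names(counts)[2] <- "n"

# 4. Merge the per-group counts back onto every row (the slow step).
dat <- merge(dat, counts, by = "date_state")

# 5. Percent rank.
dat$Pr <- dat$rank / dat$n
```

The merge is needed only because the count lives in a separate, smaller table; computing n per row inside the grouping function avoids it.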

The merge takes quite a bit of time to process.

Is there an easier/more efficient way to calculate the percent rank?

Sundar Dorai-Raj

2005-Nov-01 17:25 UTC

[R] percent rank by an index key?

t c wrote:
> Is there an easier/more efficient way to calculate the percent rank?

How about using by() (see ?by):

set.seed(100)
# fake data set, replace with your own
# "Subject" is just a dummy to produce replicates
x <- expand.grid(Year = 2000:2005,
                 State = c("TX", "AL"),
                 Subject = 1:10)
x$Income <- floor(runif(NROW(x)) * 100000)

# by() returns its groups with Year varying fastest within State, so
# sort the rows the same way; otherwise cbind() below misaligns them
x <- x[order(x$State, x$Year), ]

r <- by(x, x[c("Year", "State")],
        function(d) {
          r <- rank(d$Income)
          n <- length(d$Income)
          cbind(Rank = r, PRank = r/n)
        })
x <- cbind(x, do.call("rbind", r))
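Another option, not part of the original reply, is base R's ave(), which applies a function within groups and writes the results back in the original row order, so neither a merge nor a re-sort is needed. This sketch assumes missing incomes are NA and, following the question's definition, counts only non-missing values in n.

```r
set.seed(100)
# Fake data set as in the reply above; replace with your own.
x <- expand.grid(Year = 2000:2005,
                 State = c("TX", "AL"),
                 Subject = 1:10)
x$Income <- floor(runif(NROW(x)) * 100000)

# Rank within each Year-State group; NAs keep rank NA.
x$Rank <- ave(x$Income, x$Year, x$State,
              FUN = function(v) rank(v, na.last = "keep"))
# n per group = number of non-missing incomes, repeated for each row of the group.
x$n <- ave(x$Income, x$Year, x$State,
           FUN = function(v) rep(sum(!is.na(v)), length(v)))
x$PRank <- x$Rank / x$n
```

Each ave() call returns a vector as long as x, aligned row by row, which is what removes the need for the merge.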

HTH,

--sundar
