t c wrote:> What is the easiest way to calculate a percent rank ?by? an index key?
>
>
>
> Foe example, I have a dataset with 3 fields:
>
>
>
> Year, State, Income ,
>
>
>
> I wish to calculate the rank, by year, by state.
>
> I also wish to calculate the ?percent rank?, where I define percent rank as
rank/n.
>
>
>
> (n is the number of numeric data points within each date-state grouping.)
>
>
>
>
>
> This is what I am currently doing:
>
>
>
> 1. I create a ?group by? field by using the paste function to combine date
and state into a field called date_state. I then use the rank function to
calculate the rank by date, by state.
>
>
>
> 2. I then add a field called ?one? that I set to 1 if the value in income
is numeric and to 0 if it is not.
>
>
>
> 3. I then take an aggregate sum of ?one?. This gives me a count (n) for
each date-state grouping.
>
>
>
>
>
> 4. I next use merge to add this count to the table.
>
>
>
> 5. Finally, I calculate the percent rank.
>
>
>
> Pr<-rank/n
>
>
>
> The merge takes quite a bit of time to process.
>
>
>
> Is there an easier/more efficient way to calculate the percent rank?
>
How about using ?by:
set.seed(100)
# fake data set, replace with your own
# "Subject" is just a dummy to produce replicates
x <- expand.grid(Year = 2000:2005,
State = c("TX", "AL"),
Subject = 1:10)
x$Income <- floor(runif(NROW(x)) * 100000)
r <- by(x$Income, x[c("Year", "State")],
function(x) {
r <- rank(x)
n <- length(x)
cbind(Rank = r, PRank = r/n)
})
x <- cbind(x, do.call("rbind", r))
HTH,
--sundar