Thomas Brambor
2010-May-17 21:32 UTC
[R] Create counter variable for subsets without a loop
Hi all,
I am looking to create a rank variable based on a continuous variable
within subsets of the data. For example, using a built-in R data set
about US states, this is how a loop could create what I want:
### Example with loop
# choosing a subset of the data
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]
# ordering the data
data <- data[order(data$state.region, 1/data$Population), ]
regions <- levels(data$state.region)
temp <- NULL
ranks <- NULL
for (i in 1:length(regions)){
  temp <- rev(rank(data[data$state.region==regions[i],"Population"]))
  ranks <- c(ranks,temp)
}
data$rank <- ranks
data
where data$rank is the rank of the state by population within its region.
However, the loop is slow and cumbersome. I have a fairly large
data set with many subgroups, and the loop takes a long time to run. Can
someone suggest a way to create such a rank variable for subsets without
using a loop?
Thank you,
Thomas
Take a look at the by, ave, aggregate and apply functions; perhaps one suits your needs.
Bart
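As a minimal sketch of the ave route mentioned above (not necessarily what Bart had in mind), the within-region rank can be computed in one call without sorting first. Negating Population is an assumption about the desired direction, so that rank 1 is the most populous state in each region, as in the loop example.

```r
# Built-in data: region and population for the 50 US states
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]

# Rank states by population within each region (1 = largest).
# Negating Population makes rank() count from the largest value down.
data$rank <- ave(-data$Population, data$state.region, FUN = rank)

# Inspect the result sorted by region and rank
head(data[order(data$state.region, data$rank), ])
```

Because rank() is applied per group inside ave, the original row order is left untouched; tied populations would receive averaged ranks under rank()'s default ties.method.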
Solved it another way, without apply:
data2 <- data[order(data$state.region,-data$Population),]
cx <- as.numeric(data2$state.region)
data2$rank <- cumsum(rep(1,length(cx)))-match(cx,cx) + 1
all.equal(data2, data)  # compare with the result of the loop
Bart
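To see why the cumsum/match expression yields a within-group counter, here is a small sketch on a toy vector (the values of cx are made up for illustration). It relies on the rows already being sorted so that each group is contiguous.

```r
# Toy group codes, already sorted so each group is contiguous
cx <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)

pos   <- seq_along(cx)   # overall row position; same values as cumsum(rep(1, length(cx)))
first <- match(cx, cx)   # position of the first row of each row's group

# Row position minus the group's starting position, plus one
counter <- pos - first + 1
counter
# 1 2 3 1 2 1 2 3 4
```

match(cx, cx) returns the index of the first occurrence of each value, so subtracting it from the running position restarts the count at every new group.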
Gabor Grothendieck
2010-May-18 11:39 UTC
[R] Create counter variable for subsets without a loop
Here are four solutions:
data <- cbind(state.region,as.data.frame(state.x77))[,1:2]
# ave
data2 <- data[order(data$state.region, -data$Population), ]
data2$rank <- ave(data2$Population, data2$state.region, FUN = seq_along)
# by
f <- function(x) cbind(x[order(-x$Population), ], rank = 1:nrow(x))
do.call("rbind", by(data, data$state.region, f))
# ddply - same f as in by solution
library(plyr)
ddply(data, .(state.region), f)
# sqldf with PostgreSQL
library(RpgSQL)
library(sqldf)
sqldf('select
*, rank() over (partition by "state.region" order by
"Population" desc)
from data
order by "state.region", "Population" desc')
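One way to check that the base R variants agree is to run the ave and by solutions side by side and compare the values they produce. This is only a sketch: the name data3 is introduced here for illustration, and seq_len(nrow(x)) is used in place of 1:nrow(x).

```r
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]

# ave: sort once, then number rows within each region
data2 <- data[order(data$state.region, -data$Population), ]
data2$rank <- ave(data2$Population, data2$state.region, FUN = seq_along)

# by: sort and number each region's block separately, then stack the blocks
f <- function(x) cbind(x[order(-x$Population), ], rank = seq_len(nrow(x)))
data3 <- do.call("rbind", by(data, data$state.region, f))

# Compare values only; rbind() prefixes the row names with the region name
all.equal(data2$rank, data3$rank)
all.equal(data2$Population, data3$Population)
```

Only the columns are compared because the two results carry different row names, so all.equal on the full data frames would report that difference.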