Thomas Brambor
2010-May-17 21:32 UTC
[R] Create counter variable for subsets without a loop
Hi all,

I am looking to create a rank variable based on a continuous variable for subsets of the data. For example, using a data set about US states that ships with R, this is how a loop could create what I want:

### Example with loop
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]   # choosing a subset of the data
data <- data[order(data$state.region, 1/data$Population), ]    # ordering the data
regions <- levels(data$state.region)
temp <- NULL
ranks <- NULL
for (i in 1:length(regions)) {
  temp <- rev(rank(data[data$state.region == regions[i], "Population"]))
  ranks <- c(ranks, temp)
}
data$rank <- ranks
data

where data$rank is the rank of the state by population within a region.

However, using loops is slow and cumbersome. I have a fairly large data set with many subgroups and the loop runs a long time. Can someone suggest a way to create such a rank variable for subsets without using a loop?

Thank you,
Thomas
Take a look at the by, ave, aggregate and apply functions; perhaps one suits your needs.

Bart
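As a rough illustration of the ave suggestion, here is a minimal sketch that ranks states by population within each region. It assumes the same two-column subset of state.x77 used in the original post; the choice of ties.method = "first" is an assumption made here to get whole-number ranks, not something specified in the thread.

```r
# Same two-column subset as in the original post
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]

# Rank within each region; negating Population makes rank 1 the largest state.
# ties.method = "first" is an assumption: ties get distinct ranks in row order.
data$rank <- ave(-data$Population, data$state.region,
                 FUN = function(x) rank(x, ties.method = "first"))

# Sort for display only; the ranks themselves do not depend on row order
data <- data[order(data$state.region, data$rank), ]
head(data)
```

Because ave applies the function group by group and puts the results back in the original row positions, the data do not need to be sorted before ranking.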
Solved it another way, without apply:

data2 <- data[order(data$state.region, -data$Population), ]
cx <- as.numeric(data2$state.region)
# running row index minus the index of the group's first row, plus 1
data2$rank <- cumsum(rep(1, length(cx))) - match(cx, cx) + 1
all.equal(data2, data)   # compare with the result of the loop

Bart
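To see why the match trick yields a within-group counter, here is a small sketch on a toy vector of group ids (the values are made up for illustration). It relies on the rows already being sorted so that each group is contiguous, which is what the order() call guarantees.

```r
# Toy group ids, sorted so that each group forms one contiguous block
cx <- c(1, 1, 1, 2, 2, 3)

# match(cx, cx) returns, for every row, the position of the first row
# that has the same group id: 1 1 1 4 4 6
first_pos <- match(cx, cx)

# Row index minus the first position of the group, plus 1, restarts at 1
# for each group. seq_along(cx) is equivalent to cumsum(rep(1, length(cx))).
counter <- seq_along(cx) - first_pos + 1
counter   # 1 2 3 1 2 1
```

If the groups were not contiguous, the subtraction would count rows belonging to other groups as well, so the sort step is not optional.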
Gabor Grothendieck
2010-May-18 11:39 UTC
[R] Create counter variable for subsets without a loop
Here are four solutions:

data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]

# ave
data2 <- data[order(data$state.region, -data$Population), ]
data2$rank <- ave(data2$Population, data2$state.region, FUN = seq_along)

# by
f <- function(x) cbind(x[order(-x$Population), ], rank = 1:nrow(x))
do.call("rbind", by(data, data$state.region, f))

# ddply - same f as in by solution
library(plyr)
ddply(data, .(state.region), f)

# sqldf with PostgreSQL
library(RpgSQL)
library(sqldf)
sqldf('select *, rank() over (partition by "state.region" order by "Population" desc)
       from data
       order by "state.region", "Population" desc')
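One way to gain confidence in solutions like these is to check that two of them agree. The sketch below compares the ave and by approaches using base R only; it uses seq_along as the per-group function for ave, and the name res_by is introduced here purely for illustration.

```r
data <- cbind(state.region, as.data.frame(state.x77))[, 1:2]

# ave-based counter on data sorted by region, then by descending population
data2 <- data[order(data$state.region, -data$Population), ]
data2$rank <- ave(data2$Population, data2$state.region, FUN = seq_along)

# by-based counter: sort each region separately, then stack the pieces
f <- function(x) cbind(x[order(-x$Population), ], rank = seq_len(nrow(x)))
res_by <- do.call("rbind", by(data, data$state.region, f))

# Row names can differ after rbind, so compare the columns instead
same_order <- all(res_by$Population == data2$Population)
same_rank  <- all(res_by$rank == data2$rank)
same_order && same_rank
```

Both approaches process the regions in factor-level order, which is why a row-by-row comparison of the columns is meaningful here.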