If I have a data frame X that looks like this:

A B
- -
1 2
1 3
1 4
2 3
2 1
2 1
3 2
3 1
3 3

and I want to add another column, C, that holds the rank of B computed separately for each value of A, how do I do it?

I.e., something like:

A B C
- - -
1 2 1
1 3 2
1 4 3
2 3 3
2 1 1
2 1 2
3 2 2
3 1 1
3 3 3

by(X, X[,1], function(x) { rank(x[,2], ties.method="random") } )

almost seems to work, but the result is not a data frame (by() returns a list-like "by" object with one element per group), and I can't figure out how to merge it back into X properly.

Thanks,
Lukas
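One base-R way to get per-group results back into X is to split() B by A, rank each piece, and let unsplit() put the ranks back at their original row positions. This is a minimal sketch, assuming X has the columns A and B shown above; it swaps split()/unsplit() in for by() rather than reproducing the code in the question.

```r
# Example data from the question
X <- data.frame(A = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                B = c(2, 3, 4, 3, 1, 1, 2, 1, 3))

# Rank B within each level of A; extra arguments to lapply() are passed
# on to rank(), so ties are broken at random as in the by() attempt.
ranks_by_group <- lapply(split(X$B, X$A), rank, ties.method = "random")

# unsplit() is the inverse of split(): each group's ranks go back to the
# rows that group came from, so X keeps its original row order.
X$C <- unsplit(ranks_by_group, X$A)
X
```

Because unsplit() restores positions instead of sorting, the same two lines should also work when the rows of X are not grouped by A.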
Does this do what you want?

> x <- "A B
+ 1 2
+ 1 3
+ 1 4
+ 2 3
+ 2 1
+ 2 1
+ 3 2
+ 3 1
+ 3 3"
> x <- read.table(textConnection(x), header=TRUE)
> x$C <- ave(x$B, x$A, FUN=rank)
> x
  A B   C
1 1 2 1.0
2 1 3 2.0
3 1 4 3.0
4 2 3 3.0
5 2 1 1.5
6 2 1 1.5
7 3 2 2.0
8 3 1 1.0
9 3 3 3.0

On 4/18/07, Lukas Biewald <lukeb at powerset.com> wrote:
> If I have a data frame X that looks like this:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
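ave(x$B, x$A, FUN=rank) uses rank()'s default ties.method = "average", which is why the two tied B values in group 2 both get 1.5. If the random tie-breaking from the original by() attempt is wanted instead, passing a small anonymous function as FUN should do it. A minimal sketch, assuming the same data (rebuilt with data.frame() so the block runs on its own; the column name C_avg is only for comparison):

```r
x <- data.frame(A = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                B = c(2, 3, 4, 3, 1, 1, 2, 1, 3))

# Default ties.method = "average": tied values share a rank (the 1.5s)
x$C_avg <- ave(x$B, x$A, FUN = rank)

# Break ties at random instead, so each group gets the ranks 1..n
set.seed(1)  # only to make the random tie-break reproducible
x$C <- ave(x$B, x$A, FUN = function(b) rank(b, ties.method = "random"))
x
```

ave() returns a vector in the original row order, so no merge step is needed either way.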
Steven McKinney
2007-Apr-20 02:54 UTC
[R] Computing an ordering on subsets of a data frame
Hi Lukas,

Using by() or its cousins tapply() etc. is tricky, as you need to properly merge the results back into X. You can do that by adding a key ID variable to X and carrying that key ID variable along in the calls to by() etc., though I haven't tested out a method.

You can also create a new column in X to hold the results, and then rank B within each subsection of X in a for() loop.

> X <- data.frame(A = c(1,1,1,2,2,2,3,3,3), B = c(2,3,4,3,1,1,2,1,3))
> X
  A B
1 1 2
2 1 3
3 1 4
4 2 3
5 2 1
6 2 1
7 3 2
8 3 1
9 3 3
>
> X$C <- rep(as.numeric(NA), nrow(X))
>
> sortLevels <- unique(X$A)
>
> for(i in seq(along = sortLevels)) {
+   sortIdxp <- X$A == sortLevels[i]
+   X$C[sortIdxp] <- rank(X$B[sortIdxp], ties.method = "random")
+ }
> X
  A B C
1 1 2 1
2 1 3 2
3 1 4 3
4 2 3 3
5 2 1 1
6 2 1 2
7 3 2 2
8 3 1 1
9 3 3 3
>

Merging results back in after using tapply() or by() is harder if your data frame is in random order, but the for() loop approach with indexing still works fine.

> set.seed(123)
> Y <- X[sample(9), ]
> Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> Y$C <- rep(as.numeric(NA), nrow(Y))
>
> sortLevels <- unique(Y$A)
## You can also use levels() instead of unique() if Y$A is a factor.
>
> for(i in seq(along = sortLevels)) {
+   sortIdxp <- Y$A == sortLevels[i]
+   Y$C[sortIdxp] <- rank(Y$B[sortIdxp], ties.method = "random")
+ }
> Y
  A B C
3 1 4 3
7 3 2 2
9 3 3 3
6 2 1 2
5 2 1 1
1 1 2 1
2 1 3 2
8 3 1 1
4 2 3 3
> oY <- order(Y$A)
> Y[oY,]
  A B C
3 1 4 3
1 1 2 1
2 1 3 2
6 2 1 2
5 2 1 1
4 2 3 3
7 3 2 2
9 3 3 3
8 3 1 1
>

HTH

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney at bccrc.ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Lukas Biewald
> Sent: Wednesday, April 18, 2007 2:49 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Computing an ordering on subsets of a data frame
>
> If I have a data frame X that looks like this:
> [...]
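The key-ID idea mentioned above (untested in the original reply) might look like the following. This is a minimal sketch, assuming the X from the question; the column id is a hypothetical helper added only for illustration. Each by() group returns a small data frame of ids and ranks, do.call(rbind, ...) stacks them, and match() looks each row of X up by its id.

```r
X <- data.frame(A = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                B = c(2, 3, 4, 3, 1, 1, 2, 1, 3))

# Hypothetical key column: the original row position of each record
X$id <- seq_len(nrow(X))

# For each level of A, return the keys together with the within-group ranks
pieces <- by(X, X$A, function(d) {
  data.frame(id = d$id, C = rank(d$B, ties.method = "random"))
})

# Stack the per-group data frames, then match on the key so the result
# does not depend on the row order of X or on the order of the groups
stacked <- do.call(rbind, pieces)
X$C <- stacked$C[match(X$id, stacked$id)]
X
```

Because the lookup goes through id rather than row position, the same code should also apply to a shuffled copy such as Y.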