Hi,

I'm trying to present a table of some experimental data, and I want to order the rows by instance name. The problem is that the instance names follow a variety of conventions (e.g. competition01, competition13, small_1, big_20, med_9). I want to sort them first in category order, so: competition < small < med < big, and then apply a secondary ordering by the final one or two digits.

I've used Hadley Wickham's stringr package to split the names into their string and numeric parts, so I can get those ordered easily enough. What I'm struggling with is how to sort the categories (because I don't want them in straight alphabetical order).

library(stringr)
d <- data.frame(instance = c("competition11","competition01","big_20","small_4","small_2","med_9"))
id <- str_extract(d$instance, "\\d{1,}$")

Any pointers would be gratefully received.

Thanks,
Alastair

--
View this message in context: http://r.789695.n4.nabble.com/Sorting-a-Data-Frame-by-hybrid-string-number-key-tp3258283p3258283.html
Sent from the R help mailing list archive at Nabble.com.
David Winsemius
2011-Feb-03 15:46 UTC
[R] Sorting a Data Frame by hybrid string / number key
On Feb 3, 2011, at 10:12 AM, Alastair wrote:

> What I'm struggling with is how to sort the categories (because I don't
> want them in a straight alphabetic order).

mixedsort {gtools}    R Documentation

Order or Sort strings with embedded numbers so that the numbers are in the correct order

David Winsemius, MD
West Hartford, CT
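For readers following the mixedsort pointer, here is a minimal sketch of what it is meant to address. It assumes the gtools package is installed; the extra name "small_10" is a hypothetical addition, not from the original post, included only to show the difference from a plain character sort. As far as I understand it, mixedsort still compares the text parts alphabetically, so on its own it would probably not give the competition < small < med < big order asked for.

```r
# Instance names from the original post, plus a hypothetical "small_10"
# added here purely for illustration.
instances <- c("competition11", "competition01", "big_20",
               "small_4", "small_2", "med_9", "small_10")

# Plain character sort (method = "radix" uses C-locale ordering):
# "small_10" lands before "small_2" because "1" sorts before "2".
plain <- sort(instances, method = "radix")
print(plain)

# mixedsort() from gtools is intended to order the embedded numbers
# numerically. Guarded so the script still runs without gtools.
if (requireNamespace("gtools", quietly = TRUE)) {
  print(gtools::mixedsort(instances))
}
```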
William Dunlap
2011-Feb-03 16:22 UTC
[R] Sorting a Data Frame by hybrid string / number key
To sort a character vector in a desired order, you can convert it to a factor with the levels in the desired order. To sort strings like "2" and "11" in numerical order, convert them to numbers with as.numeric. To sort by two variables, using the second to break ties in the first, use data[order(first, second), ]. E.g.,

> library(stringr)
> d <- data.frame(instance =
+     c("competition11","competition01","big_20","small_4","small_2","med_9"))
> mySortByInstance <- function(data) {
+     # assume data$instance is of form <type><number>, perhaps
+     # with underscore between. Sort by type, breaking ties
+     # with number.
+     id <- as.numeric(str_extract(data$instance, "\\d{1,}$"))
+     type <- factor(str_extract(data$instance, "^[[:alpha:]]+"),
+                    levels=c("competition", "small", "med", "big"))
+     data[order(type, id), , drop=FALSE]
+ }
> mySortByInstance(d)
       instance
2 competition01
1 competition11
5       small_2
4       small_4
6         med_9
3        big_20

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
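The same factor-plus-order idea can also be sketched in base R alone, for anyone who would rather not load stringr. This is only an illustration under the same assumption as above (names of the form <type><number>, optionally with an underscore); the function name sort_by_instance and its type_levels argument are hypothetical, not part of the thread.

```r
d <- data.frame(instance = c("competition11", "competition01", "big_20",
                             "small_4", "small_2", "med_9"),
                stringsAsFactors = FALSE)

# Hypothetical helper: sort rows by category (in the given level order),
# breaking ties with the trailing number.
sort_by_instance <- function(data,
                             type_levels = c("competition", "small", "med", "big")) {
  # Drop the leading non-digits, leaving the trailing number.
  id <- as.numeric(sub("^\\D*", "", data$instance))
  # Drop everything from the first non-letter onward, leaving the category.
  type <- factor(sub("[^[:alpha:]].*$", "", data$instance),
                 levels = type_levels)
  data[order(type, id), , drop = FALSE]
}

sorted <- sort_by_instance(d)
print(sorted)
```

A category that is not listed in type_levels becomes NA in the factor, and order() places NA values last by default, so unexpected names end up at the bottom of the table rather than causing an error.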