Henrik Bengtsson
2014-Aug-27 00:24 UTC
[R] Mixed sorting/ordering of strings acknowledging roman numerals?
Hi, does anyone know of an implementation/function that sorts strings that *contain* roman numerals (I, II, III, IV, V, ...) which are treated as numbers. In 'gtools' there is mixedsort() which does this for strings that contains (decimal) numbers. I'm looking for a "mixedsortroman()" function that does the same but with roman numbers, e.g. ## DECIMAL NUMBERS> x <- sprintf("chr %d", 12:1) > x[1] "chr 12" "chr 11" "chr 10" "chr 9" "chr 8" [6] "chr 7" "chr 6" "chr 5" "chr 4" "chr 3" [11] "chr 2" "chr 1"> sort(x)[1] "chr 1" "chr 10" "chr 11" "chr 12" "chr 2" [6] "chr 3" "chr 4" "chr 5" "chr 6" "chr 7" [11] "chr 8" "chr 9"> gtools::mixedsort(x)[1] "chr 1" "chr 2" "chr 3" "chr 4" "chr 5" [6] "chr 6" "chr 7" "chr 8" "chr 9" "chr 10" [11] "chr 11" "chr 12" ## ROMAN NUMBERS> y <- sprintf("chr %s", as.roman(12:1)) > y[1] "chr XII" "chr XI" "chr X" "chr IX" [5] "chr VIII" "chr VII" "chr VI" "chr V" [9] "chr IV" "chr III" "chr II" "chr I"> sort(y)[1] "chr I" "chr II" "chr III" "chr IV" [5] "chr IX" "chr V" "chr VI" "chr VII" [9] "chr VIII" "chr X" "chr XI" "chr XII"> mixedsortroman(y)[1] "chr I" "chr II" "chr III" "chr IV" [5] "chr V" "chr VI" "chr VII" "chr VIII" [9] "chr IX" "chr X" "chr XI" "chr XII" The latter is what I'm looking for. Before hacking together something myself (e.g. identify roman numerals substrings, translate them to decimal numbers, use gtools::mixedsort() to sort them and then translate them back to roman numbers), I'd like to hear if someone already has this implemented/know of a package that does this. Thanks, Henrik
David Winsemius
2014-Aug-27 01:46 UTC
[R] Mixed sorting/ordering of strings acknowledging roman numerals?
On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote:> Hi, > > does anyone know of an implementation/function that sorts strings that > *contain* roman numerals (I, II, III, IV, V, ...) which are treated as > numbers. In 'gtools' there is mixedsort() which does this for strings > that contains (decimal) numbers. I'm looking for a "mixedsortroman()" > function that does the same but with roman numbers, e.g.It's pretty easy to sort something you know to be congruent with the existing roman class: romanC <- as.character( as.roman(1:3899) ) match(c("I", "II", "III","X","V"), romanC) #[1] 1 2 3 10 5 But I guess you already know that, so you want a regex approach to parsing. Looking at the path taken by Warnes, it would involve doing something like his regex based insertion of a delimiter for "Roman numeral" but simpler because he needed to deal with decimal points and signs and exponent notation, none of which you appear to need. If you only need to consider character and Roman, then this hack of Warnes tools succeeds: mixedorderRoman <- function (x) { if (length(x) < 1) return(NULL) else if (length(x) == 1) return(1) if (is.numeric(x)) return(order(x)) delim = "\\$\\@\\$" roman <- function(x) { suppressWarnings(match(x, romanC)) } nonnumeric <- function(x) { suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), NA)) } x <- as.character(x) which.nas <- which(is.na(x)) which.blanks <- which(x == "") if (length(which.blanks) > 0) x[which.blanks] <- -Inf if (length(which.nas) > 0) x[which.nas] <- Inf delimited <- gsub("([IVXCL]+)", paste(delim, "\\1", delim, sep = ""), x) step1 <- strsplit(delimited, delim) step1 <- lapply(step1, function(x) x[x > ""]) step1.roman <- lapply(step1, roman) step1.character <- lapply(step1, nonnumeric) maxelem <- max(sapply(step1, length)) step1.roman.t <- lapply(1:maxelem, function(i) sapply(step1.roman, function(x) x[i])) step1.character.t <- lapply(1:maxelem, function(i) sapply(step1.character, function(x) x[i])) rank.roman <- sapply(step1.roman.t, rank) rank.character <- sapply(step1.character.t, function(x) as.numeric(factor(x))) rank.roman[!is.na(rank.character)] <- 0 rank.character <- t(t(rank.character) + apply(matrix(rank.roman), 2, max, na.rm = TRUE)) rank.overall <- ifelse(is.na(rank.character), rank.numeric, rank.character) order.frame <- as.data.frame(rank.overall) if (length(which.nas) > 0) order.frame[which.nas, ] <- Inf retval <- do.call("order", order.frame) return(retval) } y[mixedorderRoman(y)] [1] "chr I" "chr II" "chr III" "chr IV" "chr IX" [6] "chr V" "chr VI" "chr VII" "chr VIII" "chr X" [11] "chr XI" "chr XII" -- David.> > ## DECIMAL NUMBERS >> x <- sprintf("chr %d", 12:1) >> x > [1] "chr 12" "chr 11" "chr 10" "chr 9" "chr 8" > [6] "chr 7" "chr 6" "chr 5" "chr 4" "chr 3" > [11] "chr 2" "chr 1" > >> sort(x) > [1] "chr 1" "chr 10" "chr 11" "chr 12" "chr 2" > [6] "chr 3" "chr 4" "chr 5" "chr 6" "chr 7" > [11] "chr 8" "chr 9" > >> gtools::mixedsort(x) > [1] "chr 1" "chr 2" "chr 3" "chr 4" "chr 5" > [6] "chr 6" "chr 7" "chr 8" "chr 9" "chr 10" > [11] "chr 11" "chr 12" > > > ## ROMAN NUMBERS >> y <- sprintf("chr %s", as.roman(12:1)) >> y > [1] "chr XII" "chr XI" "chr X" "chr IX" > [5] "chr VIII" "chr VII" "chr VI" "chr V" > [9] "chr IV" "chr III" "chr II" "chr I" > >> sort(y) > [1] "chr I" "chr II" "chr III" "chr IV" > [5] "chr IX" "chr V" "chr VI" "chr VII" > [9] "chr VIII" "chr X" "chr XI" "chr XII" > >> mixedsortroman(y) > [1] "chr I" "chr II" "chr III" "chr IV" > [5] "chr V" "chr VI" "chr VII" "chr VIII" > [9] "chr IX" "chr X" "chr XI" "chr XII" > > The latter is what I'm looking for. > > Before hacking together something myself (e.g. identify roman numerals > substrings, translate them to decimal numbers, use gtools::mixedsort() > to sort them and then translate them back to roman numbers), I'd like > to hear if someone already has this implemented/know of a package that > does this. > > Thanks, > > Henrik > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA