... and continuing with this cute little thread...

I found the OP's specification a little imprecise -- are your values always a string that begins with "some sort" of numeric value followed by "some sort" of alpha code? That is, could the numeric value be several digits and the alpha code several letters? Probably not, and the existing solutions you have been provided are almost certainly all you need. But for fun, assuming this more general specification, here is a general way to split your alphanumeric codes up into numeric and alpha parts and then convert by using a couple of sub()'s.

> set.seed(131)
> xc <- sample(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), 15, replace = TRUE)
> nums <- sub("[[:alpha:]]+","",xc)  ## extract numeric part
> alph <- sub("\\d+","",xc)  ## extract alpha part
> codes <- letters[1:3]  ## whatever alpha codes are used
> vals <- setNames(c(.3,.5,.7), codes)  ## whatever numeric values to convert codes to
> xnew <- as.numeric(nums) + ifelse(alph == "",0, vals[alph])
> data.frame (xc = xc, xnew = xnew)
   xc xnew
1  1a  1.3
2   2  2.0
3  1c  1.7
4  1c  1.7
5  1b  1.5
6  1a  1.3
7   2  2.0
8   2  2.0
9  1a  1.3
10 1a  1.3
11 2c  2.7
12 1b  1.5
13 1b  1.5
14  1  1.0
15 1c  1.7

Echoing others, no claim for optimality in any sense.

Cheers,
Bert

On Fri, Jul 10, 2020 at 12:28 PM David Carlson <dcarlson at tamu.edu> wrote:

> Here is a different approach:
>
> xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
> xn
> # [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7
>
> David L Carlson
> Professor Emeritus of Anthropology
> Texas A&M University
>
> On Fri, Jul 10, 2020 at 1:10 PM Fox, John <jfox at mcmaster.ca> wrote:
>
> > Dear Jean-Louis,
> >
> > There must be many ways to do this. Here's one simple way (with no claim
> > of optimality!):
> >
> > > xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > >
> > > set.seed(123) # for reproducibility
> > > x <- sample(xc, 20, replace=TRUE) # "data"
> > >
> > > names(xn) <- xc
> > > z <- xn[x]
> > >
> > > data.frame(z, x)
> >      z  x
> > 1  2.5 2b
> > 2  2.5 2b
> > 3  1.5 1b
> > 4  2.3 2a
> > 5  1.5 1b
> > 6  1.3 1a
> > 7  1.3 1a
> > 8  2.3 2a
> > 9  1.5 1b
> > 10 2.0  2
> > 11 1.7 1c
> > 12 2.3 2a
> > 13 2.3 2a
> > 14 1.0  1
> > 15 1.3 1a
> > 16 1.5 1b
> > 17 2.7 2c
> > 18 2.0  2
> > 19 1.5 1b
> > 20 1.5 1b
> >
> > I hope this helps,
> > John
> >
> > -----------------------------
> > John Fox, Professor Emeritus
> > McMaster University
> > Hamilton, Ontario, Canada
> > Web: http::/socserv.mcmaster.ca/jfox
> >
> > > On Jul 10, 2020, at 1:50 PM, Jean-Louis Abitbol <abitbol at sent.com> wrote:
> > >
> > > Dear All
> > >
> > > I have a character vector, representing histology stages, such as for example:
> > > xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> > >
> > > and this goes on to 3, 3a etc in various order for each patient. I do
> > > have of course a pre-established classification available which does
> > > change according to the histology criteria under assessment.
> > >
> > > I would want to convert xc, for plotting reasons, to a numeric vector such as
> > >
> > > xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
> > >
> > > Unfortunately I have no clue on how to do that.
> > >
> > > Thanks for any help and apologies if I am missing the obvious way to do it.
> > >
> > > JL
> > > --
> > > Verif30042020
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
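Bert's question about multi-digit numbers and multi-letter codes can be made concrete with a small sketch. It is a variation on his sub() approach, not his exact code: the helper name stage_to_numeric, the anchored patterns, and the invented code "ab" with its value 0.4 are all assumptions made for illustration.

```r
# Hypothetical helper (not from the thread): split codes such as "12ab" into
# a numeric part and an alpha part, then add a per-code offset.
stage_to_numeric <- function(x, vals) {
  nums <- sub("[[:alpha:]]+$", "", x)   # drop trailing letters -> numeric part
  alph <- sub("^[[:digit:]]+", "", x)   # drop leading digits   -> alpha part
  offset <- unname(vals[alph])          # NA where the alpha part is empty or unknown
  offset[alph == ""] <- 0               # a bare number gets no offset
  as.numeric(nums) + offset
}

# Lookup table: a, b, c as in the thread; "ab" is an invented multi-letter code.
vals <- c(a = 0.3, b = 0.5, c = 0.7, ab = 0.4)

xc <- c("1", "1a", "2c", "12ab", "3b")
xnew <- stage_to_numeric(xc, vals)
data.frame(xc = xc, xnew = xnew)
```

An alpha code that is missing from vals yields NA rather than a silently wrong number, which seems the safer default when the classification changes between histology criteria.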
Dear Bert,

Wouldn't you know it, but your contribution arrived just after I pressed "send" on my last message? So here's how your solution compares:

> microbenchmark(John = John <- xn[x],
+     Rich = Rich <- xn[match(x, xc)],
+     Jeff = Jeff <- {
+         n <- as.integer( sub( "[a-i]$", "", x ) )
+         d <- match( sub( "^\\d+", "", x ), letters[1:9] )
+         d[ is.na( d ) ] <- 0
+         n + d / 10
+     },
+     David = David <- as.numeric(gsub("a", ".3",
+                          gsub("b", ".5",
+                          gsub("c", ".7", x)))),
+     Bert = Bert <- {
+         nums <- sub("[[:alpha:]]+","",x)
+         alph <- sub("\\d+","",x)
+         as.numeric(nums) + ifelse(alph == "",0, vals[alph])
+     },
+     times=1000L
+ )
Unit: microseconds
  expr       min         lq       mean    median         uq       max neval cld
  John   261.739   373.9765   599.9411   536.571   569.3750  14489.48  1000 a
  Rich   250.697   372.4450   542.3208   520.383   554.7215  10682.73  1000 a
  Jeff 10879.223 13477.7665 15647.7856 15549.255 17516.7420 146155.28  1000  b
 David 14337.510 18375.0100 20325.8796 20187.174 22161.0195  32575.31  1000    d
  Bert 12344.506 15753.2510 18024.2757 17702.838 19973.0465  32043.80  1000   c

> all.equal(John, Rich)
[1] TRUE
> all.equal(John, David)
[1] "names for target but not for current"
> all.equal(John, Jeff)
[1] "names for target but not for current" "Mean relative difference: 0.1498243"
> all.equal(John, Bert)
[1] "names for target but not for current"

To make the comparison fair, I moved the parts of the solutions that don't depend on the length of the data outside the benchmark. Your solution does have the virtue of providing the right answer.

Best,
John

> On Jul 10, 2020, at 3:54 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> ... and continuing with this cute little thread...
>
> I found the OP's specification a little imprecise -- are your values always a string that begins with "some sort" of numeric value followed by "some sort" of alpha code? That is, could the numeric value be several digits and the alpha code several letters?
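The all.equal() messages in the benchmark can be reproduced on a small scale. The sketch below reuses the code quoted in the thread; the sample size of 20 and the use of unname() to drop names before comparing are choices made here for illustration, not part of John's benchmark.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- setNames(c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7), xc)

set.seed(123)
x <- sample(xc, 20, replace = TRUE)

# Lookup by name: the result keeps the names of xn.
John <- xn[x]

# Nested gsub(): the result is a plain, unnamed numeric vector.
David <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", x))))

# Jeff's mapping as quoted in the benchmark: letter position divided by 10,
# so "a" -> .1, "b" -> .2, "c" -> .3 rather than .3, .5, .7.
n <- as.integer(sub("[a-i]$", "", x))
d <- match(sub("^\\d+", "", x), letters[1:9])
d[is.na(d)] <- 0
Jeff <- n + d / 10

all.equal(unname(John), David)  # values agree once the names are dropped
all.equal(unname(John), Jeff)   # still reports a difference in the values

# Side-by-side view of the distinct codes and the two mappings.
unique(data.frame(x = x, John = unname(John), Jeff = Jeff))
```

So the "names for target but not for current" message only reflects that xn[x] returns a named vector, while the mean relative difference reported for Jeff's version comes from its different letter-to-decimal mapping.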
Thanks! As I said, cute exercise.

Best,
Bert

On Fri, Jul 10, 2020 at 1:21 PM Fox, John <jfox at mcmaster.ca> wrote:

> To make the comparison fair, I moved the parts of the solutions that don't
> depend on the length of the data outside the benchmark. Your solution does
> have the virtue of providing the right answer.
>
> Best,
> John