Dear Jean-Louis,

There must be many ways to do this. Here's one simple way (with no claim of optimality!):

> xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
> xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
>
> set.seed(123) # for reproducibility
> x <- sample(xc, 20, replace=TRUE) # "data"
>
> names(xn) <- xc
> z <- xn[x]
>
> data.frame(z, x)
     z  x
1  2.5 2b
2  2.5 2b
3  1.5 1b
4  2.3 2a
5  1.5 1b
6  1.3 1a
7  1.3 1a
8  2.3 2a
9  1.5 1b
10 2.0  2
11 1.7 1c
12 2.3 2a
13 2.3 2a
14 1.0  1
15 1.3 1a
16 1.5 1b
17 2.7 2c
18 2.0  2
19 1.5 1b
20 1.5 1b

I hope this helps,
John

-----------------------------
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: http://socserv.mcmaster.ca/jfox

> On Jul 10, 2020, at 1:50 PM, Jean-Louis Abitbol <abitbol at sent.com> wrote:
>
> Dear All
>
> I have a character vector, representing histology stages, such as for example:
> xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
>
> and this goes on to 3, 3a etc in various order for each patient. I do have of course a pre-established classification available which does change according to the histology criteria under assessment.
>
> I would want to convert xc, for plotting reasons, to a numeric vector such as
>
> xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)
>
> Unfortunately I have no clue on how to do that.
>
> Thanks for any help and apologies if I am missing the obvious way to do it.
>
> JL
> --
> Verif30042020
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Here is a different approach:

xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- as.numeric(gsub("a", ".3", gsub("b", ".5", gsub("c", ".7", xc))))
xn
# [1] 1.0 1.3 1.5 1.7 2.0 2.3 2.5 2.7

David L Carlson
Professor Emeritus of Anthropology
Texas A&M University
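
The nested gsub() calls in David's approach need one extra level of nesting per stage letter. As a rough sketch only (not part of David's post), the same substitution can be driven by a lookup table. The names `suffix` and `recode_stage` are made up for illustration, and the sketch assumes each stage carries at most one letter.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")

# Hypothetical lookup table: stage letter -> decimal suffix
suffix <- c(a = ".3", b = ".5", c = ".7")

recode_stage <- function(x, suffix) {
  out <- x
  for (code in names(suffix)) {
    # fixed = TRUE treats the letter as a literal string, not a regex
    out <- gsub(code, suffix[[code]], out, fixed = TRUE)
  }
  # a letter that is missing from `suffix` is left in place, so
  # as.numeric() returns NA for that element (with a warning)
  as.numeric(out)
}

xn <- recode_stage(xc, suffix)
xn
```

Adding a further stage letter then only means adding an entry to `suffix`.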

... and continuing with this cute little thread...

I found the OP's specification a little imprecise -- are your values always a string that begins with "some sort" of numeric value followed by "some sort" of alpha code? That is, could the numeric value be several digits and the alpha code several letters? Probably not, and the existing solutions you have been provided are almost certainly all you need. But for fun, assuming this more general specification, here is a general way to split your alphanumeric codes up into numeric and alpha parts and then convert by using a couple of sub() calls.

> set.seed(131)
> xc <- sample(c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c"), 15, replace = TRUE)
> nums <- sub("[[:alpha:]]+", "", xc)      ## extract numeric part
> alph <- sub("\\d+", "", xc)              ## extract alpha part
> codes <- letters[1:3]                    ## whatever alpha codes are used
> vals <- setNames(c(.3, .5, .7), codes)   ## whatever numeric values to convert codes to
> xnew <- as.numeric(nums) + ifelse(alph == "", 0, vals[alph])
> data.frame(xc = xc, xnew = xnew)
   xc xnew
1  1a  1.3
2   2  2.0
3  1c  1.7
4  1c  1.7
5  1b  1.5
6  1a  1.3
7   2  2.0
8   2  2.0
9  1a  1.3
10 1a  1.3
11 2c  2.7
12 1b  1.5
13 1b  1.5
14  1  1.0
15 1c  1.7

Echoing others, no claim for optimality in any sense.

Cheers,
Bert
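
For anyone who wants to reuse Bert's split-and-convert idea on several vectors, here is a minimal sketch that wraps it in a function. The function name `stage_to_numeric`, the multi-letter code "ab" and its value 0.4 are assumptions for illustration; they do not come from the thread. Codes that are absent from the lookup table come back as NA rather than raising an error.

```r
# Sketch: split each stage into leading digits and trailing letters,
# then add the offset that the lookup table assigns to the letters.
stage_to_numeric <- function(x, vals) {
  nums <- sub("[[:alpha:]]+$", "", x)    # numeric part, e.g. "12" from "12c"
  alph <- sub("^[[:digit:]]+", "", x)    # alpha part, "" when there is none
  offset <- ifelse(alph == "", 0, vals[alph])  # NA for unknown codes
  unname(as.numeric(nums) + offset)
}

# Hypothetical lookup table; "ab" is an invented multi-letter code
vals <- c(a = 0.3, b = 0.5, c = 0.7, ab = 0.4)

res <- stage_to_numeric(c("1", "2b", "12c", "3ab", "4z"), vals)
res
```

Here "4z" is deliberately not in `vals`, so the last element of `res` is NA, which makes unexpected stage codes easy to spot before plotting.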

Hi,

We've had several solutions, and I was curious about their relative efficiency. Here's a test with a moderately large data vector:

> library("microbenchmark")
> set.seed(123) # for reproducibility
> x <- sample(xc, 1e4, replace=TRUE) # "data"
> microbenchmark(John = John <- xn[x],
+     Rich = Rich <- xn[match(x, xc)],
+     Jeff = Jeff <- {
+         n <- as.integer( sub( "[a-i]$", "", x ) )
+         d <- match( sub( "^\\d+", "", x ), letters[1:9] )
+         d[ is.na( d ) ] <- 0
+         n + d / 10
+     },
+     David = David <- as.numeric(gsub("a", ".3",
+                                      gsub("b", ".5",
+                                           gsub("c", ".7", x)))),
+     times=1000L
+ )
Unit: microseconds
  expr       min        lq       mean     median         uq       max neval cld
  John   228.816   345.371   513.5614   503.5965   533.0635  10829.08  1000 a
  Rich   217.395   343.035   534.2074   489.0075   518.3260  15388.96  1000 a
  Jeff 10325.471 13070.737 15387.2545 15397.9790 17204.0115 153486.94  1000  b
 David 14256.673 18148.492 20185.7156 20170.3635 22067.6690  34998.95  1000   c
> all.equal(John, Rich)
[1] TRUE
> all.equal(John, David)
[1] "names for target but not for current"
> all.equal(John, Jeff)
[1] "names for target but not for current" "Mean relative difference: 0.1498243"

Of course, efficiency isn't the only consideration, and aesthetically (and no doubt subjectively) I prefer Rich Heiberger's solution. OTOH, Jeff's solution is more general in that it generates the correspondence between letters and numbers. The argument for Jeff's solution would, however, be stronger if it gave the desired answer.

Best,
John
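
Reading the benchmark code, the mismatch reported by all.equal(John, Jeff) appears to arise because Jeff's expression maps a, b, c to .1, .2, .3 (d / 10), whereas the requested values are .3, .5, .7. A possible adjustment, offered only as a sketch, is to compute the offset as 0.1 + 0.2 * d. That rule is an assumption read off the three example values, not something the original poster stated, and it only makes sense for the letters a to d (a fifth letter would push the offset past 1). The name `jeff_adj` is made up.

```r
xc <- c("1", "1a", "1b", "1c", "2", "2a", "2b", "2c")
xn <- c(1, 1.3, 1.5, 1.7, 2, 2.3, 2.5, 2.7)

n <- as.integer(sub("[a-d]$", "", xc))           # numeric part of each stage
d <- match(sub("^\\d+", "", xc), letters[1:4])   # letter position, NA if none

# Assumed rule: a -> .3, b -> .5, c -> .7, i.e. 0.1 + 0.2 * position
offset <- ifelse(is.na(d), 0, 0.1 + 0.2 * d)
jeff_adj <- n + offset

all.equal(jeff_adj, xn)
```

Comparing with all.equal() rather than == matters here, because sums such as 0.1 + 0.2 are not represented exactly in floating point.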