I am having some problems with what seems like a pretty simple issue. I have some data where I want to convert numbers. Specifically, this is cancer data, and the size of tumors is encoded using millimeter measurements. However, if the actual measurement is not available, the coding may imply a less specific range of sizes. For instance, numbers 0-89 may indicate size in mm, but 90 indicates "greater than 90 mm", 91 indicates "1 to 2 cm", etc. So, I want to translate 91 to 90, 92 to 15, etc.

I have many such tables, so I would like to be able to write a function which takes as input a threshold over which new values need to be looked up, and the new lookup table, returning the new values.

I successfully wrote the function:

translate_seer_numeric <- function(var, upper, lookup) {
  names(lookup) <- c('old','new')
  names(var) <- 'old'
  var <- as.data.frame(var)
  lookup2 <- data.frame(old = c(1:upper),
                        new = c(1:upper))
  lookup3 <- rbind(lookup, lookup2)
  print(var)
  res <- left_join(var, lookup3, by = 'old') %>%
    select(new)

  res

}

test1 <- data.frame(old = c(99,95,93, 8))
lup <- data.frame(bif = c(93, 95, 99),
                  new = c(3, 5, NA))
translate_seer_numeric(test1, 90, lup)

The above test generates the desired output:

  old
1  99
2  95
3  93
4   8
  new
1  NA
2   5
3   3
4   8

My problem comes when I try to put this in line with pipes and the mutate function:

test1 %>%
  mutate(varb = translate_seer_numeric(var = old, 90, lup))
####
Error: Problem with `mutate()` input `varb`.
x Join columns must be present in data.
x Problem with `old`.
i Input `varb` is `translate_seer_numeric(var = test1$old, 90, lup)`.

Thoughts??
If you are willing to entertain another approach, have a look at ?cut. By defining the 'breaks' argument appropriately, you can easily create a factor that tells you which values should be looked up and which accepted as is.

If I understand correctly, this seems to be what you want. If I have not, just ignore and wait for a more useful reply.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Jan 19, 2021 at 10:24 AM Steven Rigatti <sjrigatti at gmail.com> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
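As a rough illustration of the cut() suggestion, the sketch below uses base R only. It assumes the threshold of 90 and the `lup` table from the original post; the names `grp`, `need` and `out` and the labels "as_is" and "lookup" are made up for this example.

```r
# Data and lookup table from the original post
x   <- c(99, 95, 93, 8)
lup <- data.frame(bif = c(93, 95, 99), new = c(3, 5, NA))
upper <- 90  # values above this threshold are codes, not measurements

# cut() uses right-closed intervals by default: (-Inf, 90] and (90, Inf]
grp <- cut(x, breaks = c(-Inf, upper, Inf), labels = c("as_is", "lookup"))

# Keep measured values unchanged; translate only the coded ones
out <- x
need <- grp == "lookup"
out[need] <- lup$new[match(x[need], lup$bif)]

print(data.frame(x, grp, out))
```

Codes above the threshold that are missing from the lookup table come back as NA, since match() finds no entry for them.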
On 1/19/21 7:50 AM, Steven Rigatti wrote:
> [...]
> test1 <- data.frame(old = c(99,95,93, 8))lup <- data.frame(bif = c(93, 95, 99),

This throws an error when copy-pasted, since you posted in HTML and there was no line separator.

> new = c(3, 5, NA))
> translate_seer_numeric(test1, 90, lup)
> [...]
> My problem comes when I try to put this in line with pipes and the mutate
> function:
>
> test1 %>%
> mutate(varb = translate_seer_numeric(var = old, 90, lup))####

# Added: library(tidyverse)  # since many people on rhelp are not particularly "tidy".

> Error: Problem with `mutate()` input `varb`.
> x Join columns must be present in data.
> x Problem with `old`.
> i Input `varb` is `translate_seer_numeric(var = test1$old, 90, lup)`.

I think I got useful results with this, although you might need to extract the "new" column from the dataframe result.

test1 %>%
  mutate(varb = translate_seer_numeric( . , 90, lup))
#----------
  old
1  99
2  95
3  93
4   8
  old new
1  99  NA
2  95   5
3  93   3
4   8   8

When you want to refer to the prior result in a piped chain you use a dot ("."). I'm guessing you know this. But what I saw was that your successful test case was using a dataframe as the input to the first parameter of translate_seer_numeric, whereas you were apparently passing a column name when it was being used in a pipe. The error message wasn't particularly helpful to me, but maybe that's because I don't have enough experience in that non-standard universe. It did tell us that there was a problem with "varb", and that was probably because that was the wrong parameter name. However, even changing the call to just `var=old` would probably have failed as well, because you didn't write the function to accept a variable name as the first parameter.

Best;
David.
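To make the data frame versus column distinction concrete, here is a small base R sketch of what happens to the first argument inside the original function. It assumes that mutate() hands the function the bare column as a plain numeric vector; the names `as_df` and `fixed` are made up for illustration.

```r
# Assumption: inside mutate(), `old` reaches the function as a plain vector
var <- c(99, 95, 93, 8)

# On a vector, this names the first element; it does not create a column name
names(var) <- "old"
print(names(var))

# The resulting data frame therefore has no column called "old",
# which is the column the later join is asked to match on
as_df <- as.data.frame(var)
print(names(as_df))

# Building the data frame with an explicit column name avoids the problem
fixed <- data.frame(old = unname(var))
print(names(fixed))
```

When a data frame is passed instead, names(var) <- 'old' renames its single column, which would explain why the standalone test worked.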
Your translate... function seems unnecessarily complicated, and reusing the name 'var' for both the input and the data.frame containing the input makes it confusing to me. The following replacement, f, uses your algorithm but I think gets the answer you want.

library(dplyr)  # for left_join(), mutate() and %>%

f <- function(var, upper, lookup) {
  names(lookup) <- c('old','new')
  var_df <- data.frame(old = var)
  lookup2 <- data.frame(old = c(1:upper),
                        new = c(1:upper))
  lookup3 <- rbind(lookup, lookup2)
  res <- left_join(var_df, lookup3, by = 'old')
  res$new # return a vector, not a data.frame or tibble.
}

E.g.,
> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>% mutate( YYY_mm = f(YYY, 90, lup))
  XXX YYY YYY_mm
1  95  55     55
2  93  66     66
3  10  93      3
4  20  98     NA

You can modify this so that it names the output column based on the name of the input column (by returning a data.frame/tibble instead of a numeric vector):

f1 <- function(var, upper, lookup,
               new_varname = paste0(deparse1(substitute(var)), "_mm")) {
  names(lookup) <- c('old',new_varname)
  var_df <- data.frame(old = var)
  lookup2 <- data.frame(old = c(1:upper),
                        new = c(1:upper))
  names(lookup2)[2] <- new_varname
  lookup3 <- rbind(lookup, lookup2)
  res <- left_join(var_df, lookup3, by = 'old')[2]
  res
}

E.g.,
> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>% mutate( f1(YYY, 90, lup))
  XXX YYY YYY_mm
1  95  55     55
2  93  66     66
3  10  93      3
4  20  98     NA

-Bill
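For comparison, a join-free variant can be sketched in base R with match(). This is only an illustration under stated assumptions, not a function from the thread: the name `translate_codes` is hypothetical, the lookup table is assumed to hold old codes in its first column and new values in its second, and every value at or below the threshold is passed through unchanged (including 0 and non-integer values, which a 1:upper lookup table would not match).

```r
# Hypothetical helper: vector in, vector out, no join required
translate_codes <- function(x, upper, lookup) {
  # Position of each value in the first (old code) column; NA if absent
  idx <- match(x, lookup[[1]])
  # Start from the looked-up values (NA where no entry exists)
  out <- lookup[[2]][idx]
  # Values at or below the threshold are measurements: keep them as is
  keep <- !is.na(x) & x <= upper
  out[keep] <- x[keep]
  out
}

lup <- data.frame(bif = c(93, 95, 99), new = c(3, 5, NA))
dat <- data.frame(XXX = c(95, 93, 10, 20), YYY = c(55, 66, 93, 98))
dat$YYY_mm <- translate_codes(dat$YYY, 90, lup)
print(dat)
```

Because it takes and returns a plain vector, the same function could also be called inside mutate(), for example mutate(YYY_mm = translate_codes(YYY, 90, lup)).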