Hello All, Season's greetings! Am trying to replicate some SAS code in R. The SAS code uses if-then-do code blocks. I've been trying to do likewise in R as that seems to be the most reliable way to get the same result. Below is some toy data and some code that does work. There are some things I don't necessarily like about the code though. So I was hoping some people could help make it better. One thing I don't like is that the within function reverses the order of the computed columns such that test1:test5 becomes test5:test1. I've used a mutate to overcome that but would prefer not to have to do so. Another, perhaps very small thing, is the need to calculate an ID variable that becomes the basis for a grouping.? I did considerable Internet searching for R code that conditionally computes blocks of code. I didn't find much though and so am wondering if my search terms were not sufficient or if there is some other reason. It occurred to me that maybe if-then-do code blocks like we often see in SAS as are frowned upon and therefore not much implemented. I'd be interested in seeing more R-compatible approaches if this is the case. I've learned that it's a mistake to try and make R be like SAS. It's better to let R be R. Trouble is I'm not always sure how to do that. Thanks, Paul d1 <- data.frame(workshop=rep(1:2,4), ??????????????? gender=rep(c("f","m"),each=4)) library(tibble) library(plyr) d2 <- d1 %>% ? rownames_to_column("ID") %>% ? mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% ? ddply("ID", ??????? within, ??????? if (gender == "f" & workshop == 1) { ????????? test1 <- 1 ????????? test1 <- 6 + test1 ????????? test2 <- 2 + test1 ????????? test4 <- 1 ????????? test5 <- 1 ??????? } else { ????????? test1 <- test2 <- test4 <- test5 <- 0 ??????? })
Dear Paul, R's power is that is works vectorised. Unlike SAS which is rowbased. Using R in a SAS way will lead to very slow code. Your examples can be written vectorised d1 %>% rownames_to_column("ID") %>% mutate( test1 = ifelse(gender == "f" & workshop == 1, 7, 0), test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0), test4 = ifelse(gender == "f" & workshop == 1, 1, 0), test5 = test4 ) Here is a speed comparison. library(microbenchmark) microbenchmark( vector = {d1 %>% rownames_to_column("ID") %>% mutate( test1 = ifelse(gender == "f" & workshop == 1, 7, 0), test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0), test4 = ifelse(gender == "f" & workshop == 1, 1, 0), test5 = test4 ) }, rowbased = {d1 %>% rownames_to_column("ID") %>% mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% ddply("ID", within, if (gender == "f" & workshop == 1) { test1 <- 1 test1 <- 6 + test1 test2 <- 2 + test1 test4 <- 1 test5 <- 1 } else { test1 <- test2 <- test4 <- test5 <- 0 })} ) Best regards, Thierry ir. Thierry Onkelinx Statisticus / Statistician Vlaamse Overheid / Government of Flanders INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance thierry.onkelinx at inbo.be Havenlaan 88 bus 73, 1000 Brussel www.inbo.be /////////////////////////////////////////////////////////////////////////////////////////// To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey /////////////////////////////////////////////////////////////////////////////////////////// <https://www.inbo.be> Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help < r-help at r-project.org>:> Hello All, > > Season's greetings! > > Am trying to replicate some SAS code in R. The SAS code uses if-then-do > code blocks. I've been trying to do likewise in R as that seems to be the > most reliable way to get the same result. > > Below is some toy data and some code that does work. There are some things > I don't necessarily like about the code though. So I was hoping some people > could help make it better. One thing I don't like is that the within > function reverses the order of the computed columns such that test1:test5 > becomes test5:test1. I've used a mutate to overcome that but would prefer > not to have to do so. > > Another, perhaps very small thing, is the need to calculate an ID > variable that becomes the basis for a grouping. > > I did considerable Internet searching for R code that conditionally > computes blocks of code. I didn't find much though and so am wondering if > my search terms were not sufficient or if there is some other reason. It > occurred to me that maybe if-then-do code blocks like we often see in SAS > as are frowned upon and therefore not much implemented. > > I'd be interested in seeing more R-compatible approaches if this is the > case. I've learned that it's a mistake to try and make R be like SAS. It's > better to let R be R. Trouble is I'm not always sure how to do that. > > Thanks, > > Paul > > > d1 <- data.frame(workshop=rep(1:2,4), > gender=rep(c("f","m"),each=4)) > > library(tibble) > library(plyr) > > d2 <- d1 %>% > rownames_to_column("ID") %>% > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% > ddply("ID", > within, > if (gender == "f" & workshop == 1) { > test1 <- 1 > test1 <- 6 + test1 > test2 <- 2 + test1 > test4 <- 1 > test5 <- 1 > } else { > test1 <- test2 <- test4 <- test5 <- 0 > }) > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
this can be dome even faster, and I think more easily read, using only base R d1 <- data.frame(workshop=rep(1:2,4), gender=rep(c("f","m"),each=4)) ## needed by vector and rowbased, not needed by rmh library(tibble) library(plyr) library(magrittr) microbenchmark( vector = {d1 %>% rownames_to_column("ID") %>% mutate( test1 = ifelse(gender == "f" & workshop == 1, 7, 0), test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0), test4 = ifelse(gender == "f" & workshop == 1, 1, 0), test5 = test4 ) }, rowbased = {d1 %>% rownames_to_column("ID") %>% mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% ddply("ID", within, if (gender == "f" & workshop == 1) { test1 <- 1 test1 <- 6 + test1 test2 <- 2 + test1 test4 <- 1 test5 <- 1 } else { test1 <- test2 <- test4 <- test5 <- 0 })}, rmh={ data.frame(ID=rownames(d1), d1, test1=0, test2=0, test4=0, test5=0) myRowSubset <- d3$gender=="f" & d3$workshop==1 test1 <- 1 d3[myRowSubset, "test1"] <- test1 + 6 d3[myRowSubset, "test2"] <- test1 + 6 + 2 d3[myRowSubset, c("test4", "test5")] <- test1 } ) Unit: microseconds expr min lq mean median uq max neval cld vector 1281.994 1468.102 1669.266 1573.043 1750.354 3171.777 100 a rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034 100 b rmh 925.571 1056.530 1167.568 1116.425 1221.457 1968.199 100 a On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help <r-help at r-project.org> wrote:> > Dear Paul, > > R's power is that is works vectorised. Unlike SAS which is rowbased. Using > R in a SAS way will lead to very slow code. > > Your examples can be written vectorised > > d1 %>% > rownames_to_column("ID") %>% > mutate( > test1 = ifelse(gender == "f" & workshop == 1, 7, 0), > test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0), > test4 = ifelse(gender == "f" & workshop == 1, 1, 0), > test5 = test4 > ) > > Here is a speed comparison. > > library(microbenchmark) > microbenchmark( > vector = {d1 %>% > rownames_to_column("ID") %>% > mutate( > test1 = ifelse(gender == "f" & workshop == 1, 7, 0), > test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0), > test4 = ifelse(gender == "f" & workshop == 1, 1, 0), > test5 = test4 > ) }, > rowbased = {d1 %>% > rownames_to_column("ID") %>% > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% > ddply("ID", > within, > if (gender == "f" & workshop == 1) { > test1 <- 1 > test1 <- 6 + test1 > test2 <- 2 + test1 > test4 <- 1 > test5 <- 1 > } else { > test1 <- test2 <- test4 <- test5 <- 0 > })} > ) > > > Best regards, > > Thierry > > ir. Thierry Onkelinx > Statisticus / Statistician > > Vlaamse Overheid / Government of Flanders > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND > FOREST > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance > thierry.onkelinx at inbo.be > Havenlaan 88 bus 73, 1000 Brussel > www.inbo.be > > /////////////////////////////////////////////////////////////////////////////////////////// > To call in the statistician after the experiment is done may be no more > than asking him to perform a post-mortem examination: he may be able to say > what the experiment died of. ~ Sir Ronald Aylmer Fisher > The plural of anecdote is not data. ~ Roger Brinner > The combination of some data and an aching desire for an answer does not > ensure that a reasonable answer can be extracted from a given body of data. > ~ John Tukey > /////////////////////////////////////////////////////////////////////////////////////////// > > <https://www.inbo.be> > > > Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help < > r-help at r-project.org>: > > > Hello All, > > > > Season's greetings! > > > > Am trying to replicate some SAS code in R. The SAS code uses if-then-do > > code blocks. I've been trying to do likewise in R as that seems to be the > > most reliable way to get the same result. > > > > Below is some toy data and some code that does work. There are some things > > I don't necessarily like about the code though. So I was hoping some people > > could help make it better. One thing I don't like is that the within > > function reverses the order of the computed columns such that test1:test5 > > becomes test5:test1. I've used a mutate to overcome that but would prefer > > not to have to do so. > > > > Another, perhaps very small thing, is the need to calculate an ID > > variable that becomes the basis for a grouping. > > > > I did considerable Internet searching for R code that conditionally > > computes blocks of code. I didn't find much though and so am wondering if > > my search terms were not sufficient or if there is some other reason. It > > occurred to me that maybe if-then-do code blocks like we often see in SAS > > as are frowned upon and therefore not much implemented. > > > > I'd be interested in seeing more R-compatible approaches if this is the > > case. I've learned that it's a mistake to try and make R be like SAS. It's > > better to let R be R. Trouble is I'm not always sure how to do that. > > > > Thanks, > > > > Paul > > > > > > d1 <- data.frame(workshop=rep(1:2,4), > > gender=rep(c("f","m"),each=4)) > > > > library(tibble) > > library(plyr) > > > > d2 <- d1 %>% > > rownames_to_column("ID") %>% > > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% > > ddply("ID", > > within, > > if (gender == "f" & workshop == 1) { > > test1 <- 1 > > test1 <- 6 + test1 > > test2 <- 2 + test1 > > test4 <- 1 > > test5 <- 1 > > } else { > > test1 <- test2 <- test4 <- test5 <- 0 > > }) > > > > ______________________________________________ > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
There is some discussion of approaches to this here: https://stackoverflow.com/questions/34096162/dplyr-mutate-replace-on-a-subset-of-rows/34096575#34096575 On Mon, Dec 17, 2018 at 10:30 AM Paul Miller via R-help <r-help at r-project.org> wrote:> > Hello All, > > Season's greetings! > > Am trying to replicate some SAS code in R. The SAS code uses if-then-do code blocks. I've been trying to do likewise in R as that seems to be the most reliable way to get the same result. > > Below is some toy data and some code that does work. There are some things I don't necessarily like about the code though. So I was hoping some people could help make it better. One thing I don't like is that the within function reverses the order of the computed columns such that test1:test5 becomes test5:test1. I've used a mutate to overcome that but would prefer not to have to do so. > > Another, perhaps very small thing, is the need to calculate an ID variable that becomes the basis for a grouping. > > I did considerable Internet searching for R code that conditionally computes blocks of code. I didn't find much though and so am wondering if my search terms were not sufficient or if there is some other reason. It occurred to me that maybe if-then-do code blocks like we often see in SAS as are frowned upon and therefore not much implemented. > > I'd be interested in seeing more R-compatible approaches if this is the case. I've learned that it's a mistake to try and make R be like SAS. It's better to let R be R. Trouble is I'm not always sure how to do that. > > Thanks, > > Paul > > > d1 <- data.frame(workshop=rep(1:2,4), > gender=rep(c("f","m"),each=4)) > > library(tibble) > library(plyr) > > d2 <- d1 %>% > rownames_to_column("ID") %>% > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>% > ddply("ID", > within, > if (gender == "f" & workshop == 1) { > test1 <- 1 > test1 <- 6 + test1 > test2 <- 2 + test1 > test4 <- 1 > test5 <- 1 > } else { > test1 <- test2 <- test4 <- test5 <- 0 > }) > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Hi Gabor, Richard, and Thierry, Thanks very much for your replies. Turns out I had already hit on Gabor's idea of "factor out" in writing an initial draft of the code converting from SAS to R. Below is the link Gabor sent describing this and other approaches. https://stackoverflow.com/questions/34096162/dplyr-mutate-replace-on-a-subset-of-rows/34096575#34096575 At the end of this email are some new test data plus a snippet of my initial R code. The R code I have replicates the result from SAS but is quite verbose. That should be obvious from the snippet. I know I can make the code less verbose with a subsequent draft but wonder if I can simplify to the point where the factor out approach gets a fair test. I'd appreciate it if people could share some ways to make the factor out approach less verbose. I'd also like to see how well some of the other approaches might work with these data. I spent considerable time looking at the link Gabor sent as well as the other responses I received. The mutate_cond function in the link seems promising but it wasn't clear to me how I could avoid having to repeat the various conditions using that approach. Thanks again. Paul library(magrittr) library(dplyr) ? test_data <- ? structure( ??? list( ?? ?? intPatientId = c("3", "37", "48", "6", "6", "5"), ????? intSurveySessionId = c(1L, 10996L, 19264L, 2841L, 28L, 34897L), ????? a_CCMA02 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_), ????? a_CCMA69 = c(7, NA, 0, 2, NA, 0), ????? a_CCMA70 = c(7, 0, NA, 10, NA, NA), ????? a_CCMA72 = c(7, 2, 3, NA, NA, NA), ????? CCMA2 = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,NA_integer_, NA_integer_), ????? a_CCMA05 = c(NA, NA, NA, NA, NA, 0), ????? a_CCMA43 = c(5, 0, 6, 5, NA, NA), ????? a_CCMA44 = c(5, 0, 0, 5, 0, NA), ????? CCMA5 = c(NA, NA, NA, NA, NA, 0) ??? ), ??? class = "data.frame", ??? row.names = c(NA,-6L) ? ) factor_out <- test_data %>% ? mutate( ??? CCMA2_cond = case_when( ????? (is.na(a_CCMA02) | a_CCMA02 < 0 | a_CCMA02 > 10) & ??????? (!is.na(a_CCMA69) & between(a_CCMA69, 0, 10) & ?????????? !is.na(a_CCMA70) & between(a_CCMA70, 0, 10) & ?????????? !is.na(a_CCMA72) & between(a_CCMA72, 0, 10)) ~ "A", ????? (is.na(a_CCMA02) | a_CCMA02 < 0 | a_CCMA02 > 10) & ??????? (is.na(a_CCMA69) | a_CCMA69 < 0 | a_CCMA69 >= 10) & ??????? !is.na(a_CCMA70) & between(a_CCMA70, 0, 10) & ??????? !is.na(a_CCMA72) & between(a_CCMA72, 0, 10) ~ "B", ????? (is.na(a_CCMA02) | a_CCMA02 < 0 | a_CCMA02 > 10) & ??????? (is.na(a_CCMA70) | a_CCMA70 < 0 | a_CCMA70 >= 10) & ??????? between(a_CCMA69, 0, 10) & between(a_CCMA72, 0, 10) ~ "C", ????? (is.na(a_CCMA02) | a_CCMA02 < 0 | a_CCMA02 > 10) & ??????? (is.na(a_CCMA72) | a_CCMA72 < 0 | a_CCMA72 >= 10) & ??????? between(a_CCMA69, 0, 10) & between(a_CCMA70, 0, 10) ~ "D") ? ) %>% ? mutate( ??? CCMA2 = case_when( ????? CCMA2_cond == "A" & 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72) < 0? ~ 0, ????? CCMA2_cond == "A" & 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72) > 10 ~ 10, ????? CCMA2_cond == "A" ~ 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72), ????? TRUE ~ as.double(CCMA2) ??? ), ??? CCMA2 = case_when( ????? CCMA2_cond == "B" & 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72) < 0? ~ 0, ????? CCMA2_cond == "B" & 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72) > 10 ~ 10, ????? CCMA2_cond == "B" ~ 0.614 + (0.065 * (a_CCMA70 + a_CCMA72) / 2) + (-0.012 * a_CCMA70) + (0.504 * a_CCMA72), ????? TRUE ~ as.double(CCMA2) ??? ), ??? CCMA2 = case_when( ????? CCMA2_cond == "C" & 0.614 + (0.065 * a_CCMA69) + (-0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72) < 0? ~ 0, ????? CCMA2_cond == "C" & 0.614 + (0.065 * a_CCMA69) + (-0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72) > 10 ~ 10, ????? CCMA2_cond == "C" ~ 0.614 + (0.065 * a_CCMA69) + (-0.012 *(a_CCMA72 + a_CCMA69) / 2 ) + (0.504 * a_CCMA72), ????? TRUE ~ as.double(CCMA2) ??? ), ??? CCMA2 = case_when( ????? CCMA2_cond == "D" & 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2) < 0? ~ 0, ????? CCMA2_cond == "D" & 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2) > 10 ~ 10, ????? CCMA2_cond == "D" ~ 0.614 + (0.065 * a_CCMA69) + (-0.012 * a_CCMA70 ) + (0.504 *(a_CCMA70 + a_CCMA69) / 2), ????? TRUE ~ as.double(CCMA2) ??? ) ? ) %>% ? select(-CCMA2_cond) %>% ? mutate( ??? CCMA5_condA = if_else( ????? (is.na(a_CCMA05) | a_CCMA05 < 0 | a_CCMA05 > 10), ????? 1, 0 ??? ), ??? CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA43, 0, 10) & between(a_CCMA44, 0, 10), ?????????????????? 0.216 + (0.257 * a_CCMA43) + (0.828 * a_CCMA44), ?????????????????? CCMA5), ??? CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA43, 0, 10) & (is.na(a_CCMA44) | a_CCMA44 < 0 | a_CCMA44 > 10), ?????????????????? 0.216 + (0.257 * a_CCMA43) + (0.828 * a_CCMA43), ?????????????????? CCMA5), ??? CCMA5 = ifelse(CCMA5_condA == 1 & between(a_CCMA44, 0, 10) & (is.na(a_CCMA43) | a_CCMA43 < 0 | a_CCMA43 > 10), ?????????????????? 0.216 + (0.257 * a_CCMA44) + (0.828 * a_CCMA44), ?????????????????? CCMA5), ??? CCMA5 = ifelse(CCMA5_condA == 1 & !is.na(CCMA5) & CCMA5 < 0, ?????????????????? 0, ?????????????????? CCMA5), ??? CCMA5 = ifelse(CCMA5_condA == 1 & CCMA5 > 10, ?????????????????? 10, ?????????????????? CCMA5) ? ) %>% ? select(-CCMA5_condA)