Dear R cognoscenti,

While having NA as a native type is nifty, it is annoying when making binary choices.

Question: Is there anything bad about writing comparison functions that behave like %in% (which I love) and ignore NAs?

"%>%" <- function(table, x) {
    return(which(table > x))
}

"%<%" <- function(table, x) {
    return(which(table < x))
}

test <- c(NA, 1:4, NA, 5)
test %>% 2
# [1] 4 5 7
test %<% 2
# [1] 2

Why do I want to do this?

Because in coding, I often end up with big chunks looking like this:

((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) & (mydataframeName$myotherVariableName == "male" & !is.na(mydataframeName$myotherVariableName)))

Which is much less readable/maintainable/editable than

mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"

But ">" returns NA for any comparison involving an NA, so it breaks selection statements (which can't contain NA) and leaves lines in the data that should have been excluded.

If this does not have nasty side-effects, it would be a great addition to GTD* in R.

If anyone knows a shortcut to code the effect I wish, I would love to hear it.

Cheers,
tim

* GTD = Getting Things Done

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
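To make the complaint concrete, here is a minimal sketch of what an NA inside a logical index does to a data frame. The data frame `df` and its columns `score` and `sex` are made up for illustration and are not from the post.

```r
# Hypothetical data: two columns, each with missing values
df <- data.frame(
  score = c(NA, 1:4, NA, 5),
  sex   = c("male", "male", NA, "male", "female", "male", "male"),
  stringsAsFactors = FALSE
)

# The readable condition; comparisons with NA propagate NA
keep <- df$score > 2 & df$sex == "male"
print(keep)

# Indexing with a logical vector that contains NA yields all-NA rows
with_na_rows <- df[keep, ]
print(with_na_rows)

# which() keeps only the positions that are TRUE, dropping NA and FALSE
clean_rows <- df[which(keep), ]
print(clean_rows)
```

Wrapping the readable condition in which() once, at the point of indexing, gives the NA-dropping behaviour without repeating !is.na() for every variable.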
On 13/09/2011 12:42 PM, Timothy Bates wrote:
> Dear R cognoscenti,
>
> While having NA as a native type is nifty, it is annoying when making binary choices.
>
> Question: Is there anything bad about writing comparison functions that behave like %in% (which I love) and ignore NAs?
>
> "%>%" <- function(table, x) {
>     return(which(table > x))
> }
>
> "%<%" <- function(table, x) {
>     return(which(table < x))
> }
>
> [...]
>
> If anyone knows a short cut to code the effect I wish, love to hear it.

I would suggest subsetting first if you really want to ignore the NAs. A problem with your suggestion is that since it doesn't return a logical vector, it will behave quite differently from a standard comparison in an expression. For example

(a < 5) & (b < 6)

will work (but sometimes generate NAs), but

(a %<% 5) & (b %<% 6)

will not. (You'd need to use the intersect() function.)

Duncan Murdoch
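The difference between index vectors and logical vectors can be sketched as follows. The vectors `a` and `b` are invented for illustration, and `%lt%` is a hypothetical logical-returning variant that does not appear anywhere in the thread; it is only one possible way to keep `&` usable.

```r
# Index-returning operator, as defined in the original post
"%<%" <- function(table, x) which(table < x)

# Hypothetical variant (name is an assumption): returns a logical vector
# of the same length as x, with NA comparisons turned into FALSE
"%lt%" <- function(x, y) {
  r <- x < y
  r & !is.na(r)
}

a <- c(1, NA, 7, 3)
b <- c(2, 4, NA, 9)

# Standard comparison: element-wise, but NA can appear in the result
std <- (a < 5) & (b < 6)

# Combining index vectors with & coerces positions to TRUE,
# so the result no longer describes which elements match
idx_and <- (a %<% 5) & (b %<% 6)

# intersect() is what combines two sets of positions correctly
idx_ok <- intersect(a %<% 5, b %<% 6)

# The logical-returning variant composes with & as usual
lgl <- (a %lt% 5) & (b %lt% 6)

print(std)
print(idx_and)
print(idx_ok)
print(lgl)
```

Because `%lt%` keeps the length of its input, its result can be passed straight to `[` or combined with `&` and `|`, which the which()-based operators cannot do.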
> Because in coding, I often end up with big chunks looking like this:
>
> ((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) & (mydataframeName$myotherVariableName == "male" & !is.na(mydataframeName$myotherVariableName)))
>
> Which is much less readable/maintainable/editable than
>
> mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"

Use subset:

subset(mydataframeName, myvariableName > 2 & myotherVariableName == "male")

(subset automatically treats NAs as false)

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
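A small sketch of the subset() suggestion, using a made-up data frame (`df`, `score` and `sex` are illustrative names, not from the thread). It compares subset() with the explicit !is.na() version from the original post.

```r
# Hypothetical data with missing values in both columns
df <- data.frame(
  score = c(NA, 1:4, NA, 5),
  sex   = c("male", "male", NA, "male", "female", "male", "male"),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside df and treats NA as FALSE
via_subset <- subset(df, score > 2 & sex == "male")

# The long-hand equivalent with explicit NA checks
via_isna <- df[(df$score > 2 & !is.na(df$score)) &
               (df$sex == "male" & !is.na(df$sex)), ]

print(via_subset)
print(via_isna)
```

Both forms select the same rows here; subset() just avoids repeating the data frame name and the NA checks.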
One additional thing to note is that %>% will have different precedence than > (something that was pointed out to me based on %<% that is in the TeachingDemos package).

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Timothy Bates
> Sent: Tuesday, September 13, 2011 10:42 AM
> To: R list
> Subject: [R] x %>% y as an alternative to which( x > y)
>
> [...]
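The precedence difference can be seen in a short sketch. In R, user-defined %any% operators bind more tightly than arithmetic and comparison operators, so an unparenthesised right-hand side is split differently. The operator below is the one from the original post; the expressions are illustrative.

```r
# Operator as defined in the original post
"%>%" <- function(table, x) which(table > x)

test <- c(NA, 1:4, NA, 5)

# Built-in comparison: parsed as test > (1 + 1)
builtin <- test > 1 + 1

# Custom operator: parsed as (test %>% 1) + 1,
# so 1 is added to the positions rather than to the threshold
custom_unparenthesised <- test %>% 1 + 1

# Explicit parentheses restore the intended threshold of 2
custom_parenthesised <- test %>% (1 + 1)

print(which(builtin))
print(custom_unparenthesised)
print(custom_parenthesised)
```

Any threshold that is itself an expression therefore needs parentheses when used with a %...% operator.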