Martin Maechler
2016-Aug-06 14:18 UTC
[Rd] ifelse() woes ... can we agree on a ifelse2() ?
Dear R-devel readers,
( = people interested in the improvement and development of R).

This is not the first time that this topic has been raised,
and I am in no state to promise that anything will result from
this thread ...

Still, I think the majority among us has agreed that

1) you should never use ifelse(test, yes, no)
   if you know that length(test) == 1, in which case
       if(test) yes else no
   is much preferable (though not equivalent: ifelse(NA, 1, 0) !)

2) it is potentially inefficient by design since it (almost
   always) evaluates both 'yes' and 'no' independent of 'test'.

3) it is a nice syntax in principle, and so is often used, also by
   myself, in spite of '2)', just because nicely self-explaining
   code is sometimes clearly preferable to more efficient but
   less readable code.

4) it is too late to change ifelse() fundamentally, because it
   works according to its documentation
   (and I think very much the same as in S and S-PLUS) and has
   done so for ages.

---- and if you don't agree with 1) -- 4) you may pretend for
     a moment instead of starting to discuss them thoroughly.

Recently, a useR has alerted me to the fact that my Rmpfr
package's arbitrary (high) precision numbers don't work for a
relatively simple function.

As I found, the reason was that this simple function used
ifelse(.,.,.)
and the problem was that the (*simplified*) gist of ifelse(test, yes, no)
is

    test <- as.logical(test)
    ans <- test
    ans[ test] <- yes
    ans[!test] <- no

and in case of Rmpfr, the problem is that

    <logical>[<logical>] <- <mpfr>

cannot work correctly

[[ maybe it could in a future R, if I could define a method

    setReplaceMethod("[", c("logical", "logical", "mpfr"),
                     function(x,i,value) .........)

   but that currently fails as the C-low-level dispatch for '[<-'
   does not look at the full signature
]]

I vaguely remember having seen proposals for
light weight substitutes for ifelse(), called
     ifelse1() or
     ifelse2() etc...

and I wonder if we should not try to see if there was a version
that could go into "base R" (maybe the 'utils' package, not
'base'; that's not so important).

One difference to ifelse() would be that the type/mode/class of the result
is not initialized as logical by default, but rather by the
"common type" of yes and no ... maybe determined by c()'ing
parts of those.
The idea was that this would work for most S3 and S4 objects for
which 'length', (logical) indexing '[', and 'rep()' work.

One possibility would also be to consider a "numbers-only" or
rather "same type"-only {e.g., would also work for characters}
version.

Of course, an ifelse2() should also be more efficient than
ifelse() in typical "atomic" cases.

Thank you for your ideas and suggestions.
Again, there's no promise of implementation coming along with this e-mail.

Martin Maechler
ETH Zurich
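The two complaints above (the scalar case and the loss of class) can be reproduced in base R without Rmpfr. The following is only a minimal sketch: it uses "Date" as a stand-in for any classed object, on the assumption that it loses its class for the same reason as mpfr numbers (the result starts life as a logical vector). The variable names are illustrative.

```r
# 1) Scalar case: ifelse() propagates NA, while if() refuses an NA condition
na_result <- ifelse(NA, 1, 0)
scalar_if <- tryCatch(if (NA) 1 else 0, error = function(e) "error")

# 2) Class loss: the result is built from the logical 'test', so assigning
#    classed values into it keeps only the underlying numbers
d   <- as.Date(c("2016-08-06", "2016-08-12"))
res <- ifelse(c(TRUE, FALSE), d, d + 1)

is.na(na_result)       # TRUE
scalar_if              # "error"
class(res)             # "numeric", no longer "Date"
```

The second part is the behaviour any replacement would need to avoid: the values are right, but the class that gives them meaning is gone.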
have you tried seeing if `dplyr::if_else` behaves more to your liking?
On 06/08/2016 10:18 AM, Martin Maechler wrote:
> One difference to ifelse() would be that the type/mode/class of the result
> is not initialized by logical, by default but rather by the
> "common type" of yes and no ... maybe determined by c()'ing
> parts of those.
> The idea was that this would work for most S3 and S4 objects for
> which logical 'length', (logical) indexing '[', and 'rep()' works.

I think your description is more or less:

    test <- as.logical(test)
    ans <- c(yes, no)[seq_along(test)]
    ans <- ans[seq_along(test)]
    ans[ test] <- yes[test]
    ans[!test] <- no[!test]

(though the implementation details would vary, and recycling rules would
apply if the lengths of test, yes and no weren't all equal).

You didn't mention what happens with attributes. Currently we keep the
attributes from test, which probably doesn't make a lot of sense. In
particular,

    ifelse(c(TRUE, FALSE), factor(2:3), factor(3:4))

returns nonsense, as does my translation of your idea above.

That implementation also drops attributes. I'd say this definition would
make more sense:

    test <- as.logical(test)
    ans <- yes
    ans[!test] <- no[!test]

(and this is suggested as an alternative in ?ifelse). It generates an
error in my test example, which seems reasonable. It gives the "right"
thing in

    ifelse(c(TRUE, FALSE), factor(2:3), factor(3:2))

because the factors have the same levels.

The lack of symmetry between yes and no is slightly irksome, but I would
think in most cases you could choose attributes from just one of yes and
no to be what you want in the result (and use !test to swap the order if
necessary).

> One possibility would also be to consider a "numbers-only" or
> rather "same type"-only {e.g., would also work for characters}
> version.

I don't know what you mean by these.

> Of course, an ifelse2() should also be more efficient than
> ifelse() in typical "atomic" cases.

I don't think it is obvious how to make it more efficient. ifelse()
already skips evaluation of yes or no if not needed. (An argument could
be made that it would be better to guarantee evaluation of both, but
it's usually easy enough to do this explicitly, so I don't see a need.)

Duncan Murdoch
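The asymmetric definition in this reply is short enough to try directly. Below is a minimal sketch that wraps it in a function; the name `ifelse_from_yes` is hypothetical, and the sketch assumes `test`, `yes` and `no` have equal length and `test` contains no NA. One caveat: in the R versions I am aware of, assigning a value with an unknown level into a factor signals a warning and yields NA rather than a hard error, so the sketch suppresses that warning and checks for NA.

```r
# The asymmetric definition from the message, wrapped in a function.
# 'ifelse_from_yes' is a hypothetical name used only for this illustration.
ifelse_from_yes <- function(test, yes, no) {
  test <- as.logical(test)
  ans <- yes                 # result starts as 'yes', keeping its attributes
  ans[!test] <- no[!test]    # dispatches on class(yes), e.g. `[<-.factor`
  ans
}

test <- c(TRUE, FALSE)

# Base ifelse() returns the internal integer codes, not a factor
base_res <- ifelse(test, factor(2:3), factor(3:4))

# Same levels in 'yes' and 'no': the result stays a meaningful factor
same_levels <- ifelse_from_yes(test, factor(2:3), factor(3:2))

# Different levels: the mismatch is flagged (NA) instead of passing silently
diff_levels <- suppressWarnings(
  ifelse_from_yes(test, factor(2:3), factor(3:4))
)

is.factor(base_res)          # FALSE
as.character(same_levels)    # "2" "2"
is.na(diff_levels[2])        # TRUE
```

Because the replacement dispatches on the class of `yes`, swapping the roles of `yes` and `no` (with `!test`) chooses which argument supplies the attributes.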
On 06.08.2016 17:30, Duncan Murdoch wrote:
> I don't think it is obvious how to make it more efficient. ifelse()
> already skips evaluation of yes or no if not needed. (An argument could
> be made that it would be better to guarantee evaluation of both, but
> it's usually easy enough to do this explicitly, so I don't see a need.)

Same from here: I do not see how this can easily be made more efficient,
since evaluating only parts causes a lot of copies of objects, which slows
things down; hence you need some complexity in yes and no to make
evaluation of parts of them more efficient at the R level.

Anyway, to solve the problem, we may want an additional argument to
ifelse2() that allows specifying the type of the result (as vapply()
does)?

Best,
Uwe
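One way to read the suggestion of a vapply()-style type argument is sketched below. Everything here is an assumption for illustration: the function name `ifelse_typed` and the argument name `proto` are hypothetical, not a proposed API. The result is initialized from an NA of the prototype's class, so replacement dispatches on that class instead of on logical.

```r
# Hypothetical sketch: 'ifelse_typed' and 'proto' are illustrative names only.
# 'proto' plays the role of FUN.VALUE in vapply(): it fixes the result type.
ifelse_typed <- function(test, yes, no, proto) {
  test <- as.logical(test)
  n <- length(test)
  # n missing values carrying the type/class of 'proto'
  ans <- rep(proto[NA_integer_], length.out = n)
  ypos <- which(test)        # which() drops NA, so NA in 'test' stays NA
  npos <- which(!test)
  ans[ypos] <- rep(yes, length.out = n)[ypos]
  ans[npos] <- rep(no,  length.out = n)[npos]
  ans
}

typed <- ifelse_typed(c(TRUE, FALSE, NA),
                      as.Date("2016-08-06"), as.Date("2016-08-12"),
                      proto = as.Date(NA))
num   <- ifelse_typed(c(TRUE, FALSE), 1L, 2L, proto = numeric())

class(typed)    # "Date"
typeof(num)     # "double"
```

The trade-off is the same as with vapply(): the caller has to state the type up front, but the result type no longer depends on the data.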
Martin Maechler
2016-Aug-12 11:00 UTC
[Rd] ifelse() woes ... can we agree on a ifelse2() ?
Excuse the delay; I had waited for other / additional comments
and reactions (and been distracted by other urgent issues),
but do want to keep this thread alive [inline] ..

>>>>> Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>>     on Sat, 6 Aug 2016 11:30:08 -0400 writes:

> On 06/08/2016 10:18 AM, Martin Maechler wrote:
>> One difference to ifelse() would be that the type/mode/class of the result
>> is not initialized by logical, by default but rather by the
>> "common type" of yes and no ... maybe determined by c()'ing
>> parts of those.
>> The idea was that this would work for most S3 and S4 objects for
>> which logical 'length', (logical) indexing '[', and 'rep()' works.

> I think your description is more or less:

>     test <- as.logical(test)
>     ans <- c(yes, no)[seq_along(test)]
>     ans <- ans[seq_along(test)]
>     ans[ test] <- yes[test]
>     ans[!test] <- no[!test]

> (though the implementation details would vary, and recycling rules would
> apply if the lengths of test, yes and no weren't all equal).

Yes, more or less, notably, conceptually a version of c(yes, no) to get
a common mode/class ... but as you mention below, c() cannot be used
alone because it famously "misbehaves", e.g., for factors.

> You didn't mention what happens with attributes. Currently we keep the
> attributes from test, which probably doesn't make a lot of sense. In
> particular,

>     ifelse(c(TRUE, FALSE), factor(2:3), factor(3:4))

> returns nonsense, as does my translation of your idea above.

Yes. factor()s or "Date" or "POSIXt" objects are 'base R' examples
where an alternative ifelse() would have to work (ideally automatically,
with no special-case code!) by "keeping the class".

> That implementation also drops attributes. I'd say this definition would
> make more sense:

>     test <- as.logical(test)
>     ans <- yes
>     ans[!test] <- no[!test]

> (and this is suggested as an alternative in ?ifelse). It generates an
> error in my test example, which seems reasonable. It gives the "right"
> thing in

>     ifelse(c(TRUE, FALSE), factor(2:3), factor(3:2))

> because the factors have the same levels.

> The lack of symmetry between yes and no is slightly irksome, but I would
> think in most cases you could choose attributes from just one of yes and
> no to be what you want in the result (and use !test to swap the order if
> necessary).

Yes, you are right, that's a good point: if we don't want to "take
everything" from 'test' (which is symmetric in 'yes' and 'no'), but
rather from 'yes' and 'no', we either must be "very strict" -- as, e.g.,
dplyr::if_else() -- or else have (border) cases where the first argument
takes precedence over the second, as in other cases in R, e.g., how
names/dimnames of the result are determined in cbind(), rbind(), and I
think data.frame().

>> One possibility would also be to consider a "numbers-only" or
>> rather "same type"-only {e.g., would also work for characters}
>> version.

> I don't know what you mean by these.

In the meantime, Bob Rudis mentioned dplyr::if_else(),
which is very relevant, thank you Bob!

As I have found, that actually works in such a "same type"-only way:
It does not try to coerce, but gives an error when the classes differ,
even in this somewhat debatable case:

    > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11)
    Error: `false` has type 'double' not 'integer'
    >

As documented, if_else() is clearly stricter than ifelse()
and, e.g., also does no recycling (except for arguments of length 1).

I'm dropping the remaining issue of efficiency as I replied to that
(on Aug. 8) in the other branch of this thread.

Hadley's if_else() is really nice in its clean approach and does fulfill
the main important desideratum, and hence, e.g., works for "Date" etc.

My goal however would still be considerably closer to base::ifelse(),
namely an alternative would

  - *coerce* types/classes as much as sensible,
  - (by default at least) recycle (test, yes, no) to common length

Unfortunately I've been too busy these days, also with a couple of
non-R (and some non-work) matters, so am not yet proposing concrete
alternatives.

More thoughts, ideas and proposals are still very welcome; as with
Duncan, it does make much sense to discuss the theme already relatively
abstractly!

Martin
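The two stated goals (coerce where sensible, recycle to a common length) can be combined in a few lines. This is only a sketch under stated assumptions, not a proposal from the thread: the name `ifelse2_sketch` is hypothetical, the common type is taken from c() applied to zero-length slices of `yes` and `no`, and all three arguments are assumed to have positive length. It inherits whatever c() does for the classes involved, which is exactly the weak spot mentioned above for factors.

```r
# Hypothetical sketch of a coercing, recycling variant; not an agreed design.
ifelse2_sketch <- function(test, yes, no) {
  test <- as.logical(test)
  n <- max(length(test), length(yes), length(no))
  test <- rep(test, length.out = n)   # recycle everything to common length
  yes  <- rep(yes,  length.out = n)
  no   <- rep(no,   length.out = n)
  # c() on empty slices yields the common type/class; indexing by NA
  # then gives n missing values of that type
  ans <- c(yes[0], no[0])[rep(NA_integer_, n)]
  ypos <- which(test)
  npos <- which(!test)
  ans[ypos] <- yes[ypos]
  ans[npos] <- no[npos]
  ans
}

# The case that dplyr::if_else() rejects: integer vs double is coerced
mixed    <- ifelse2_sketch(c(TRUE, FALSE), 2:3, 0 + 10:11)
# Recycling of 'yes' (length 1) and 'no' (length 2) to length(test) == 4
recycled <- ifelse2_sketch(c(TRUE, FALSE, TRUE, FALSE), 1L, c(10.5, 20.5))
# Classed input keeps its class
dates    <- ifelse2_sketch(c(TRUE, FALSE),
                           as.Date("2016-08-06"), as.Date("2016-08-12"))

mixed           # 2 11
recycled        # 1.0 20.5 1.0 20.5
class(dates)    # "Date"
```

Unlike the asymmetric variant, neither `yes` nor `no` is privileged here; the price is that attributes other than those c() preserves are dropped.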