thr3ads.net - R devel - [Rd] ifelse() woes ... can we agree on a ifelse2() ? [Nov 2016]

If this information is useful, please help other people find it:
Share via:

Suharto Anggono Suharto Anggono

2016-Nov-26 17:14 UTC

[Rd] ifelse() woes ... can we agree on a ifelse2() ?

Just stating, in 'ifelse', 'test' is not recycled. As I said in
"R-intro: length of 'ifelse' result"
(https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html),
ifelse(condition, a, b) returns a vector of the length of 'condition',
even if 'a' or 'b' is longer.

On current 'ifelse' code in R:
* The part
ans[nas] <- NA
could be omitted because NA's are already in place.
If the part is removed, variable 'nas' is no longer used.
* The any(*) part actually checks the thing that is used as the index vector.
The index vector could be stored and then repeatedly used, like the following.
? ? if (any(sel <- test & ok))
??? ans[sel] <- rep(yes, length.out = length(ans))[sel]
* If 'test' is a factor, doing
storage.mode(test) <- "logical"
is not appropriate, but is.atomic(test) returns TRUE. Maybe use
if(!is.object(test))
instead of
if(is.atomic(test)) .

On ifelse-checks.R:
* In function 'chkIfelse', if the fourth function argument names is not
"NA.", the argument name is changed, but the function body still uses
the old name. That makes error in chkIfelse(ifelseHW) .
A fix:
? ? ? ? if(names(formals(FUN))[[4]] != "NA.") {
? ? ? ? ? ? body(FUN) <- do.call(substitute, list(body(FUN),
? ? ? ? ? ? ? ? setNames(list(quote(NA.)), names(formals(FUN))[[4]])))
? ? ? ? ? ? names(formals(FUN))[[4]] <- "NA."
? ? ? ? }
After fixing, chkIfelse(ifelseHW) just fails at identical(iflt,
as.POSIXlt(ifct)) .
'iflt' has NA as 'tzone' and 'isdst' components.
* Because function 'chkIfelse' continues checking after failure,
as.POSIXlt(ifct)
may give error. The error happens, for example, in chkIfelse(ifelseR) . Maybe
place it inside try(...).
* If 'lt' is a "POSIXlt" object, (lt-100) is a
"POSIXct" object.
So,
FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
is an example of mixed class.
* The part of function 'chkIfelse' in
for(i in seq_len(nFact))
uses 'NA.' function argument. That makes error when 'chkIfelse'
is applied to function without fourth argument.
The part should be wrapped in
if(has.4th) .
* Function 'ifelseJH' has fourth argument, but the argument is not for
value if NA. So, instead of
chkIfelse(ifelseJH) ,
maybe call
chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .

A concrete version of 'ifelse2' that starts the result from
'yes':
function(test, yes, no, NA. = NA) {
? ? if(!is.logical(test))
? ? ? ? test <- if(isS4(test)) methods::as(test, "logical") else
as.logical(test)
? ? n <- length(test)
? ? ans <- rep(yes, length.out = n)
? ? ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test &
!is.na(test)]
? ? ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
? ? ans
}

It requires 'rep' method that is compatible with subsetting. It also
works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and
gives an appropriate result if time zones are the same.
For coercion of 'test', there is no need of keeping attributes. So, it
doesn't do
storage.mode(test) <- "logical"
and goes directly to 'as.logical'.
It relies on subassignment for silent coercions of
logical < integer < double < complex .
Unlike 'ifelse', it never skips any subassignment. So, phenomenon as in
"example of different return modes" in ?ifelse doesn't happen.

Another version, for keeping attributes as pointed out by Duncan Murdoch:
function(test, yes, no, NA. = NA) {
? ? if(!is.logical(test))
? ? ? ? test <- if(isS4(test)) methods::as(test, "logical") else
as.logical(test)
? ? n <- length(test)
? ? n.yes <- length(yes); n.no <- length(no)
? ? if (n.yes != n) {
? ? ? ? if (n.no == n) {? # swap yes <-> no
? ? ? ? ? ? test <- !test
? ? ? ? ? ? ans <- yes; yes <- no; no <- ans
? ? ? ? ? ? n.no <- n.yes
? ? ? ? } else yes <- yes[rep_len(seq_len(n.yes), n)]
? ? }
? ? ans <- yes
? ? if (n.no == 1L)
? ? ? ? ans[!test] <- no
? ? else
? ? ? ? ans[!test & !is.na(test)] <- no[
? ? ? ? ? ? if (n.no == n) !test & !is.na(test)
? ? ? ? ? ? else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
? ? stopifnot(length(NA.) == 1L)
? ? ans[is.na(test)] <- NA.
? ? ans
}

Note argument evaluation order: 'test', 'yes', 'no',
'NA.'.
First, it chooses the first of 'yes' and 'no' that has the same
length as the result. If none of 'yes' and 'no' matches the
length of the result, it chooses recycled (or truncated) 'yes'.
It uses 'rep' on the index and subsetting as a substitute for
'rep' on the value.
It requires 'length' method that is compatible with subsetting.
Additionally, it uses the same idea as dplyr::if_else, or more precisely the
helper function 'replace_with'. It doesn't use 'rep' if the
length of 'no' is 1 or is the same as the length of the result. For
subassignment with value of length 1, recycling happens by itself and NA in
index is OK.
It limits 'NA.' to be of length 1, considering 'NA.' just as a
label for NA.

Cases where the last version above or 'ifelse2 or 'ifelseHW' in
ifelse-def.R gives inappropriate answers:
- 'yes' and 'no' are "difftime" objects with different
"units" attribute
- 'yes' and 'no' are "POSIXlt" objects with different
time zone
Example: 'yes' in "UTC" and 'no' in
"EST5EDT". The reverse, 'yes' in "EST5EDT" and
'no' in "UTC" gives error.

For the cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R
gives a right answer for "POSIXlt" case.
---------------------
Martin et al.,




On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at
stat.math.ethz.ch> wrote:
>
> Note that my premise was really to get *away* from inheriting
> too much from 'test'.
> Hence, I have *not* been talking about replacing ifelse() but
> rather of providing a new? ifelse2()
>
>? ? ? ? [ or if_else()? if Hadley was willing to ditch the dplyr one
>? ? ? ? ? ? ? ? ? ? ? ? in favor of a base one]
>
>? ???> Specifically, based on an unrelated discussion with Henrik
Bengtsson
> on
>? ???> Twitter, I wonder if preserving the recycling behavior test is
> longer than
>? ???> yes, no, but making the case where
>
>? ???> length( test ) < max(length( yes ), length( no ))
>
>? ???> would simplify usage for userRs in a useful way.
>
That was a copyediting bug on my part, it seems I hit send with my message
only half-edited/proofread. Apologies.

 That should have said that making the case where test is the one that will
be recycled (because it is shorter than either yes or no) an error. My
claim is that the fact that test itself can be recycled, rather than just
yes or no, is confusing to many R users. If we are writing an ifelse2 we
might want to drop that feature and just throw an error in that case.
(Users could still use the original ifelse if they understand and
specifically want that behavior).

Does that make more sense?


>
>? ???> Also, If we combine a stricter contract that the output will
always
> be of
>? ???> length with the suggestion of a specified output class
>
>Here, again, I was talking about the restriction that the output be
guaranteed to be the length of test, regardless of the length of yes and
no. That, combined with a specific, guaranteed output class would make a
much narrower/more restricted but also (I argue) much easier to understand
function. Particularly for beginning and intermediate users.

I do hear what you're saying about silent conversion, though, so what
I'm
describing might be a third function (ifelse3 for lack of a better name for
now), as you pointed out.

> that was not my intent here.... but would be another interesting
> extension. However, I would like to keep? R-semantic silent coercions
> such as
>? ? ? ? ???logical < integer < double < complex
>
> and your pseudo code below would not work so easily I think.
>
>? ???> the pseudo code could be
>
> (I'm changing assignment '=' to? '<-' ...? [please!]
)
>
>? ???> ifelse2 <- function(test, yes, no, outclass) {
>? ???>???lenout? <- length(test)
>? ???>???out <- as( rep(yes, length.out <- lenout), outclass)
>? ???>???out[!test] <- as(rep(no, length.out = lenout)[!test],
outclass)
>? ???>???# handle NA stuff
>? ???>???out
>? ???> }
>
>
>? ???> NAs could be tricky if outclass were allowed to be completely
> general, but
>? ???> doable, I think? Another approach? if we ARE fast-passing while
> leaving
>? ???> ifelse intact is that maybe NA's in test just aren't
allowed in
> ifelse2.
>? ???> I'm not saying we should definitely do that, but it's
possible and
> would
>? ???> make things faster.
>
>? ???> Finally, In terms of efficiency, with the stuff that Luke and I
are
> working
>? ???> on, the NA detection could be virtually free in certain cases,
which
> could
>? ???> give a nice boost for long vectors? that don't have any NAs
(and
> 'know'
>? ???> that they don't).
>
> That *is* indeed a very promising prospect!
> Thank you in advance!
>
>? ???> Best,
>? ???> ~G
>
> I still am bit disappointed by the fact that it seems nobody has
> taken a good look at my ifelse2() proposal.
>
I plan to look at it soon. Thanks again for all your work.

~G

>
> I really would like an alternative to ifelse() in *addition* to
> the current ifelse(), but hopefully in the future being used in
> quite a few places instead of ifelse()
> efficiency but for changed semantics, namely working for considerably
> more "vector like" classes of? 'yes' and 'no'?
than the current
> ifelse().
>
> As I said, the current proposal works for objects of class
>? ? "Date", "POSIXct", "POSIXlt",
"factor",? "mpfr" (pkg 'Rmpfr')
> and hopefully for "sparseVector" (in a next version of the
'Matrix' pkg).
>
> Martin
>
>? ???> On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <
> maechler at stat.math.ethz.ch
>? ???>> wrote:
>
>? ???>> Finally getting back to this :
>? ???>>
>? ???>> >>>>> Hadley Wickham <h.wickham at
gmail.com>
>? ???>> >>>>>? ???on Mon, 15 Aug 2016 07:51:35 -0500
writes:
>? ???>>
>? ???>> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
>? ???>> > <h.wickham at gmail.com> wrote:
>? ???>> >>> >> One possibility would also be to
consider a
>? ???>> >>> "numbers-only" or >> rather
"same type"-only {e.g.,
>? ???>> >>> would also work for characters} >> version.
>? ???>> >>>
>? ???>> >>> > I don't know what you mean by these.
>? ???>> >>>
>? ???>> >>> In the mean time, Bob Rudis mentioned
dplyr::if_else(),
>? ???>> >>> which is very relevant, thank you Bob!
>? ???>> >>>
>? ???>> >>> As I have found, that actually works in such a
"same
>? ???>> >>> type"-only way: It does not try to coerce,
but gives an
>? ???>> >>> error when the classes differ, even in this
somewhat
>? ???>> >>> debatable case :
>? ???>> >>>
>? ???>> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11)
Error:
>? ???>> >>> `false` has type 'double' not
'integer'
>? ???>> >>> >
>? ???>> >>>
>? ???>> >>> As documented, if_else() is clearly stricter than
>? ???>> >>> ifelse() and e.g., also does no recycling (but of
>? ???>> >>> length() 1).
>? ???>> >>
>? ???>> >> I agree that if_else() is currently too strict -
it's
>? ???>> >> particularly annoying if you want to replace some
values
>? ???>> >> with a missing:
>? ???>> >>
>? ???>> >> x <- sample(10) if_else(x > 5, NA, x) # Error:
`false`
>? ???>> >> has type 'integer' not 'logical'
>? ???>> >>
>? ???>> >> But I would like to make sure that this remains an
error:
>? ???>> >>
>? ???>> >> if_else(x > 5, x, "BLAH")
>? ???>> >>
>? ???>> >> Because that seems more likely to be a user error
(but
>? ???>> >> reasonable people might certainly believe that it
should
>? ???>> >> just work)
>? ???>> >>
>? ???>> >> dplyr is more accommodating in other places (i.e. in
>? ???>> >> bind_rows(), collapse() and the joins) but it's
>? ???>> >> surprisingly hard to get all the details right. For
>? ???>> >> example, what should the result of this call be?
>? ???>> >>
>? ???>> >> if_else(c(TRUE, FALSE), factor(c("a",
"b")),
>? ???>> >> factor(c("c", "b"))
>? ???>> >>
>? ???>> >> Strictly speaking I think you could argue it's an
error,
>? ???>> >> but that's not very user-friendly. Should it be a
factor
>? ???>> >> with the union of the levels? Should it be a
character
>? ???>> >> vector + warning? Should the behaviour change if one
set
>? ???>> >> of levels is a subset of the other set?
>? ???>> >>
>? ???>> >> There are similar issues for POSIXct (if the time
zones
>? ???>> >> are different, which should win?), and difftimes
>? ???>> >> (similarly for units).? Ideally you'd like the
behaviour
>? ???>> >> to be extensible for new S3 classes, which suggests
it
>? ???>> >> should be a generic (and for the most general case,
it
>? ???>> >> would need to dispatch on both arguments).
>? ???>>
>? ???>> > One possible principle would be to use c() -
>? ???>> > i.e. construct out as
>? ???>>
>? ???>> > out <- c(yes[0], no[0]
>? ???>> > length(out) <- max(length(yes), length(no))
>? ???>>
>? ???>> yes; this would require that a? `length<-` method works for
the
>? ???>> class of the result.
>? ???>>
>? ???>> Duncan Murdoch mentioned a version of this, in his very
>? ???>> first reply:
>? ???>>
>? ???>> ans <- c(yes, no)[seq_along(test)]
>? ???>> ans <- ans[seq_along(test)]
>? ???>>
>? ???>> which is less efficient for atomic vectors, but requires
>? ???>> less from the class: it "only" needs `c` and `[` to
work
>? ???>>
>? ???>> and a mixture of your two proposals would be possible too:
>? ???>>
>? ???>> ans <- c(yes[0], no[0])
>? ???>> ans <- ans[seq_along(test)]
>? ???>>
>? ???>> which does *not* work for my "mpfr" numbers (CRAN
package 'Rmpfr'),
>? ???>> but that's a buglet in the? c.mpfr() implementation of my
Rmpfr
>? ???>> package... (which has already been fixed in the development
version
> on
>? ???>> R-forge,
>? ???>> https://r-forge.r-project.org/R/?group_id=386)
>? ???>>
>? ???>> > But of course that wouldn't help with factor
responses.
>? ???>>
>? ???>> Yes.? However, a version of Duncan's suggestion -- of
treating
> 'yes' first
>? ???>> -- does help in that case.
>? ???>>
>? ???>> For once, mainly as "feasability experiment",
>? ???>> I have created a github gist to make my current ifelse2()
proposal
>? ???>> available
>? ???>> for commenting, cloning, pullrequesting, etc:
>? ???>>
>? ???>> Consisting of 2 files
>? ???>> - ifelse-def.R :? Functions definitions only, basically all
the
> current
>? ???>> proposals, called? ifelse*()
>? ???>> - ifelse-checks.R : A simplistic checking function
>? ???>> and examples calling it, notably demonstrating that my
>? ???>> ifelse2()? does work with
>? ???>> "Date", <dateTime> (i.e. "POSIXct"
and "POSIXlt"), factors,
>? ???>> and "mpfr" (the arbitrary-precision numbers in my
package "Rmpfr")
>? ???>>
>? ???>> Also if you are not on github, you can quickly get to the
ifelse2()
>? ???>> definition :
>? ???>>
>? ???>>
https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
>? ???>> 94#file-ifelse-def-r-L168
>? ???>>
>? ???>> > Also, if you're considering an improved ifelse(),
I'd
>? ???>> > strongly urge you to consider adding an `na` argument,
>? ???>>
>? ???>> I now did (called it 'NA.').
>? ???>>
>? ???>> > so that you can use ifelse() to transform all three
>? ???>> > possible values in a logical vector.
>? ???>>
>? ???>> > Hadley
>? ???>> > -- http://hadley.nz
>? ???>>
>? ???>> For those who really hate GH (and don't want or cannot
easily
> follow the
>? ???>> above URL), here's my current definition:
>? ???>>
>? ???>>
>? ???>> ##' Martin Maechler, 14. Nov 2016 --- taking into account
Duncan M.
> and
>? ???>> Hadley's
>? ???>> ##' ideas in the R-devel thread starting at (my mom's
86th
> birthday):
>? ???>> ##'
https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
>? ???>> ifelse2 <- function (test, yes, no, NA. = NA) {
>? ???>> if(!is.logical(test)) {
>? ???>> if(is.atomic(test))
>? ???>> storage.mode(test) <- "logical"
>? ???>> else ## typically a "class"; storage.mode<-()
typically fails
>? ???>> test <- if(isS4(test)) methods::as(test,
"logical") else
>? ???>> as.logical(test)
>? ???>> }
>? ???>>
>? ???>> ## No longer optimize the? "if (a) x else y"? cases:
>? ???>> ## Only "non-good" R users use ifelse(.) instead of
if(.) in these
>? ???>> cases.
>? ???>>
>? ???>> ans <-
>? ???>> tryCatch(rep(if(is.object(yes) &&
identical(class(yes), class(no)))
>? ???>> ## as c(o) or o[0] may not work for the class
>? ???>> yes else c(yes[0], no[0]), length.out >? ???>>
length(test)),
>? ???>> error = function(e) { ## try asymmetric, yes-leaning
>? ???>> r <- yes
>? ???>> r[!test] <- no[!test]
>? ???>> r
>? ???>> })
>? ???>> ok <- !(nas <- is.na(test))
>? ???>> if (any(test[ok]))
>? ???>> ans[test & ok] <- rep(yes, length.out =
length(ans))[test & ok]
>? ???>> if (any(!test[ok]))
>? ???>> ans[!test & ok] <- rep(no, length.out =
length(ans))[!test & ok]
>? ???>> ans[nas] <- NA. # possibly coerced to class(ans)
>? ???>> ans
>? ???>> }
>? ???>>
>? ???>> ______________________________________________
>? ???>> R-devel at r-project.org mailing list
>? ???>> https://stat.ethz.ch/mailman/listinfo/r-devel
>? ???>>
>
>
>
>? ???> --
>? ???> Gabriel Becker, PhD
>? ???> Associate Scientist (Bioinformatics)
>? ???> Genentech Research
>
>? ???> [[alternative HTML version deleted]]
>
>

-- 
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research

??? [[alternative HTML version deleted]]

S Ellison

2016-Nov-26 18:52 UTC

head link

[Rd] ifelse() woes ... can we agree on a ifelse2() ?

> Just stating, in 'ifelse', 'test' is not recycled. As I
said in "R-intro: length of 'ifelse' result"
> (https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html),
ifelse(condition, a, b)
> returns a vector of the length of 'condition', even if 'a'
or 'b' is longer.
That is indeed (almost) the documented behaviour. The documented behaviour is
slightly more complex; '... returns a value _of the same shape_ as
'test''. IN principle, test can be a matrix, for example.
> A concrete version of 'ifelse2' that starts the result from
'yes':
> .. still a bit disappointed that nobody has taken a look ...
I took a look. The idea leaves (at least) me very uneasy. If you are recycling
'test' as well as arbitrary-length yes and no, results will become
frighteningly hard to predict except in very simple cases where you have
well-defined and consistent regularities in the data. And where you do, surely
passing ifelse a vetor of the right length, generated by rep() applied to a
short 'test' vector, will do what you want without messing around with
new functions that hide what you're doing.

Do you really have a case where 'test' is neither a single logical (that
could be used with 'if') nor a vector that can be readily replicated to
the desired length with 'rep'?

If not, I'd drop the attempt to generate new ifelse-like functions. 

S Ellison



*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}

Martin Maechler

2016-Nov-28 15:00 UTC

head link

[Rd] ifelse() woes ... can we agree on a ifelse2() ?

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at
r-project.org>
>>>>>     on Sat, 26 Nov 2016 17:14:01 +0000 writes:
    > Just stating, in 'ifelse', 'test' is not recycled. As I
said in "R-intro: length of 'ifelse' result"
(https://stat.ethz.ch/pipermail/r-devel/2016-September/073136.html),
ifelse(condition, a, b) returns a vector of the length of 'condition',
even if 'a' or 'b' is longer.

yes and  ?ifelse (the help page) also does not say that test is
recycled, rather

   >>  If \code{yes} or \code{no} are too short, their elements are
recycled.

(*and* the problem you wrote the above has been corrected in the
 R-intro manual shortly after).


    > On current 'ifelse' code in R:

    > * The part
    > ans[nas] <- NA
    > could be omitted because NA's are already in place.
    > If the part is removed, variable 'nas' is no longer used.

I agree that this seems logical.  If I apply the change, R's own
full checks do not seem affected, and I may try to commit that
change and "wait and see".


    > * The any(*) part actually checks the thing that is used as the index
vector. The index vector could be stored and then repeatedly used, like the
following.

    > ? ? if (any(sel <- test & ok))
    > ??? ans[sel] <- rep(yes, length.out = length(ans))[sel]

yes, I know, and have had similar thoughts in the past.
However note  (I know you that) the current code is

    if (any(test[ok]))
	ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]

and   any(test[ok])    	     may be considerably faster than
      any(sel <- test & ok)
      
OTOH I think the current code would only be faster (for the
above) when any(.) returned FALSE ...
I think it may depend on the typical use cases which of the two
versions is more efficient.


    > * If 'test' is a factor, doing
    > storage.mode(test) <- "logical"
    > is not appropriate, but is.atomic(test) returns TRUE. Maybe use
    > if(!is.object(test))
    >    instead of
    > if(is.atomic(test)) .

This would be a considerable change I think...
Note that I'm currently really proposing to introduce an *additional*
ifelse function with different "more reasonable" semantic,  and
your last change would do that too.

My alternative should really work
- for factors
- for "array"s including "matrix" (as the current ifelse()
does!)
- for "Date", "POSIXct", "ts"(timeseries),
"zoo",
      "sparseVector", "sparseMatrix" (*), or
"mpfr",
  without any special code, but rather by design.

 *) Currently needs the R-forge version of  Matrix, version 1.2-8.

A bit less than an hour ago, I have updated the gist with an updated
proposal ifelse2() {and the current alternatives that I know},
modified so it *does* keep more, e.g.  dim() attributes in
reasonable cases.

https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d87894#file-ifelse-def-r-L168

Hence my ifelse2() became even a bit longer (but not slower)
working for even more classes of  "yes" and "no".


    > On ifelse-checks.R:
    > * In function 'chkIfelse', if the fourth function argument
names is not "NA.", the argument name is changed, but the function
body still uses the old name. That makes error in chkIfelse(ifelseHW) .
    > A fix:
    > ? ? ? ? if(names(formals(FUN))[[4]] != "NA.") {
    > ? ? ? ? ? ? body(FUN) <- do.call(substitute, list(body(FUN),
    > ? ? ? ? ? ? ? ? setNames(list(quote(NA.)), names(formals(FUN))[[4]])))
    > ? ? ? ? ? ? names(formals(FUN))[[4]] <- "NA."
    > ? ? ? ? }

yes, thank you!  (a bit embarrassing for me ..)

    > After fixing, chkIfelse(ifelseHW) just fails at identical(iflt,
as.POSIXlt(ifct)) .
    > 'iflt' has NA as 'tzone' and 'isdst'
components.
    > * Because function 'chkIfelse' continues checking after
failure,
    > as.POSIXlt(ifct)
    > may give error. The error happens, for example, in chkIfelse(ifelseR) .
Maybe place it inside try(...).
    > * If 'lt' is a "POSIXlt" object, (lt-100) is a
"POSIXct" object.
    > So,
    > FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
    > is an example of mixed class.

good; thank you for the hint.

    > * The part of function 'chkIfelse' in
    > for(i in seq_len(nFact))
    > uses 'NA.' function argument. That makes error when
'chkIfelse' is applied to function without fourth argument.
    > The part should be wrapped in
    > if(has.4th) .
yes of course

    > * Function 'ifelseJH' has fourth argument, but the argument is
not for value if NA. So, instead of
    > chkIfelse(ifelseJH) ,
    > maybe call
    > chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .
You are right;
I've decided to solve this differently.

I'm looking at these suggestions now, notably also your proposals below;
thank you, Suharto!

(I wanted to put my improved 'ifelse2' out first, quickly).
Martin


    > A concrete version of 'ifelse2' that starts the result from
'yes':
    > function(test, yes, no, NA. = NA) {
    > ? ? if(!is.logical(test))
    > ? ? ? ? test <- if(isS4(test)) methods::as(test,
"logical") else as.logical(test)
    > ? ? n <- length(test)
    > ? ? ans <- rep(yes, length.out = n)
    > ? ? ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test
& !is.na(test)]
    > ? ? ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
    > ? ? ans
    > }

    > It requires 'rep' method that is compatible with subsetting. It
also works with "POSIXlt" in R 2.7.2, when 'length' gives 9,
and gives an appropriate result if time zones are the same.
    > For coercion of 'test', there is no need of keeping attributes.
So, it doesn't do
    > storage.mode(test) <- "logical"
    > and goes directly to 'as.logical'.
    > It relies on subassignment for silent coercions of
    > logical < integer < double < complex .
    > Unlike 'ifelse', it never skips any subassignment. So,
phenomenon as in "example of different return modes" in ?ifelse
doesn't happen.

    > Another version, for keeping attributes as pointed out by Duncan
Murdoch:
    > function(test, yes, no, NA. = NA) {
    > ? ? if(!is.logical(test))
    > ? ? ? ? test <- if(isS4(test)) methods::as(test,
"logical") else as.logical(test)
    > ? ? n <- length(test)
    > ? ? n.yes <- length(yes); n.no <- length(no)
    > ? ? if (n.yes != n) {
    > ? ? ? ? if (n.no == n) {? # swap yes <-> no
    > ? ? ? ? ? ? test <- !test
    > ? ? ? ? ? ? ans <- yes; yes <- no; no <- ans
    > ? ? ? ? ? ? n.no <- n.yes
    > ? ? ? ? } else yes <- yes[rep_len(seq_len(n.yes), n)]
    > ? ? }
    > ? ? ans <- yes
    > ? ? if (n.no == 1L)
    > ? ? ? ? ans[!test] <- no
    > ? ? else
    > ? ? ? ? ans[!test & !is.na(test)] <- no[
    > ? ? ? ? ? ? if (n.no == n) !test & !is.na(test)
    > ? ? ? ? ? ? else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
    > ? ? stopifnot(length(NA.) == 1L)
    > ? ? ans[is.na(test)] <- NA.
    > ? ? ans
    > }

    > Note argument evaluation order: 'test', 'yes',
'no', 'NA.'.
    > First, it chooses the first of 'yes' and 'no' that has
the same length as the result. If none of 'yes' and 'no' matches
the length of the result, it chooses recycled (or truncated) 'yes'.
    > It uses 'rep' on the index and subsetting as a substitute for
'rep' on the value.
    > It requires 'length' method that is compatible with subsetting.
    > Additionally, it uses the same idea as dplyr::if_else, or more
precisely the helper function 'replace_with'. It doesn't use
'rep' if the length of 'no' is 1 or is the same as the length of
the result. For subassignment with value of length 1, recycling happens by
itself and NA in index is OK.
    > It limits 'NA.' to be of length 1, considering 'NA.'
just as a label for NA.

    > Cases where the last version above or 'ifelse2 or
'ifelseHW' in ifelse-def.R gives inappropriate answers:
    > - 'yes' and 'no' are "difftime" objects with
different "units" attribute
    > - 'yes' and 'no' are "POSIXlt" objects with
different time zone
    > Example: 'yes' in "UTC" and 'no' in
"EST5EDT". The reverse, 'yes' in "EST5EDT" and
'no' in "UTC" gives error.

    > For the cases, c(yes, no) helps. Function 'ifelseJH' in
ifelse-def.R gives a right answer for "POSIXlt" case.
    > ---------------------
    > Martin et al.,




    > On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at
stat.math.ethz.ch
    >> wrote:

    >> 
    >> Note that my premise was really to get *away* from inheriting
    >> too much from 'test'.
    >> Hence, I have *not* been talking about replacing ifelse() but
    >> rather of providing a new? ifelse2()
    >> 
    >> ? ? ? ? [ or if_else()? if Hadley was willing to ditch the dplyr
one
    >> ? ? ? ? ? ? ? ? ? ? ? ? in favor of a base one]
    >> 
    >> ? ???> Specifically, based on an unrelated discussion with
Henrik Bengtsson
    >> on
    >> ? ???> Twitter, I wonder if preserving the recycling behavior
test is
    >> longer than
    >> ? ???> yes, no, but making the case where
    >> 
    >> ? ???> length( test ) < max(length( yes ), length( no ))
    >> 
    >> ? ???> would simplify usage for userRs in a useful way.
    >> 

    > That was a copyediting bug on my part, it seems I hit send with my
message
    > only half-edited/proofread. Apologies.

    > That should have said that making the case where test is the one that
will
    > be recycled (because it is shorter than either yes or no) an error. My
    > claim is that the fact that test itself can be recycled, rather than
just
    > yes or no, is confusing to many R users. If we are writing an ifelse2
we
    > might want to drop that feature and just throw an error in that case.
    > (Users could still use the original ifelse if they understand and
    > specifically want that behavior).

    > Does that make more sense?



    >> 
    >> ? ???> Also, If we combine a stricter contract that the output
will always
    >> be of
    >> ? ???> length with the suggestion of a specified output class
    >> 
    >> 
    > Here, again, I was talking about the restriction that the output be
    > guaranteed to be the length of test, regardless of the length of yes
and
    > no. That, combined with a specific, guaranteed output class would make
a
    > much narrower/more restricted but also (I argue) much easier to
understand
    > function. Particularly for beginning and intermediate users.

    > I do hear what you're saying about silent conversion, though, so
what I'm
    > describing might be a third function (ifelse3 for lack of a better name
for
    > now), as you pointed out.


    >> that was not my intent here.... but would be another interesting
    >> extension. However, I would like to keep? R-semantic silent
coercions
    >> such as
    >> ? ? ? ? ???logical < integer < double < complex
    >> 
    >> and your pseudo code below would not work so easily I think.
    >> 
    >> ? ???> the pseudo code could be
    >> 
    >> (I'm changing assignment '=' to? '<-' ...?
[please!] )
    >> 
    >> ? ???> ifelse2 <- function(test, yes, no, outclass) {
    >> ? ???>???lenout? <- length(test)
    >> ? ???>???out <- as( rep(yes, length.out <- lenout),
outclass)
    >> ? ???>???out[!test] <- as(rep(no, length.out =
lenout)[!test], outclass)
    >> ? ???>???# handle NA stuff
    >> ? ???>???out
    >> ? ???> }
    >> 
    >> 
    >> ? ???> NAs could be tricky if outclass were allowed to be
completely
    >> general, but
    >> ? ???> doable, I think? Another approach? if we ARE fast-passing
while
    >> leaving
    >> ? ???> ifelse intact is that maybe NA's in test just
aren't allowed in
    >> ifelse2.
    >> ? ???> I'm not saying we should definitely do that, but
it's possible and
    >> would
    >> ? ???> make things faster.
    >> 
    >> ? ???> Finally, In terms of efficiency, with the stuff that Luke
and I are
    >> working
    >> ? ???> on, the NA detection could be virtually free in certain
cases, which
    >> could
    >> ? ???> give a nice boost for long vectors? that don't have
any NAs (and
    >> 'know'
    >> ? ???> that they don't).
    >> 
    >> That *is* indeed a very promising prospect!
    >> Thank you in advance!
    >> 
    >> ? ???> Best,
    >> ? ???> ~G
    >> 
    >> I still am bit disappointed by the fact that it seems nobody has
    >> taken a good look at my ifelse2() proposal.
    >> 

    > I plan to look at it soon. Thanks again for all your work.

    > ~G


    >> 
    >> I really would like an alternative to ifelse() in *addition* to
    >> the current ifelse(), but hopefully in the future being used in
    >> quite a few places instead of ifelse()
    >> efficiency but for changed semantics, namely working for
considerably
    >> more "vector like" classes of? 'yes' and
'no'? than the current
    >> ifelse().
    >> 
    >> As I said, the current proposal works for objects of class
    >> ? ? "Date", "POSIXct", "POSIXlt",
"factor",? "mpfr" (pkg 'Rmpfr')
    >> and hopefully for "sparseVector" (in a next version of
the 'Matrix' pkg).
    >> 
    >> Martin
    >> 
    >> ? ???> On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <
    >> maechler at stat.math.ethz.ch
    >> ? ???>> wrote:
    >> 
    >> ? ???>> Finally getting back to this :
    >> ? ???>>
    >> ? ???>> >>>>> Hadley Wickham <h.wickham at
gmail.com>
    >> ? ???>> >>>>>? ???on Mon, 15 Aug 2016 07:51:35
-0500 writes:
    >> ? ???>>
    >> ? ???>> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
    >> ? ???>> > <h.wickham at gmail.com> wrote:
    >> ? ???>> >>> >> One possibility would also be
to consider a
    >> ? ???>> >>> "numbers-only" or >>
rather "same type"-only {e.g.,
    >> ? ???>> >>> would also work for characters} >>
version.
    >> ? ???>> >>>
    >> ? ???>> >>> > I don't know what you mean by
these.
    >> ? ???>> >>>
    >> ? ???>> >>> In the mean time, Bob Rudis mentioned
dplyr::if_else(),
    >> ? ???>> >>> which is very relevant, thank you Bob!
    >> ? ???>> >>>
    >> ? ???>> >>> As I have found, that actually works in
such a "same
    >> ? ???>> >>> type"-only way: It does not try to
coerce, but gives an
    >> ? ???>> >>> error when the classes differ, even in
this somewhat
    >> ? ???>> >>> debatable case :
    >> ? ???>> >>>
    >> ? ???>> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3,
0+10:11) Error:
    >> ? ???>> >>> `false` has type 'double' not
'integer'
    >> ? ???>> >>> >
    >> ? ???>> >>>
    >> ? ???>> >>> As documented, if_else() is clearly
stricter than
    >> ? ???>> >>> ifelse() and e.g., also does no
recycling (but of
    >> ? ???>> >>> length() 1).
    >> ? ???>> >>
    >> ? ???>> >> I agree that if_else() is currently too
strict - it's
    >> ? ???>> >> particularly annoying if you want to replace
some values
    >> ? ???>> >> with a missing:
    >> ? ???>> >>
    >> ? ???>> >> x <- sample(10) if_else(x > 5, NA, x)
# Error: `false`
    >> ? ???>> >> has type 'integer' not
'logical'
    >> ? ???>> >>
    >> ? ???>> >> But I would like to make sure that this
remains an error:
    >> ? ???>> >>
    >> ? ???>> >> if_else(x > 5, x, "BLAH")
    >> ? ???>> >>
    >> ? ???>> >> Because that seems more likely to be a user
error (but
    >> ? ???>> >> reasonable people might certainly believe
that it should
    >> ? ???>> >> just work)
    >> ? ???>> >>
    >> ? ???>> >> dplyr is more accommodating in other places
(i.e. in
    >> ? ???>> >> bind_rows(), collapse() and the joins) but
it's
    >> ? ???>> >> surprisingly hard to get all the details
right. For
    >> ? ???>> >> example, what should the result of this call
be?
    >> ? ???>> >>
    >> ? ???>> >> if_else(c(TRUE, FALSE),
factor(c("a", "b")),
    >> ? ???>> >> factor(c("c", "b"))
    >> ? ???>> >>
    >> ? ???>> >> Strictly speaking I think you could argue
it's an error,
    >> ? ???>> >> but that's not very user-friendly.
Should it be a factor
    >> ? ???>> >> with the union of the levels? Should it be a
character
    >> ? ???>> >> vector + warning? Should the behaviour
change if one set
    >> ? ???>> >> of levels is a subset of the other set?
    >> ? ???>> >>
    >> ? ???>> >> There are similar issues for POSIXct (if the
time zones
    >> ? ???>> >> are different, which should win?), and
difftimes
    >> ? ???>> >> (similarly for units).? Ideally you'd
like the behaviour
    >> ? ???>> >> to be extensible for new S3 classes, which
suggests it
    >> ? ???>> >> should be a generic (and for the most
general case, it
    >> ? ???>> >> would need to dispatch on both arguments).
    >> ? ???>>
    >> ? ???>> > One possible principle would be to use c() -
    >> ? ???>> > i.e. construct out as
    >> ? ???>>
    >> ? ???>> > out <- c(yes[0], no[0]
    >> ? ???>> > length(out) <- max(length(yes), length(no))
    >> ? ???>>
    >> ? ???>> yes; this would require that a? `length<-` method
works for the
    >> ? ???>> class of the result.
    >> ? ???>>
    >> ? ???>> Duncan Murdoch mentioned a version of this, in his
very
    >> ? ???>> first reply:
    >> ? ???>>
    >> ? ???>> ans <- c(yes, no)[seq_along(test)]
    >> ? ???>> ans <- ans[seq_along(test)]
    >> ? ???>>
    >> ? ???>> which is less efficient for atomic vectors, but
requires
    >> ? ???>> less from the class: it "only" needs `c`
and `[` to work
    >> ? ???>>
    >> ? ???>> and a mixture of your two proposals would be possible
too:
    >> ? ???>>
    >> ? ???>> ans <- c(yes[0], no[0])
    >> ? ???>> ans <- ans[seq_along(test)]
    >> ? ???>>
    >> ? ???>> which does *not* work for my "mpfr" numbers
(CRAN package 'Rmpfr'),
    >> ? ???>> but that's a buglet in the? c.mpfr()
implementation of my Rmpfr
    >> ? ???>> package... (which has already been fixed in the
development version
    >> on
    >> ? ???>> R-forge,
    >> ? ???>> https://r-forge.r-project.org/R/?group_id=386)
    >> ? ???>>
    >> ? ???>> > But of course that wouldn't help with factor
responses.
    >> ? ???>>
    >> ? ???>> Yes.? However, a version of Duncan's suggestion
-- of treating
    >> 'yes' first
    >> ? ???>> -- does help in that case.
    >> ? ???>>
    >> ? ???>> For once, mainly as "feasability
experiment",
    >> ? ???>> I have created a github gist to make my current
ifelse2() proposal
    >> ? ???>> available
    >> ? ???>> for commenting, cloning, pullrequesting, etc:
    >> ? ???>>
    >> ? ???>> Consisting of 2 files
    >> ? ???>> - ifelse-def.R :? Functions definitions only,
basically all the
    >> current
    >> ? ???>> proposals, called? ifelse*()
    >> ? ???>> - ifelse-checks.R : A simplistic checking function
    >> ? ???>> and examples calling it, notably demonstrating that
my
    >> ? ???>> ifelse2()? does work with
    >> ? ???>> "Date", <dateTime> (i.e.
"POSIXct" and "POSIXlt"), factors,
    >> ? ???>> and "mpfr" (the arbitrary-precision numbers
in my package "Rmpfr")
    >> ? ???>>
    >> ? ???>> Also if you are not on github, you can quickly get to
the ifelse2()
    >> ? ???>> definition :
    >> ? ???>>
    >> ? ???>>
https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
    >> ? ???>> 94#file-ifelse-def-r-L168
    >> ? ???>>
    >> ? ???>> > Also, if you're considering an improved
ifelse(), I'd
    >> ? ???>> > strongly urge you to consider adding an `na`
argument,
    >> ? ???>>
    >> ? ???>> I now did (called it 'NA.').
    >> ? ???>>
    >> ? ???>> > so that you can use ifelse() to transform all
three
    >> ? ???>> > possible values in a logical vector.
    >> ? ???>>
    >> ? ???>> > Hadley
    >> ? ???>> > -- http://hadley.nz
    >> ? ???>>
    >> ? ???>> For those who really hate GH (and don't want or
cannot easily
    >> follow the
    >> ? ???>> above URL), here's my current definition:
    >> ? ???>>
    >> ? ???>>
    >> ? ???>> ##' Martin Maechler, 14. Nov 2016 --- taking into
account Duncan M.
    >> and
    >> ? ???>> Hadley's
    >> ? ???>> ##' ideas in the R-devel thread starting at (my
mom's 86th
    >> birthday):
    >> ? ???>> ##'
https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
    >> ? ???>> ifelse2 <- function (test, yes, no, NA. = NA) {
    >> ? ???>> if(!is.logical(test)) {
    >> ? ???>> if(is.atomic(test))
    >> ? ???>> storage.mode(test) <- "logical"
    >> ? ???>> else ## typically a "class";
storage.mode<-() typically fails
    >> ? ???>> test <- if(isS4(test)) methods::as(test,
"logical") else
    >> ? ???>> as.logical(test)
    >> ? ???>> }
    >> ? ???>>
    >> ? ???>> ## No longer optimize the? "if (a) x else
y"? cases:
    >> ? ???>> ## Only "non-good" R users use ifelse(.)
instead of if(.) in these
    >> ? ???>> cases.
    >> ? ???>>
    >> ? ???>> ans <-
    >> ? ???>> tryCatch(rep(if(is.object(yes) &&
identical(class(yes), class(no)))
    >> ? ???>> ## as c(o) or o[0] may not work for the class
    >> ? ???>> yes else c(yes[0], no[0]), length.out     >> ?
???>> length(test)),
    >> ? ???>> error = function(e) { ## try asymmetric, yes-leaning
    >> ? ???>> r <- yes
    >> ? ???>> r[!test] <- no[!test]
    >> ? ???>> r
    >> ? ???>> })
    >> ? ???>> ok <- !(nas <- is.na(test))
    >> ? ???>> if (any(test[ok]))
    >> ? ???>> ans[test & ok] <- rep(yes, length.out =
length(ans))[test & ok]
    >> ? ???>> if (any(!test[ok]))
    >> ? ???>> ans[!test & ok] <- rep(no, length.out =
length(ans))[!test & ok]
    >> ? ???>> ans[nas] <- NA. # possibly coerced to class(ans)
    >> ? ???>> ans
    >> ? ???>> }
    >> ? ???>>
    >> ? ???>> ______________________________________________
    >> ? ???>> R-devel at r-project.org mailing list
    >> ? ???>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> ? ???>>
    >> 
    >> 
    >> 
    >> ? ???> --
    >> ? ???> Gabriel Becker, PhD
    >> ? ???> Associate Scientist (Bioinformatics)
    >> ? ???> Genentech Research
    >> 
    >> ? ???> [[alternative HTML version deleted]]
    >> 
    >> 


    > -- 
    > Gabriel Becker, PhD
    > Associate Scientist (Bioinformatics)
    > Genentech Research

    > ??? [[alternative HTML version deleted]]

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

Gabriel Becker

2016-Nov-28 15:58 UTC

head link

[Rd] ifelse() woes ... can we agree on a ifelse2() ?

Well, that's embarrassing. Sorry for the noise on that front, everyone. I
misunderstood something from the aforementioned unrelated conversation I
was having, but not double checking is on me (I rarely use if else and when
I do I avoid that situation in my own code, which is why I didn't already
know this)


I'd still argue that situation should at least warn, possibly error, as it
seems indicative of a bug in the user's code.

On Mon, Nov 28, 2016 at 7:00 AM, Martin Maechler <maechler at
stat.math.ethz.ch> wrote:
> >>>>> Suharto Anggono Suharto Anggono via R-devel
<r-devel at r-project.org>
> >>>>>     on Sat, 26 Nov 2016 17:14:01 +0000 writes:
>
>     > Just stating, in 'ifelse', 'test' is not recycled.
As I said in
> "R-intro: length of 'ifelse' result"
(https://stat.ethz.ch/
> pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b)
> returns a vector of the length of 'condition', even if 'a'
or 'b' is longer.
>
> yes and  ?ifelse (the help page) also does not say that test is
> recycled, rather
>
>    >>  If \code{yes} or \code{no} are too short, their elements are
> recycled.
>
> (*and* the problem you wrote the above has been corrected in the
>  R-intro manual shortly after).
>
>
>     > On current 'ifelse' code in R:
>
>     > * The part
>     > ans[nas] <- NA
>     > could be omitted because NA's are already in place.
>     > If the part is removed, variable 'nas' is no longer used.
>
> I agree that this seems logical.  If I apply the change, R's own
> full checks do not seem affected, and I may try to commit that
> change and "wait and see".
>
>
>     > * The any(*) part actually checks the thing that is used as the
> index vector. The index vector could be stored and then repeatedly used,
> like the following.
>
>     >     if (any(sel <- test & ok))
>     >     ans[sel] <- rep(yes, length.out = length(ans))[sel]
>
> yes, I know, and have had similar thoughts in the past.
> However note  (I know you that) the current code is
>
>     if (any(test[ok]))
>         ans[test & ok] <- rep(yes, length.out = length(ans))[test
& ok]
>
> and   any(test[ok])          may be considerably faster than
>       any(sel <- test & ok)
>
> OTOH I think the current code would only be faster (for the
> above) when any(.) returned FALSE ...
> I think it may depend on the typical use cases which of the two
> versions is more efficient.
>
>
>     > * If 'test' is a factor, doing
>     > storage.mode(test) <- "logical"
>     > is not appropriate, but is.atomic(test) returns TRUE. Maybe use
>     > if(!is.object(test))
>     >    instead of
>     > if(is.atomic(test)) .
>
> This would be a considerable change I think...
> Note that I'm currently really proposing to introduce an *additional*
> ifelse function with different "more reasonable" semantic,  and
> your last change would do that too.
>
> My alternative should really work
> - for factors
> - for "array"s including "matrix" (as the current
ifelse() does!)
> - for "Date", "POSIXct", "ts"(timeseries),
"zoo",
>       "sparseVector", "sparseMatrix" (*), or
"mpfr",
>   without any special code, but rather by design.
>
>  *) Currently needs the R-forge version of  Matrix, version 1.2-8.
>
> A bit less than an hour ago, I have updated the gist with an updated
> proposal ifelse2() {and the current alternatives that I know},
> modified so it *does* keep more, e.g.  dim() attributes in
> reasonable cases.
>
> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
> 94#file-ifelse-def-r-L168
>
> Hence my ifelse2() became even a bit longer (but not slower)
> working for even more classes of  "yes" and "no".
>
>
>     > On ifelse-checks.R:
>     > * In function 'chkIfelse', if the fourth function argument
names is
> not "NA.", the argument name is changed, but the function body
still uses
> the old name. That makes error in chkIfelse(ifelseHW) .
>     > A fix:
>     >         if(names(formals(FUN))[[4]] != "NA.") {
>     >             body(FUN) <- do.call(substitute, list(body(FUN),
>     >                 setNames(list(quote(NA.)),
> names(formals(FUN))[[4]])))
>     >             names(formals(FUN))[[4]] <- "NA."
>     >         }
>
> yes, thank you!  (a bit embarrassing for me ..)
>
>     > After fixing, chkIfelse(ifelseHW) just fails at identical(iflt,
> as.POSIXlt(ifct)) .
>     > 'iflt' has NA as 'tzone' and 'isdst'
components.
>     > * Because function 'chkIfelse' continues checking after
failure,
>     > as.POSIXlt(ifct)
>     > may give error. The error happens, for example, in
> chkIfelse(ifelseR) . Maybe place it inside try(...).
>     > * If 'lt' is a "POSIXlt" object, (lt-100) is a
"POSIXct" object.
>     > So,
>     > FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
>     > is an example of mixed class.
>
> good; thank you for the hint.
>
>     > * The part of function 'chkIfelse' in
>     > for(i in seq_len(nFact))
>     > uses 'NA.' function argument. That makes error when
'chkIfelse' is
> applied to function without fourth argument.
>     > The part should be wrapped in
>     > if(has.4th) .
> yes of course
>
>     > * Function 'ifelseJH' has fourth argument, but the
argument is not
> for value if NA. So, instead of
>     > chkIfelse(ifelseJH) ,
>     > maybe call
>     > chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .
> You are right;
> I've decided to solve this differently.
>
> I'm looking at these suggestions now, notably also your proposals
below;
> thank you, Suharto!
>
> (I wanted to put my improved 'ifelse2' out first, quickly).
> Martin
>
>
>     > A concrete version of 'ifelse2' that starts the result
from 'yes':
>     > function(test, yes, no, NA. = NA) {
>     >     if(!is.logical(test))
>     >         test <- if(isS4(test)) methods::as(test,
"logical") else
> as.logical(test)
>     >     n <- length(test)
>     >     ans <- rep(yes, length.out = n)
>     >     ans[!test & !is.na(test)] <- rep(no, length.out =
n)[!test & !
> is.na(test)]
>     >     ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
>     >     ans
>     > }
>
>     > It requires 'rep' method that is compatible with
subsetting. It also
> works with "POSIXlt" in R 2.7.2, when 'length' gives 9,
and gives an
> appropriate result if time zones are the same.
>     > For coercion of 'test', there is no need of keeping
attributes. So,
> it doesn't do
>     > storage.mode(test) <- "logical"
>     > and goes directly to 'as.logical'.
>     > It relies on subassignment for silent coercions of
>     > logical < integer < double < complex .
>     > Unlike 'ifelse', it never skips any subassignment. So,
phenomenon as
> in "example of different return modes" in ?ifelse doesn't
happen.
>
>     > Another version, for keeping attributes as pointed out by Duncan
> Murdoch:
>     > function(test, yes, no, NA. = NA) {
>     >     if(!is.logical(test))
>     >         test <- if(isS4(test)) methods::as(test,
"logical") else
> as.logical(test)
>     >     n <- length(test)
>     >     n.yes <- length(yes); n.no <- length(no)
>     >     if (n.yes != n) {
>     >         if (n.no == n) {  # swap yes <-> no
>     >             test <- !test
>     >             ans <- yes; yes <- no; no <- ans
>     >             n.no <- n.yes
>     >         } else yes <- yes[rep_len(seq_len(n.yes), n)]
>     >     }
>     >     ans <- yes
>     >     if (n.no == 1L)
>     >         ans[!test] <- no
>     >     else
>     >         ans[!test & !is.na(test)] <- no[
>     >             if (n.no == n) !test & !is.na(test)
>     >             else rep_len(seq_len(n.no), n)[!test &
!is.na(test)]]
>     >     stopifnot(length(NA.) == 1L)
>     >     ans[is.na(test)] <- NA.
>     >     ans
>     > }
>
>     > Note argument evaluation order: 'test', 'yes',
'no', 'NA.'.
>     > First, it chooses the first of 'yes' and 'no' that
has the same
> length as the result. If none of 'yes' and 'no' matches the
length of the
> result, it chooses recycled (or truncated) 'yes'.
>     > It uses 'rep' on the index and subsetting as a substitute
for 'rep'
> on the value.
>     > It requires 'length' method that is compatible with
subsetting.
>     > Additionally, it uses the same idea as dplyr::if_else, or more
> precisely the helper function 'replace_with'. It doesn't use
'rep' if the
> length of 'no' is 1 or is the same as the length of the result. For
> subassignment with value of length 1, recycling happens by itself and NA in
> index is OK.
>     > It limits 'NA.' to be of length 1, considering
'NA.' just as a label
> for NA.
>
>     > Cases where the last version above or 'ifelse2 or
'ifelseHW' in
> ifelse-def.R gives inappropriate answers:
>     > - 'yes' and 'no' are "difftime" objects
with different "units"
> attribute
>     > - 'yes' and 'no' are "POSIXlt" objects
with different time zone
>     > Example: 'yes' in "UTC" and 'no' in
"EST5EDT". The reverse, 'yes' in
> "EST5EDT" and 'no' in "UTC" gives error.
>
>     > For the cases, c(yes, no) helps. Function 'ifelseJH' in
ifelse-def.R
> gives a right answer for "POSIXlt" case.
>     > ---------------------
>     > Martin et al.,
>
>
>
>
>     > On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at
> stat.math.ethz.ch
>     >> wrote:
>
>     >>
>     >> Note that my premise was really to get *away* from inheriting
>     >> too much from 'test'.
>     >> Hence, I have *not* been talking about replacing ifelse() but
>     >> rather of providing a new  ifelse2()
>     >>
>     >>         [ or if_else()  if Hadley was willing to ditch the
dplyr one
>     >>                         in favor of a base one]
>     >>
>     >>      > Specifically, based on an unrelated discussion with
Henrik
> Bengtsson
>     >> on
>     >>      > Twitter, I wonder if preserving the recycling
behavior test
> is
>     >> longer than
>     >>      > yes, no, but making the case where
>     >>
>     >>      > length( test ) < max(length( yes ), length( no ))
>     >>
>     >>      > would simplify usage for userRs in a useful way.
>     >>
>
>     > That was a copyediting bug on my part, it seems I hit send with my
> message
>     > only half-edited/proofread. Apologies.
>
>     > That should have said that making the case where test is the one
> that will
>     > be recycled (because it is shorter than either yes or no) an
error.
> My
>     > claim is that the fact that test itself can be recycled, rather
than
> just
>     > yes or no, is confusing to many R users. If we are writing an
> ifelse2 we
>     > might want to drop that feature and just throw an error in that
case.
>     > (Users could still use the original ifelse if they understand and
>     > specifically want that behavior).
>
>     > Does that make more sense?
>
>
>
>     >>
>     >>      > Also, If we combine a stricter contract that the
output will
> always
>     >> be of
>     >>      > length with the suggestion of a specified output
class
>     >>
>     >>
>     > Here, again, I was talking about the restriction that the output
be
>     > guaranteed to be the length of test, regardless of the length of
yes
> and
>     > no. That, combined with a specific, guaranteed output class would
> make a
>     > much narrower/more restricted but also (I argue) much easier to
> understand
>     > function. Particularly for beginning and intermediate users.
>
>     > I do hear what you're saying about silent conversion, though,
so
> what I'm
>     > describing might be a third function (ifelse3 for lack of a better
> name for
>     > now), as you pointed out.
>
>
>     >> that was not my intent here.... but would be another
interesting
>     >> extension. However, I would like to keep  R-semantic silent
> coercions
>     >> such as
>     >>            logical < integer < double < complex
>     >>
>     >> and your pseudo code below would not work so easily I think.
>     >>
>     >>      > the pseudo code could be
>     >>
>     >> (I'm changing assignment '=' to  '<-'
...  [please!] )
>     >>
>     >>      > ifelse2 <- function(test, yes, no, outclass) {
>     >>      >   lenout  <- length(test)
>     >>      >   out <- as( rep(yes, length.out <- lenout),
outclass)
>     >>      >   out[!test] <- as(rep(no, length.out =
lenout)[!test],
> outclass)
>     >>      >   # handle NA stuff
>     >>      >   out
>     >>      > }
>     >>
>     >>
>     >>      > NAs could be tricky if outclass were allowed to be
completely
>     >> general, but
>     >>      > doable, I think? Another approach  if we ARE
fast-passing
> while
>     >> leaving
>     >>      > ifelse intact is that maybe NA's in test just
aren't allowed
> in
>     >> ifelse2.
>     >>      > I'm not saying we should definitely do that, but
it's
> possible and
>     >> would
>     >>      > make things faster.
>     >>
>     >>      > Finally, In terms of efficiency, with the stuff that
Luke
> and I are
>     >> working
>     >>      > on, the NA detection could be virtually free in
certain
> cases, which
>     >> could
>     >>      > give a nice boost for long vectors  that don't
have any NAs
> (and
>     >> 'know'
>     >>      > that they don't).
>     >>
>     >> That *is* indeed a very promising prospect!
>     >> Thank you in advance!
>     >>
>     >>      > Best,
>     >>      > ~G
>     >>
>     >> I still am bit disappointed by the fact that it seems nobody
has
>     >> taken a good look at my ifelse2() proposal.
>     >>
>
>     > I plan to look at it soon. Thanks again for all your work.
>
>     > ~G
>
>
>     >>
>     >> I really would like an alternative to ifelse() in *addition*
to
>     >> the current ifelse(), but hopefully in the future being used
in
>     >> quite a few places instead of ifelse()
>     >> efficiency but for changed semantics, namely working for
> considerably
>     >> more "vector like" classes of  'yes' and
'no'  than the current
>     >> ifelse().
>     >>
>     >> As I said, the current proposal works for objects of class
>     >>     "Date", "POSIXct",
"POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
>     >> and hopefully for "sparseVector" (in a next version
of the 'Matrix'
> pkg).
>     >>
>     >> Martin
>     >>
>     >>      > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler
<
>     >> maechler at stat.math.ethz.ch
>     >>      >> wrote:
>     >>
>     >>      >> Finally getting back to this :
>     >>      >>
>     >>      >> >>>>> Hadley Wickham
<h.wickham at gmail.com>
>     >>      >> >>>>>     on Mon, 15 Aug 2016
07:51:35 -0500 writes:
>     >>      >>
>     >>      >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley
Wickham
>     >>      >> > <h.wickham at gmail.com> wrote:
>     >>      >> >>> >> One possibility would also
be to consider a
>     >>      >> >>> "numbers-only" or
>> rather "same type"-only {e.g.,
>     >>      >> >>> would also work for characters}
>> version.
>     >>      >> >>>
>     >>      >> >>> > I don't know what you mean
by these.
>     >>      >> >>>
>     >>      >> >>> In the mean time, Bob Rudis
mentioned dplyr::if_else(),
>     >>      >> >>> which is very relevant, thank you
Bob!
>     >>      >> >>>
>     >>      >> >>> As I have found, that actually
works in such a "same
>     >>      >> >>> type"-only way: It does not
try to coerce, but gives an
>     >>      >> >>> error when the classes differ, even
in this somewhat
>     >>      >> >>> debatable case :
>     >>      >> >>>
>     >>      >> >>> > dplyr::if_else(c(TRUE, FALSE),
2:3, 0+10:11) Error:
>     >>      >> >>> `false` has type 'double'
not 'integer'
>     >>      >> >>> >
>     >>      >> >>>
>     >>      >> >>> As documented, if_else() is clearly
stricter than
>     >>      >> >>> ifelse() and e.g., also does no
recycling (but of
>     >>      >> >>> length() 1).
>     >>      >> >>
>     >>      >> >> I agree that if_else() is currently too
strict - it's
>     >>      >> >> particularly annoying if you want to
replace some values
>     >>      >> >> with a missing:
>     >>      >> >>
>     >>      >> >> x <- sample(10) if_else(x > 5,
NA, x) # Error: `false`
>     >>      >> >> has type 'integer' not
'logical'
>     >>      >> >>
>     >>      >> >> But I would like to make sure that this
remains an error:
>     >>      >> >>
>     >>      >> >> if_else(x > 5, x, "BLAH")
>     >>      >> >>
>     >>      >> >> Because that seems more likely to be a
user error (but
>     >>      >> >> reasonable people might certainly
believe that it should
>     >>      >> >> just work)
>     >>      >> >>
>     >>      >> >> dplyr is more accommodating in other
places (i.e. in
>     >>      >> >> bind_rows(), collapse() and the joins)
but it's
>     >>      >> >> surprisingly hard to get all the
details right. For
>     >>      >> >> example, what should the result of this
call be?
>     >>      >> >>
>     >>      >> >> if_else(c(TRUE, FALSE),
factor(c("a", "b")),
>     >>      >> >> factor(c("c", "b"))
>     >>      >> >>
>     >>      >> >> Strictly speaking I think you could
argue it's an error,
>     >>      >> >> but that's not very user-friendly.
Should it be a factor
>     >>      >> >> with the union of the levels? Should it
be a character
>     >>      >> >> vector + warning? Should the behaviour
change if one set
>     >>      >> >> of levels is a subset of the other set?
>     >>      >> >>
>     >>      >> >> There are similar issues for POSIXct
(if the time zones
>     >>      >> >> are different, which should win?), and
difftimes
>     >>      >> >> (similarly for units).  Ideally
you'd like the behaviour
>     >>      >> >> to be extensible for new S3 classes,
which suggests it
>     >>      >> >> should be a generic (and for the most
general case, it
>     >>      >> >> would need to dispatch on both
arguments).
>     >>      >>
>     >>      >> > One possible principle would be to use c()
-
>     >>      >> > i.e. construct out as
>     >>      >>
>     >>      >> > out <- c(yes[0], no[0]
>     >>      >> > length(out) <- max(length(yes),
length(no))
>     >>      >>
>     >>      >> yes; this would require that a  `length<-`
method works for
> the
>     >>      >> class of the result.
>     >>      >>
>     >>      >> Duncan Murdoch mentioned a version of this, in
his very
>     >>      >> first reply:
>     >>      >>
>     >>      >> ans <- c(yes, no)[seq_along(test)]
>     >>      >> ans <- ans[seq_along(test)]
>     >>      >>
>     >>      >> which is less efficient for atomic vectors, but
requires
>     >>      >> less from the class: it "only" needs
`c` and `[` to work
>     >>      >>
>     >>      >> and a mixture of your two proposals would be
possible too:
>     >>      >>
>     >>      >> ans <- c(yes[0], no[0])
>     >>      >> ans <- ans[seq_along(test)]
>     >>      >>
>     >>      >> which does *not* work for my "mpfr"
numbers (CRAN package
> 'Rmpfr'),
>     >>      >> but that's a buglet in the  c.mpfr()
implementation of my
> Rmpfr
>     >>      >> package... (which has already been fixed in the
development
> version
>     >> on
>     >>      >> R-forge,
>     >>      >> https://r-forge.r-project.org/R/?group_id=386)
>     >>      >>
>     >>      >> > But of course that wouldn't help with
factor responses.
>     >>      >>
>     >>      >> Yes.  However, a version of Duncan's
suggestion -- of
> treating
>     >> 'yes' first
>     >>      >> -- does help in that case.
>     >>      >>
>     >>      >> For once, mainly as "feasability
experiment",
>     >>      >> I have created a github gist to make my current
ifelse2()
> proposal
>     >>      >> available
>     >>      >> for commenting, cloning, pullrequesting, etc:
>     >>      >>
>     >>      >> Consisting of 2 files
>     >>      >> - ifelse-def.R :  Functions definitions only,
basically all
> the
>     >> current
>     >>      >> proposals, called  ifelse*()
>     >>      >> - ifelse-checks.R : A simplistic checking
function
>     >>      >> and examples calling it, notably demonstrating
that my
>     >>      >> ifelse2()  does work with
>     >>      >> "Date", <dateTime> (i.e.
"POSIXct" and "POSIXlt"), factors,
>     >>      >> and "mpfr" (the arbitrary-precision
numbers in my package
> "Rmpfr")
>     >>      >>
>     >>      >> Also if you are not on github, you can quickly
get to the
> ifelse2()
>     >>      >> definition :
>     >>      >>
>     >>      >> https://gist.github.com/mmaechler/
> 9cfc3219c4b89649313bfe6853d878
>     >>      >> 94#file-ifelse-def-r-L168
>     >>      >>
>     >>      >> > Also, if you're considering an improved
ifelse(), I'd
>     >>      >> > strongly urge you to consider adding an
`na` argument,
>     >>      >>
>     >>      >> I now did (called it 'NA.').
>     >>      >>
>     >>      >> > so that you can use ifelse() to transform
all three
>     >>      >> > possible values in a logical vector.
>     >>      >>
>     >>      >> > Hadley
>     >>      >> > -- http://hadley.nz
>     >>      >>
>     >>      >> For those who really hate GH (and don't want
or cannot
> easily
>     >> follow the
>     >>      >> above URL), here's my current definition:
>     >>      >>
>     >>      >>
>     >>      >> ##' Martin Maechler, 14. Nov 2016 --- taking
into account
> Duncan M.
>     >> and
>     >>      >> Hadley's
>     >>      >> ##' ideas in the R-devel thread starting at
(my mom's 86th
>     >> birthday):
>     >>      >> ##'
https://stat.ethz.ch/pipermail/r-devel/2016-August/
> 072970.html
>     >>      >> ifelse2 <- function (test, yes, no, NA. = NA)
{
>     >>      >> if(!is.logical(test)) {
>     >>      >> if(is.atomic(test))
>     >>      >> storage.mode(test) <- "logical"
>     >>      >> else ## typically a "class";
storage.mode<-() typically
> fails
>     >>      >> test <- if(isS4(test)) methods::as(test,
"logical") else
>     >>      >> as.logical(test)
>     >>      >> }
>     >>      >>
>     >>      >> ## No longer optimize the  "if (a) x else
y"  cases:
>     >>      >> ## Only "non-good" R users use
ifelse(.) instead of if(.)
> in these
>     >>      >> cases.
>     >>      >>
>     >>      >> ans <-
>     >>      >> tryCatch(rep(if(is.object(yes) &&
identical(class(yes),
> class(no)))
>     >>      >> ## as c(o) or o[0] may not work for the class
>     >>      >> yes else c(yes[0], no[0]), length.out >    
>>      >> length(test)),
>     >>      >> error = function(e) { ## try asymmetric,
yes-leaning
>     >>      >> r <- yes
>     >>      >> r[!test] <- no[!test]
>     >>      >> r
>     >>      >> })
>     >>      >> ok <- !(nas <- is.na(test))
>     >>      >> if (any(test[ok]))
>     >>      >> ans[test & ok] <- rep(yes, length.out =
length(ans))[test &
> ok]
>     >>      >> if (any(!test[ok]))
>     >>      >> ans[!test & ok] <- rep(no, length.out =
length(ans))[!test
> & ok]
>     >>      >> ans[nas] <- NA. # possibly coerced to
class(ans)
>     >>      >> ans
>     >>      >> }
>     >>      >>
>     >>      >> ______________________________________________
>     >>      >> R-devel at r-project.org mailing list
>     >>      >> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>      >>
>     >>
>     >>
>     >>
>     >>      > --
>     >>      > Gabriel Becker, PhD
>     >>      > Associate Scientist (Bioinformatics)
>     >>      > Genentech Research
>     >>
>     >>      > [[alternative HTML version deleted]]
>     >>
>     >>
>
>
>     > --
>     > Gabriel Becker, PhD
>     > Associate Scientist (Bioinformatics)
>     > Genentech Research
>
>     >     [[alternative HTML version deleted]]
>
>     > ______________________________________________
>     > R-devel at r-project.org mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


-- 
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research

	[[alternative HTML version deleted]]

Seemingly Similar Threads

Search for more possibly parallel threads

R devel - Nov 2016 - ifelse() woes ... can we agree on a ifelse2() ?

[Rd] ifelse() woes ... can we agree on a ifelse2() ?

[Rd] ifelse() woes ... can we agree on a ifelse2() ?

[Rd] ifelse() woes ... can we agree on a ifelse2() ?

[Rd] ifelse() woes ... can we agree on a ifelse2() ?

Seemingly Similar Threads