thr3ads.net - R devel - [Rd] Time to revisit ifelse ? [Aug 2025]

Mikael Jagan

2025-Jul-11 08:41 UTC

[Rd] Time to revisit ifelse ?

I don't mind putting together a minimal package with some prototypes, tests,
comparisons, etc.  But perhaps we should aim for consensus on a few issues
beforehand.  (Sorry if these have been discussed to death already elsewhere.
In that case, links to relevant threads would be helpful ...)

     1. Should the type and class attribute of the return value be exactly the
        type and class attribute of c(yes[0L], no[0L]), independent of 'test'?
        Or something else?

     2. What should be the attributes of the return value (other than 'class')?

        base::ifelse keeps attributes(test) if 'test' is atomic, which seems
        like desirable behaviour, though dplyr and data.table seem to think
        otherwise:

            > x <- diag(TRUE, 4L)
            > base::ifelse(x, 1, -1)
                 [,1] [,2] [,3] [,4]
            [1,]    1   -1   -1   -1
            [2,]   -1    1   -1   -1
            [3,]   -1   -1    1   -1
            [4,]   -1   -1   -1    1
            > dplyr::if_else(x, 1, -1)
            Error in if (n_processed == n_conditions && any(are_unused)) { :
              missing value where TRUE/FALSE needed
            > data.table::fifelse(x, 1, -1)
             [1]  1 -1 -1 -1 -1  1 -1 -1 -1 -1  1 -1 -1 -1 -1  1

     3. Should the new function be stricter and/or more verbose?  E.g., should
        it signal a condition if length(yes) or length(no) is not equal to 1
        nor length(test)?

     4. Should the most common case, in which neither 'yes' nor 'no' has a
        'class' attribute, be handled in C?  The remaining cases might rely on
        method dispatch and thus require a separate "generic" implementation in
        R.  How much faster/more efficient would the C implementation have to
        be to justify the cost (more maintenance for R-core, more obfuscation
        for the average user)?
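
To make question 1 concrete, here is a small sketch of how the zero-length prototype c(yes[0L], no[0L]) behaves. The variable names are only for illustration and are not from the thread. Because c() dispatches on its first argument, the class of the prototype can depend on the order of 'yes' and 'no':

```r
# Zero-length "prototype" of the result, as in question 1.
i <- 1L
d <- 2.5
today <- Sys.Date()

p1 <- c(i[0L], d[0L])        # integer combined with double
p2 <- c(today[0L], NA[0L])   # Date first: the Date method of c() is used
p3 <- c(NA[0L], today[0L])   # logical first: default c(), class is lost

typeof(p1)   # "double"
class(p2)    # "Date"
class(p3)    # "numeric"
```

The third case is the one where a plain NA in the 'yes' position would silently turn dates into numbers.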

FWIW, my first (and untested) approximation of an ifelse2 is just this:

     function (test, yes, no)
     {
         if (is.atomic(test)) {
             if (!is.logical(test))
                 storage.mode(test) <- "logical"
         }
         else test <-
             if (isS4(test)) methods::as(test, "logical")
             else as.logical(test)
         nt <- length(test)
         if (nt == 1L) {
             ans <-
                 if (is.na(test))
                     c(yes[0L], no[0L])[1L]
                 else if (test)
                     c(yes[1L], no[0L])
                 else c(yes[0L], no[1L])
         } else {
             ans <- rep(c(yes[0L], no[0L]), length.out = nt)
             ny <- length(yes)
             nn <- length( no)
             jy <- which( test)
             jn <- which(!test)
             if (length(jy))
                 ans[jy] <-
                     if (ny == 1L) yes
                     else if (ny >= nt) yes[jy]
                     else rep(yes, length.out = nt)[jy]
             if (length(jn))
                 ans[jn] <-
                     if (nn == 1L) no
                     else if (nn >= nt) no[jn]
                     else rep(no, length.out = nt)[jn]
         }
         at <- attributes(test)
         if (!is.null(at)) {
             at[["class"]] <- oldClass(ans)
             attributes(ans) <- at
         }
         ans
     }

Mikael
> Date: Wed, 9 Jul 2025 12:06:49 +0200
> From: Martin Maechler <maechler at stat.math.ethz.ch>
> 
>>>>>> Mikko Marttila via R-devel
>>>>>>      on Wed, 09 Jul 2025 09:02:38 +0000 writes:
> 
>      > Thanks Antoine for starting this discussion. It would indeed be great
>      > to see an improved `ifelse()` in base R.
> 
>      > I also agree with Duncan's suggestion that the way to proceed would be
>      > to create a package where the improved version could be drafted,
>      > discussed and refined so that R Core would have a concrete proposal to
>      > consider in the end.
> 
>      > Some initial thoughts on what should be considered:
> 
>      > Performance has been mentioned a few times. While it would of course be
>      > nice to see improvements there I think the main goal is in the API. The
>      > goal for performance should rather be that it doesn't deteriorate
>      > unacceptably.
> 
>      > While data.table's and dplyr's ifelse variants may serve as a good
>      > starting point for identifying the improvements needed, I don't think
>      > either is a good candidate for simply copying as the base R candidate.
>      > A function in base R should adhere to the conventions in base R;
>      > neither of the packages does that. They instead have their own stricter
>      > requirements. For example:
> 
>      > * Incompatible lengths: Base R recycles with a warning, both packages
>      >   error out.
>      > * Different classes: Base R coerces loosely, dplyr uses stricter
>      >   coercion rules based on vctrs, and data.table doesn't allow any
>      >   coercion.
> 
>      > Another point to consider is the handling of attributes for the result.
>      > data.table copies from the first non-NA input from left to right, while
>      > dplyr delegates to vctrs again for merging the attributes gracefully.
>      > This matters for example for factors, where data.table special-cases
>      > them to require the same levels, whereas dplyr merges them. For a base R
>      > solution, it would make sense to delegate the attribute handling to
>      > `c()` somehow, as that's conceptually what should be happening; we're
>      > combining values from the `yes` and `no` objects.
> 
>      > I'm sure there are many other points to consider, but as I said this is
>      > what comes to mind at first. Best of luck with the effort.
> 
>      > Kind regards,
> 
>      > Mikko
> 
>      [..........]
> 
>      >> -----Original Message-----
>      >> From: R-devel r-devel-bounces at r-project.org On Behalf Of Duncan Murdoch
>      >> Sent: Tuesday, July 8, 2025 3:06 PM
>      >> To: Josiah Parry josiah.parry at gmail.com; Avraham Adler avraham.adler at gmail.com
>      >> Cc: r-devel at r-project.org
>      >> Subject: Re: [Rd] Time to revisit ifelse ?
>      >>
>      >> Since you and Antoine are volunteering to do the work, why not start in
>      >> the way I suggested? Write up a comparison of the known ifelse
>      >> implementations, and either pick the best one, or choose the best parts
>      >> of each. Put the result in a package containing nothing else, and
>      >> invite comment from the wider community.
>      >>
>      >> My only comment in advance is that the package should have no
>      >> dependencies other than base packages, for two reasons:
>      >>
>      >> 1. The hope is to have it adopted in base R, and for that it can't have
>      >> any other dependencies.
>      >>
>      >> 2. If it's never adopted by R Core, I might still want to use it, but I
>      >> don't want to add extra dependencies for just one little function.
>      >>
>      >> Duncan Murdoch
> 
>      [................]
> 
> Thank you, Mikko, Antoine, Duncan, etc
> I'm trying to summarize the things I agree / or find important.
> Note that we had ifelse() discussions in the past (on this
> mailing list and/or possibly on R-help); I did get involved and
> spent many hours on coding myself, with no convincing result
> IIRC,  but I do vaguely remember I got very convinced we should
> *not* plan to replace ifelse() but add a second version, say
> if.else() (as  "if_else" is already taken by dplyr).
> 
> 1) Antoine Fabri proposed that base R should get *another*
>    version of ifelse() *in addition* to ifelse().   The issue
>    hence is *NOT* replacing ifelse() by something incompatible.
> 
> 2) Duncan Murdoch's points are *very* much to the point, most
>     importantly:
>     
>      Propose (with discussion / RFC / ...) a function in a (single
>      function) package which only depends on R's base package.
> 
>   I'd add to that that you should probably use the GPL-2 licence
>   or are willing to donate it with that licence to R and do say so;
>   e.g., we cannot add MIT-licenced things to R.
> 
> 3) Ben Bolker's offer to "host" such a function in his 'gtools'
>     package (w/ 0-dependency) would also be acceptable to me,
>     even though it is against DM's "2. If it's never adopted by R Core, .."
> 
> Best,
> Martin
> 
> --
> Martin Maechler
> ETH Zurich  and  R Core team
>

Ivan Krylov

2025-Jul-11 20:01 UTC

[Rd] Time to revisit ifelse ?

On Fri, 11 Jul 2025 04:41:13 -0400
Mikael Jagan <jaganmn2 at gmail.com> wrote:
> But perhaps we should aim for consensus on a few issues beforehand.

Thank you for raising this topic!

> (Sorry if these have been discussed to death already elsewhere. In
> that case, links to relevant threads would be helpful ...)

The data.table::fifelse issue [1] comes to mind together with the vctrs
article section about the need for a less strict ifelse() [2].

>      1. Should the type and class attribute of the return value be
> exactly the type and class attribute of c(yes[0L], no[0L]),
> independent of 'test'? Or something else?

Can we afford an escape hatch for cases when one of the ifelse()
branches is NA or another special value handled by the '[<-' method
belonging to the class of the other branch? data.table::fifelse() has a
not exactly documented special case where it coerces NA_LOGICAL to the
appropriate type, so that data.table::fifelse(runif(10) < .5,
Sys.Date(), NA) works as intended, and dplyr::if_else also supports
this case, but none of the other ifelses I tested do that.

Can we say that if only some of the 'yes' / 'no' / 'na' arguments have
classes, those must match and they determine the class of the return
value? It could be convenient, and it also could be a source of bugs.

>      2. What should be the attributes of the return value (other than
> 'class')?

data.table::fifelse (and kit::iif, which shares a lot of the code) also
preserve the names, but neither dplyr nor hutils do. I think it would
be reasonable to preserve the 'dim' attribute and thus the 'dimnames'
attribute too.

>      3. Should the new function be stricter and/or more verbose?
> E.g., should it signal a condition if length(yes) or length(no) is
> not equal to 1 nor length(test)?

Leaning towards yes, but only because I haven't met any uses for
recycling of non-length-1 inputs myself. An allow.recycle=FALSE option
is probably overkill, right?

>      4. Should the most common case, in which neither 'yes' nor 'no'
> has a 'class' attribute, be handled in C?

This could be a very reasonable performance-correctness trade-off.

> FWIW, my first (and untested) approximation of an ifelse2 is just
> this:
>
>      function (test, yes, no)

I think a widely asked-for feature is a separate 'na' branch.

-- 
Best regards,
Ivan

[1] https://github.com/rdatatable/data.table/issues/3657

[2] https://vctrs.r-lib.org/articles/stability.html#ifelse
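
One way the "escape hatch" for a plain NA branch could look is sketched below. This is only an illustration under stated assumptions: the helper name result_prototype is hypothetical and does not come from the thread, data.table, or dplyr. It treats an unclassed, all-NA logical branch as carrying no type information, so the other branch alone decides the prototype:

```r
# Hypothetical helper (not from the thread): build the zero-length prototype
# of the result, ignoring a branch that is just an unclassed logical NA.
result_prototype <- function(yes, no) {
    is_bare_na <- function(x)
        is.logical(x) && is.null(oldClass(x)) && all(is.na(x))
    if (is_bare_na(yes) && !is_bare_na(no))
        no[0L]
    else if (is_bare_na(no) && !is_bare_na(yes))
        yes[0L]
    else
        c(yes[0L], no[0L])
}

class(result_prototype(NA, Sys.Date()))   # "Date"
class(result_prototype(Sys.Date(), NA))   # "Date"
typeof(result_prototype(1L, 2))           # "double"
```

Whether such a special case is convenient or a source of bugs is exactly the open question raised above.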

GILLIBERT, Andre

2025-Aug-01 17:13 UTC

[Rd] Time to revisit ifelse ?

Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> I don't mind putting together a minimal package with some prototypes, tests,
> comparisons, etc.  But perhaps we should aim for consensus on a few issues
> beforehand.  (Sorry if these have been discussed to death already elsewhere.
> In that case, links to relevant threads would be helpful ...)
>
>      1. Should the type and class attribute of the return value be exactly the
>         type and class attribute of c(yes[0L], no[0L]), independent of 'test'?
>         Or something else?
>
>      2. What should be the attributes of the return value (other than 'class')?
>
>         base::ifelse keeps attributes(test) if 'test' is atomic, which seems
>         like desirable behaviour, though dplyr and data.table seem to think
>         otherwise:

In my experience, base::ifelse keeping attributes of 'test' is useful for
names. It may also be useful for dimensions, but for other attributes, it may
be a dangerous feature. Otherwise, attributes of c(yes, no) should be mostly
preserved in my opinion.
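
A short sketch of that behaviour, using an arbitrary attribute named "note" that is made up for this example: base::ifelse() starts from a copy of 'test', so every attribute of 'test' ends up on the result, not only the names.

```r
# 'note' is an arbitrary attribute invented for this illustration.
test <- structure(c(a = TRUE, b = FALSE), note = "from test")
res <- ifelse(test, 1, -1)

names(res)          # "a" "b"
attr(res, "note")   # "from test"
```

Keeping the names is usually what one wants here, while carrying over "note" (or, say, a class from 'test') is the part that can surprise.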

>      3. Should the new function be stricter and/or more verbose?  E.g., should
>         it signal a condition if length(yes) or length(no) is not equal to 1
>         nor length(test)?

To be consistent with base R, it should warn if length(yes), length(no) and
length(test) are not divisors of the longest, otherwise silently repeat the
three vectors to get the same sizes.
This would work consistently with mathematical operators such as test+yes+no.

In my personal experience, the truncation of 'yes' and 'no' to length(test)
is the most dangerous feature of ifelse().
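
The contrast between the two recycling behaviours can be seen in a few lines (a sketch with made-up inputs; only base R is used):

```r
# Arithmetic warns when the longer length is not a multiple of the shorter one;
# here the warning is caught and its message kept.
msg <- tryCatch(1:2 + 1:3, warning = function(w) conditionMessage(w))

# When the lengths are compatible, arithmetic recycles silently.
ok <- 1:2 + 1:4

# base::ifelse() instead cuts 'yes' down to length(test), without any warning:
# the elements 3 and 4 of 'yes' are never used.
res <- ifelse(c(TRUE, FALSE), 1:4, 0L)
length(res)   # 2
```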

>      4. Should the most common case, in which neither 'yes' nor 'no' has a
>         'class' attribute, be handled in C?  The remaining cases might rely on
>         method dispatch and thus require a separate "generic" implementation in
>         R.  How much faster/more efficient would the C implementation have to
>         be to justify the cost (more maintenance for R-core, more obfuscation
>         for the average user)?

If the function is not much slower than today's ifelse(), it is not worth
rewriting in C in my opinion.

Thank you for an implementation!
A few examples of misbehaviors (in my opinion):
> ifelse2(c(a=TRUE), factor("a"), factor("b"))
Error in as.character.factor(x) : malformed factor
> ifelse2(TRUE, factor(c("a","b")), factor(c("b","a")))
[1] a
Levels: a b

I would expect this one to output
[1] a b
Levels: a b
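
The "malformed factor" error seems to come from the final attribute step of the first draft, which replaces all attributes of the result at once. The sketch below mirrors that step on a one-element factor; the alternative loop at the end is only a hypothetical variant for comparison, not something proposed in the thread.

```r
# A one-element factor result with levels "a" and "b".
ans <- rep(factor(character(0L), levels = c("a", "b")), length.out = 1L)
ans[1L] <- factor("a")

# Attributes of test = c(a = TRUE), plus the class of the result.
at <- attributes(c(a = TRUE))   # list(names = "a")
at[["class"]] <- oldClass(ans)

# Wholesale replacement: 'levels' is dropped but class "factor" is kept.
bad <- ans
attributes(bad) <- at

# Hypothetical alternative: add the attributes of 'test' one at a time,
# leaving the existing attributes of the result in place.
good <- ans
for (nm in setdiff(names(at), "class")) attr(good, nm) <- at[[nm]]

is.null(levels(bad))   # TRUE, so printing 'bad' fails
levels(good)           # "a" "b"
names(good)            # "a"
```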

I tried to develop a function that behaves like mathematical operators (e.g.
test+yes+no) for length & dimensions coercion rules.
Please, find the function and a few tests below:

ifelse2 <- function (test, yes, no) {
	# forces evaluation of arguments in order
	test
	yes
	no

	if (is.atomic(test)) {
		if (!is.logical(test))
			storage.mode(test) <- "logical"
	}
	else test <-
		if (isS4(test)) methods::as(test, "logical")
		else as.logical(test)

	ntest <- length(test)
	nyes <- length(yes)
	nno <- length(no)

	nn <- c(ntest, nyes, nno)
	nans <- max(nn)

	ans <- rep(c(yes[0L], no[0L]), length.out=nans)

	# check dimension consistency for arrays
	has.dim <- FALSE
	if (length(dim(test)) || length(dim(yes)) || length(dim(no))) {
		lparams <- list(test, yes, no)
		ldims <- lapply(lparams, dim)
		ldims <- ldims[!vapply(ldims, is.null, NA)]
		ldimnames <- lapply(lparams, dimnames)
		ldimnames <- ldimnames[!vapply(ldimnames, is.null, NA)]

		rdim <- ldims[[1L]]
		# use the first available dimnames; possibly none of the inputs has any
		rdimnames <- if (length(ldimnames)) ldimnames[[1L]] else NULL
		for(d in ldims) {
			if (!identical(d, rdim)) {
				stop(gettext("non-conformable arrays"))
			}
		}
		has.dim <- TRUE
	}

	if (any(nans %% nn)) {
		warning(gettext("longer object length is not a multiple of shorter object length"))
	}

	if (ntest != nans) {test <- rep(test, length.out=nans)}
	if (nyes != nans) {yes <- rep(yes, length.out=nans)}
	if (nno != nans) {no <- rep(no, length.out=nans)}

	idx <- which( test)
	ans[idx] <- yes[idx]

	idx <- which(!test)
	ans[idx] <- no[idx]

	if (has.dim) {
		dim(ans) <- rdim
		dimnames(ans) <- rdimnames
	}

	if (!is.null(names(test))) {
		names(ans) <- names(test)
	}

	ans
}


ifelse2(c(alpha=TRUE, beta=TRUE, gamma=FALSE),
        factor(c("A","B","C","X")), factor(c("A","B","C","D")))
ifelse2(c(TRUE,FALSE), as.Date("2025-04-01"), c("2020-07-05", "2022-07-05"))
ifelse2(c(a=TRUE, b=FALSE, c=TRUE, d=TRUE), list(42), list(40,45))
ifelse2(rbind(alpha=c(a=TRUE, b=FALSE), beta=c(c=TRUE, d=FALSE)),
        list(1:10), list(2:20, 3:30))
a=rbind(alpha=c(a=TRUE, b=FALSE),beta=c(TRUE,TRUE))
b=rbind(ALPHA=c(A=TRUE, B=FALSE),BETA=c(C=TRUE,D=TRUE))
c=rbind(ALPHA2=c(A2=TRUE, B2=FALSE),BETA2=c(C2=TRUE,D2=TRUE))
ifelse2(a,b,c)
dimnames(a) <- NULL
ifelse2(a,b,c)
dimnames(b) <- NULL
ifelse2(a,b,c)

-- 
Sincerely
André GILLIBERT
