thr3ads.net - R devel - [Rd] Do rowMeans and colMeans of complex vars need adjusting following r88444? [Aug 2025]

If this information is useful, please help other people find it:
Share via:

Martin Maechler

2025-Aug-25 15:56 UTC

[Rd] Do rowMeans and colMeans of complex vars need adjusting following r88444?

>>>>> Dirk Eddelbuettel 
>>>>>     on Mon, 25 Aug 2025 05:46:56 -0500 writes:
    > On 25 August 2025 at 11:54, Martin Maechler wrote:
    > | I still agree we should address this:  We do have a discrepancy
    > | with mean() i.e.,  mean.default()  which does "exactly"
what you
    > | do "by hand" above,  and hence using is.na() for the full
    > | complex vector, and not *separately* for Re() and Im() parts;
    > | ... and I do tend to agree that  colMeans(*, na.rm=TRUE) etc
    > | probably should be adapted to *not* work coordinate-wise but
    > | drop all "complex NAs" both for Re and Im.

    > I agree with that.

    > The culprit is the per-component call (here from colMeans, same for
rowMeans)

    > z <- if(is.complex(x))
    >       .Internal(colMeans(Re(x), n, prod(dn), na.rm)) +
    >  1i * .Internal(colMeans(Im(x), n, prod(dn), na.rm))
    > else  .Internal(colMeans(x, n, prod(dn), na.rm))

    > where we may need to add a na.omit() if na.rm is TRUE.

unfortunately, no:  na.omit() cannot be the solution easily for
_both_ colMeans() and rowMeans()  { colSums() & rowSums() !
which already make use of na.rm,
as indeed  na.omit() treats matrices as data matrices, i.e.,
rows treated very differently than columns,

but in any case, it would drop full rows (or columns if you try to transpose
first) and
that may be very wrong as different rows/columns may have NAs in verydifferent
locations.

I really think a solution must happen in C code.

Further, note that the

   if(is.complex(x))
	    .Internal(colMeans(Re(x), n, prod(dn), na.rm)) +
       1i * .Internal(colMeans(Im(x), n, prod(dn), na.rm))

with na.rm = TRUE  has been *wrong* always, for all cases where
is.na(Re(x))  and
is.na(Im(x))  have differed ...

So this problem is really a new one, only tangentially related to the
original PR #18918  and also only loosely related Mikael's / my
patch to fix PR #18918.

As you have unearthed it, I'm fine if you open an R bugzilla
issue for it;  otherwise, I'd do it myself.

Martin

Dirk Eddelbuettel

2025-Aug-25 22:09 UTC

head link

[Rd] Do rowMeans and colMeans of complex vars need adjusting following r88444?

On 25 August 2025 at 17:56, Martin Maechler wrote:
| >>>>> Dirk Eddelbuettel 
| >>>>>     on Mon, 25 Aug 2025 05:46:56 -0500 writes:
| I really think a solution must happen in C code.

I tend to agree here. If I had seen an easy and obvious path to a fix for
this, I would have proposed a patch from the outset. But as I did not, this
email thread started instead to begin with a discussion.

An 'obvious' solution is the only one I can think of now: we could pass
both
the real and the imaginary parts down to compute the mean of the imaginary
part. A new one-off function just for complex data seems like overkill but
maybe it is the way to go (doing one pass and computing real and imaginary
parts in that pass).

| Further, note that the
| 
|    if(is.complex(x))
| 	    .Internal(colMeans(Re(x), n, prod(dn), na.rm)) +
|        1i * .Internal(colMeans(Im(x), n, prod(dn), na.rm))
| 
| with na.rm = TRUE  has been *wrong* always, for all cases where
| is.na(Re(x))  and
| is.na(Im(x))  have differed ...

Prior to r88444, the NA assignment for complex data assigned to both the real
and imaginary part, making it unlikely both differed. As best as I can tell a
user would have had to explicitly set that. 

| So this problem is really a new one, only tangentially related to the
| original PR #18918  and also only loosely related Mikael's / my
| patch to fix PR #18918.

I see this differently.

Prior to r88444 / #18918 we could reliably call rowMeans() and colMeans() on
complex-valued matrices containing NA values. A unit test for Rcpp relied on
it (to validate its own row/col means method). Following r88444 this test in
Rcpp now breaks. And it breaks because the R side is now incorrect as we pass
the imaginary alone, with a finite value of zero where the real part has a NA
rendering the divisor for the mean values invalid. It really looks like a bug
that is a consequence of r88444 to me. Note that R also did not have a test
case for row or column means of complex valued data with NA or else this
would have bubbled up sooner. So apart from fixing the issue we should
probably add a test too.
 
| As you have unearthed it, I'm fine if you open an R bugzilla
| issue for it;  otherwise, I'd do it myself.

It doesn't matter to me. Maybe it is easier if you do it.

r88444 was a good change. Everybody seems to agree that assigning NA only to
the real part is cleaner. But we are having a small side effect here so
let's
see about fixing that too.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org

R devel - Aug 2025 - Do rowMeans and colMeans of complex vars need adjusting following r88444?

[Rd] Do rowMeans and colMeans of complex vars need adjusting following r88444?

[Rd] Do rowMeans and colMeans of complex vars need adjusting following r88444?