Dirk Eddelbuettel
2025-Aug-25 10:46 UTC
[Rd] Do rowMeans and colMeans of complex vars need adjusting following r88444?
On 25 August 2025 at 11:54, Martin Maechler wrote: | I still agree we should address this: We do have a discrepancy | with mean() i.e., mean.default() which does "exactly" what you | do "by hand" above, and hence using is.na() for the full | complex vector, and not *separately* for Re() and Im() parts; | ... and I do tend to agree that colMeans(*, na.rm=TRUE) etc | probably should be adapted to *not* work coordinate-wise but | drop all "complex NAs" both for Re and Im. I agree with that. The culprit is the per-component call (here from colMeans, same for rowMeans) z <- if(is.complex(x)) .Internal(colMeans(Re(x), n, prod(dn), na.rm)) + 1i * .Internal(colMeans(Im(x), n, prod(dn), na.rm)) else .Internal(colMeans(x, n, prod(dn), na.rm)) where we may need to add a na.omit() if na.rm is TRUE. But I didn't like that idea well enough yet yesterday to propose a patch along those lines as I did not think about the comparison with mean.default which is quite convincing. Having results differ even within r-devel is quite bad indeed, and mean.default, swept by hand across rows or columns, also return what r-release returns here for a matrix containing NAs. So maybe na.omit() is the best we can do here? Dirk -- dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
Martin Maechler
2025-Aug-25 15:56 UTC
[Rd] Do rowMeans and colMeans of complex vars need adjusting following r88444?
>>>>> Dirk Eddelbuettel >>>>> on Mon, 25 Aug 2025 05:46:56 -0500 writes:> On 25 August 2025 at 11:54, Martin Maechler wrote: > | I still agree we should address this: We do have a discrepancy > | with mean() i.e., mean.default() which does "exactly" what you > | do "by hand" above, and hence using is.na() for the full > | complex vector, and not *separately* for Re() and Im() parts; > | ... and I do tend to agree that colMeans(*, na.rm=TRUE) etc > | probably should be adapted to *not* work coordinate-wise but > | drop all "complex NAs" both for Re and Im. > I agree with that. > The culprit is the per-component call (here from colMeans, same for rowMeans) > z <- if(is.complex(x)) > .Internal(colMeans(Re(x), n, prod(dn), na.rm)) + > 1i * .Internal(colMeans(Im(x), n, prod(dn), na.rm)) > else .Internal(colMeans(x, n, prod(dn), na.rm)) > where we may need to add a na.omit() if na.rm is TRUE. unfortunately, no: na.omit() cannot be the solution easily for _both_ colMeans() and rowMeans() { colSums() & rowSums() ! which already make use of na.rm, as indeed na.omit() treats matrices as data matrices, i.e., rows treated very differently than columns, but in any case, it would drop full rows (or columns if you try to transpose first) and that may be very wrong as different rows/columns may have NAs in verydifferent locations. I really think a solution must happen in C code. Further, note that the if(is.complex(x)) .Internal(colMeans(Re(x), n, prod(dn), na.rm)) + 1i * .Internal(colMeans(Im(x), n, prod(dn), na.rm)) with na.rm = TRUE has been *wrong* always, for all cases where is.na(Re(x)) and is.na(Im(x)) have differed ... So this problem is really a new one, only tangentially related to the original PR #18918 and also only loosely related Mikael's / my patch to fix PR #18918. As you have unearthed it, I'm fine if you open an R bugzilla issue for it; otherwise, I'd do it myself. Martin