I've noticed that the max.col function with the default "random"
option often gives unexpected results. For instance, in this test, it
seems clear what the answer should be:
> # second col should always be max
> x1 = cbind(1:10, 2:11, -Inf)
>
> # this works fine
> max.col(x1, "first")
[1] 2 2 2 2 2 2 2 2 2 2>
> # this gives random answers
> max.col(x1)
> [1] 3 1 1 2 3 3 1 3 1 1
Ouch! max.col is randomizing across all values.
Even without infinite values, something similar can happen:
> # test 2 --- tolerance problems
>
> # clearly column 3 is the max
> x1 = cbind(-1e9 * 1:10, 1:10, 2:11)
>
> # again, first method works:
> max.col(x1, "first")
[1] 3 3 3 3 3 3 3 3 3 3>
> # but random doesn't
> max.col(x1)
[1] 2 3 2 3 3 2 2 2 3 2>
The max.col docs say " there is a relative tolerance of 1e-5, relative
to the largest entry in the row", but it's really using the maximum
absolute value entry in the row (appl/maxcol.c, line 35 in R 2.4.0).
Is this necessary for some sort of S-plus compatibility? If so, I
think it would be good to make this absolute value property very clear
in the docs, since it can cause subtle bugs (and did for me).
Personally, I think the behavior is much nicer with the following patch:
--- rplain/R-2.4.0/src/appl/maxcol.c 2006-04-09 18:19:58.000000000 -0400
+++ R-2.4.0/src/appl/maxcol.c 2006-12-14 15:30:56.000000000 -0500
@@ -26,13 +26,14 @@
do_rand = *ties_meth == 1;
for (r = 0; r < n_r; r++) {
- /* first check row for any NAs and find the largest abs(entry) */
+ /* first check row for any NAs and find the largest entry */
large = 0.0;
isna = FALSE;
for (c = 0; c < *nc; c++) {
a = matrix[r + c * n_r];
if (ISNAN(a)) { isna = TRUE; break; }
- if (do_rand) large = fmax2(large, fabs(a));
+ if (!R_FINITE(a)) continue;
+ if (do_rand) large = fmax2(large, a);
}
if (isna) { maxes[r] = NA_INTEGER; continue; }
--------------------------------------- END
-------------------------------------------------
This gives the expected behavior in the two examples above.