thr3ads.net - R devel - [Rd] The function cummax() seems to have a bug. [May 2015]


Dongcan Jiang

2015-May-17 11:13 UTC

[Rd] The function cummax() seems to have a bug.

Hi,

The function cummax() seems to have a bug.
> x <- c(NA, 0)
> storage.mode(x) <- "integer"
> cummax(x)
[1] NA  0

The correct result of this case should be NA NA. The mistake in [
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
the reason.

Best Regards,
Dongcan

-- 
Dongcan Jiang
Team of Search Engine & Web Mining
School of Electronic Engineering & Computer Science
Peking University, Beijing, 100871, P.R.China


Henrik Bengtsson

2015-May-18 00:22 UTC


Below is some further troubleshooting on this:
From code inspection, this bug occurs only:
* for integer values, and
* when the first element is NA_integer_ and the second is not.


Examples:

# Numeric/doubles work as expected
> cummax(c(NA_real_, 0, 1, 2, 3))
[1] NA NA NA NA NA

# It does not occur when the first value is non-NA
> cummax(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA

# When the first value is NA, it is not "remembered"
# (because the internal for loop starts with the 2nd element)
> cummax(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA  0  1  2  3

The problem is not there for cummin():
> cummin(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA
> cummin(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA NA NA NA NA

but that is just "pure luck", because NA_integer_ happens to be
internally represented as the smallest possible 4-byte signed integer,
i.e.

LibExtern int    R_NaInt;   /* NA_INTEGER:= INT_MIN currently */
#define NA_INTEGER  R_NaInt

Note the comment, which implies that code should not rely on the
assumption that NA_integer_ == NA_INTEGER == R_NaInt == INT_MIN; it
could equally well have been INT_MAX, in which case cummin() would
return the wrong result whereas cummax() wouldn't. So, cummin() makes
the same mistake as cummax(): the for-loop skips the test for NA
of the first element, cf.
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L145-L148
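
To make the "pure luck" argument concrete, here is a minimal standalone sketch in plain C. It is not the R source: the function name cum_extreme_skip_first, its na and want_max parameters, and the pre-filling of the output with the NA sentinel are all assumptions made for illustration. It only mimics a loop that starts at the second element and lets the sentinel be swapped between INT_MIN and INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of R): cumulative max/min over x[0..n-1]
 * that, like the loops discussed above, never tests x[0] against the NA
 * sentinel. `na` is the sentinel value; `want_max` selects max (non-zero)
 * or min (zero). The output is pre-filled with `na`, so anything after an
 * early break stays NA (an assumption of this sketch).
 */
void cum_extreme_skip_first(const int *x, int *out, size_t n,
                            int na, int want_max)
{
    assert(x != NULL && out != NULL);

    for (size_t i = 0; i < n; i++)
        out[i] = na;
    if (n == 0)
        return;

    int acc = x[0];          /* x[0] is taken as-is, even if it is NA */
    out[0] = acc;
    for (size_t i = 1; i < n; i++) {   /* loop starts at the 2nd element */
        if (x[i] == na)
            break;
        if (want_max)
            acc = (acc > x[i]) ? acc : x[i];
        else
            acc = (acc < x[i]) ? acc : x[i];
        out[i] = acc;
    }
}
```

With na = INT_MIN and input {na, 0, 1}, the max variant yields {na, 0, 1} (the reported bug), while the min variant yields {na, na, na} only because no value can be smaller than INT_MIN. With na = INT_MAX the roles flip: the min variant yields {na, 0, 0} and the max variant happens to look correct.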

The simple solution is probably to do (cf. native icumsum):

[HB-X201]{hb}: svn diff src/main/cum.c
Index: src/main/cum.c
===================================================================
--- src/main/cum.c      (revision 68378)
+++ src/main/cum.c      (working copy)
@@ -130,7 +130,7 @@
     int *ix = INTEGER(x), *is = INTEGER(s);
     int max = ix[0];
     is[0] = max;
-    for (R_xlen_t i = 1 ; i < xlength(x) ; i++) {
+    for (R_xlen_t i = 0 ; i < xlength(x) ; i++) {
        if(ix[i] == NA_INTEGER) break;
        is[i] = max = (max > ix[i]) ? max : ix[i];
     }
@@ -142,7 +142,7 @@
     int *ix = INTEGER(x), *is = INTEGER(s);
     int min = ix[0];
     is[0] = min;
-    for (R_xlen_t i = 1 ; i < xlength(x) ; i++ ) {
+    for (R_xlen_t i = 0 ; i < xlength(x) ; i++ ) {
        if(ix[i] == NA_INTEGER) break;
        is[i] = min = (min < ix[i]) ? min : ix[i];
     }
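
For readers who want to try the idea outside of R, the patched loop can be sketched as a standalone C function. This is an illustration under stated assumptions, not the code in cum.c: cummax_sketch and NA_SKETCH are hypothetical names, NA_SKETCH merely stands in for NA_INTEGER, and the sketch pre-fills the output with NA so that everything after the first NA stays NA.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for NA_INTEGER; the value is an assumption. */
#define NA_SKETCH INT_MIN

/*
 * Cumulative maximum where the loop starts at index 0, so the first
 * element is tested for NA like every other element.
 */
void cummax_sketch(const int *x, int *out, size_t n)
{
    assert(x != NULL && out != NULL);

    /* Pre-fill with NA so that an early break leaves the tail as NA. */
    for (size_t i = 0; i < n; i++)
        out[i] = NA_SKETCH;
    if (n == 0)
        return;

    int max = x[0];
    for (size_t i = 0; i < n; i++) {     /* starts at 0, not 1 */
        if (x[i] == NA_SKETCH)
            break;
        max = (max > x[i]) ? max : x[i];
        out[i] = max;
    }
}
```

Calling cummax_sketch on {NA_SKETCH, 0} leaves both outputs as NA_SKETCH, matching the NA NA result expected in the original report; on {0, NA_SKETCH, 1} it gives 0 followed by two NAs. Because the NA test is explicit for every element, the result no longer depends on which integer is used as the sentinel.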

/Henrik

On Sun, May 17, 2015 at 4:13 AM, Dongcan Jiang <dongcan.jiang at
gmail.com> wrote:
> Hi,
>
> The function cummax() seems to have a bug.
>
>> x <- c(NA, 0)
>> storage.mode(x) <- "integer"
>> cummax(x)
> [1] NA  0
>
> The correct result of this case should be NA NA. The mistake in
> [https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
> the reason.
>
> Best Regards,
> Dongcan
>
> --
> Dongcan Jiang
> Team of Search Engine & Web Mining
> School of Electronic Engineering & Computer Science
> Peking University, Beijing, 100871, P.R.China
