Henrik Bengtsson
2015-Jun-01 00:02 UTC
[Rd] sum(..., na.rm=FALSE): Summing over NA_real_ values much more expensive than non-NAs for na.rm=FALSE? Hmm...
I'm observing that base::sum(x, na.rm=FALSE) for typeof(x) == "double" is much more time consuming when there are missing values than when there are not. I'm observing this on both Windows and Linux, and it's quite surprising to me. Currently, my main suspect is the settings used when R was built. The second suspect is my brain. I hope that someone can clarify the results below and confirm whether or not they see the same. Note, this is for "doubles", so I'm not expecting early stopping as for "integers" (where testing for NA is cheap).

On R 3.2.0, on Windows (using the official CRAN builds), on Linux (locally built), and on OS X (official AT&T builds), I get:

> x <- rep(0, 1e8)
> stopifnot(typeof(x) == "double")
> system.time(sum(x, na.rm=FALSE))
   user  system elapsed
   0.19    0.01    0.20

> y <- rep(NA_real_, 1e8)
> stopifnot(typeof(y) == "double")
> system.time(sum(y, na.rm=FALSE))
   user  system elapsed
   9.54    0.00    9.55

> z <- x; z[length(z)/2] <- NA_real_
> stopifnot(typeof(z) == "double")
> system.time(sum(z, na.rm=FALSE))
   user  system elapsed
   4.49    0.00    4.51

Following the source code, I'm pretty sure the code performing the calculation
(https://github.com/wch/r-source/blob/trunk/src/main/summary.c#L112-L128) is:

static Rboolean rsum(double *x, R_xlen_t n, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    Rboolean updated = FALSE;
    for (R_xlen_t i = 0; i < n; i++) {
        if (!narm || !ISNAN(x[i])) {
            if (!updated) updated = TRUE;
            s += x[i];
        }
    }
    if (s > DBL_MAX) *value = R_PosInf;
    else if (s < -DBL_MAX) *value = R_NegInf;
    else *value = (double) s;
    return updated;
}

In other words, when na.rm=FALSE, the inner for loop:

    for (R_xlen_t i = 0; i < n; i++) {
        if (!narm || !ISNAN(x[i])) {
            if (!updated) updated = TRUE;
            s += x[i];
        }
    }

should effectively become (because the !ISNAN(x[i]) test does not make a difference):

    for (R_xlen_t i = 0; i < n; i++) {
        if (!narm) {
            if (!updated) updated = TRUE;
            s += x[i];
        }
    }

That is, sum(x, na.rm=FALSE) basically spends its time on `s += x[i]`. Now, I have always been under the impression that summing with NAs is *not* more expensive than summing over regular (double) values, which is confirmed by the example below, but the benchmarking above disagrees. It looks like there is a big overhead in keeping track of the sum `s` being NA, which is supported by the fact that summing over 'z' costs about half of 'y'.

Now, I *cannot* reproduce the above using the following 'inline' example:

> sum2 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
  double *x_ = REAL(x);
  int narm_ = asLogical(narm);
  int n = length(x);
  double sum = 0;
  for (R_xlen_t i = 0; i < n; i++) {
    if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
  }
  return ScalarReal(sum);
')

> x <- rep(0, 1e8)
> stopifnot(typeof(x) == "double")
> system.time(sum2(x, narm=FALSE))
   user  system elapsed
   0.16    0.00    0.16

> y <- rep(NA_real_, 1e8)
> stopifnot(typeof(y) == "double")
> system.time(sum2(y, narm=FALSE))
   user  system elapsed
   0.16    0.00    0.15

> z <- x; z[length(z)/2] <- NA_real_
> stopifnot(typeof(z) == "double")
> system.time(sum2(z, narm=FALSE))
   user  system elapsed
   0.16    0.00    0.15

This is why I suspect it's related to how R was configured when it was built. What's going on? Can someone please shed some light on this?

Thanks

Henrik
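For reference, a minimal control variant of sum2() (a sketch only; the function name sum2b and the note on its timing are illustrative additions, not from the original post) drops the ISNAN() test entirely and sums unconditionally. If the reasoning above is right, an all-NA vector should still sum at full speed with a plain 'double' accumulator, i.e. the per-element NaN test is not where the time goes:

sum2b <- inline::cfunction(sig=c(x="double"), body='
  double *x_ = REAL(x);
  R_xlen_t n = xlength(x);
  double sum = 0;
  for (R_xlen_t i = 0; i < n; i++) {
    sum += x_[i];   /* unconditional add; an NA/NaN simply propagates */
  }
  return ScalarReal(sum);
')

## With y <- rep(NA_real_, 1e8) as above, sum2b(y) should take roughly the
## same time as sum2(y, narm=FALSE) and return NA.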
Henrik Bengtsson
2015-Jun-01 01:08 UTC
[Rd] sum(..., na.rm=FALSE): Summing over NA_real_ values much more expensive than non-NAs for na.rm=FALSE? Hmm...
This is a great example of how you cannot figure it out after spending two hours troubleshooting, but a few minutes after you post to R-devel, it just jumps out at you (is there a word for this other than "impatient"?). Let me answer my own question.

The discrepancy between my sum2() code and the internal code for base::sum() is that the latter uses LDOUBLE = long double (on some systems it's only double, cf. https://github.com/wch/r-source/blob/trunk/src/nmath/nmath.h#L28-L33), whereas my sum2() code uses double. So using long double, I can reproduce the penalty of having NA_real_ with na.rm=FALSE:

> sum3 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
  #define LDOUBLE long double
  double *x_ = REAL(x);
  int narm_ = asLogical(narm);
  int n = length(x);
  LDOUBLE sum = 0.0;
  for (R_xlen_t i = 0; i < n; i++) {
    if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
  }
  return ScalarReal((double)sum);
')

> x <- rep(0, 1e8)
> stopifnot(typeof(x) == "double")
> system.time(sum3(x, narm=FALSE))
   user  system elapsed
   0.40    0.00    0.44

> y <- rep(NA_real_, 1e8)
> stopifnot(typeof(y) == "double")
> system.time(sum3(y, narm=FALSE))
   user  system elapsed
   9.80    0.00    9.84

> z <- x; z[length(z)/2] <- NA_real_
> stopifnot(typeof(z) == "double")
> system.time(sum3(z, narm=FALSE))
   user  system elapsed
   4.49    0.00    4.50

This might even be what the following comment refers to:

/* Required by C99 but might be slow */
#ifdef HAVE_LONG_DOUBLE
# define LDOUBLE long double
#else
# define LDOUBLE double
#endif

So now I should rephrase my question: Is there a way to avoid this penalty when using 'long double'? Is this something the compiler can be clever about, or is the only solution to not use 'long double'?

/Henrik
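The most likely explanation for why the penalty follows LDOUBLE rather than any R-specific build setting is the hardware: on typical x86 CPUs, 'long double' arithmetic is performed on the x87 FPU, where operations with NaN (and denormal) operands fall into a slow, microcode-assisted path, whereas plain 'double' additions compile to SSE2 instructions that handle NaN operands at full speed. A standalone C sketch (illustrative only; the program, file name, and compiler flags are assumptions, not taken from the thread) reproduces the effect outside of R entirely:

/* nan_sum.c -- compare double vs. long double accumulation over NaNs.
   Compile with, e.g.:  gcc -std=c99 -O2 nan_sum.c -o nan_sum            */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define N 100000000UL

/* Sum with a plain double accumulator (SSE2 on typical x86-64 builds). */
static void sum_double(const double *x) {
    clock_t t0 = clock();
    double s = 0.0;
    for (size_t i = 0; i < N; i++) s += x[i];
    printf("double      accumulator: %.2fs (sum = %f)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, s);
}

/* Sum with a long double accumulator (x87 on x86). */
static void sum_long_double(const double *x) {
    clock_t t0 = clock();
    long double s = 0.0L;
    for (size_t i = 0; i < N; i++) s += x[i];
    printf("long double accumulator: %.2fs (sum = %Lf)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, s);
}

int main(void) {
    double *x = malloc(N * sizeof *x);
    if (x == NULL) return 1;

    for (size_t i = 0; i < N; i++) x[i] = 0.0;   /* all finite */
    puts("all finite:");
    sum_double(x);
    sum_long_double(x);

    x[0] = NAN;   /* the NaN propagates from the very first addition on */
    puts("NaN in first element:");
    sum_double(x);
    sum_long_double(x);

    free(x);
    return 0;
}

On hardware where this holds, the 'long double' run over the NaN-containing input is dramatically slower while the 'double' run is unaffected, matching the difference between base::sum() (LDOUBLE accumulator) and sum2() (double accumulator) above.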
Henrik Bengtsson
2015-Jun-06 04:28 UTC
[Rd] sum(..., na.rm=FALSE): Summing over NA_real_ values much more expensive than non-NAs for na.rm=FALSE? Hmm...
I should summarize what I've learned after my more recent post on this:

With 'long double', it is extremely expensive to keep "track of" non-finites (NA, NaN and Inf) when doing basic arithmetic. The most extreme cost is seen when the first value is non-finite, because the penalty is then incurred in all of the following iterations.

EXAMPLE:

## All finite values
> x <- rep(0.1, 1e8)
> system.time(sum(x, na.rm=FALSE))
   user  system elapsed
   0.28    0.00    0.28

## First value is NA (maximum penalty)
> y <- x; y[1] <- NA
> system.time(sum(y, na.rm=FALSE))
   user  system elapsed
  23.47    0.00   23.93
(sic!)

## Last value is NA (minimum penalty)
> z <- x; z[length(z)] <- NA
> system.time(sum(z, na.rm=FALSE))
   user  system elapsed
   0.35    0.00    0.34

## Silly, but proof of concept by summing rev(y)
## such that the NA ends up last
> system.time(sum(rev(y), na.rm=FALSE))
   user  system elapsed
   2.40    0.85    3.31

This penalty of having non-finite values is only observed with 'long double', not with 'double' (see previous message).

WORKAROUND: To work around this penalty, one can test the running result for being non-finite and return early, e.g. if na.rm=FALSE then as soon as the running sum is NA_REAL, return NA_REAL. Unfortunately, testing for NA_REAL is itself expensive, so checking on every iteration would add too much overhead. A better approach is to test for early stopping only at some interval. For example:

> sum4 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
  #define LDOUBLE long double
  double *x_ = REAL(x);
  int narm_ = asLogical(narm);
  int n = length(x);
  LDOUBLE sum = 0.0;
  for (R_xlen_t i = 0; i < n; i++) {
    /* If narm=FALSE, always sum */
    if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
    /* Early-stopping check when sum is non-finite */
    if (!R_FINITE(sum)) break;
  }
  return ScalarReal((double)sum);
')

## See previous email for the non-early-stopping version
> system.time(sum3(x, narm=FALSE))
   user  system elapsed
   1.05    0.00    1.05

## Early stopping every iteration adds quite a bit
## of overhead if never used
> system.time(sum4(x, narm=FALSE))
   user  system elapsed
   1.79    0.00    1.86

## Early stopping once in a while
> sum5 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
  #define LDOUBLE long double
  double *x_ = REAL(x);
  int narm_ = asLogical(narm);
  int n = length(x);
  LDOUBLE sum = 0.0;
  for (R_xlen_t i = 0; i < n; i++) {
    /* If narm=FALSE, always sum */
    if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
    /* Early-stopping check when sum is non-finite */
    if (i % 1048576 == 0 && !R_FINITE(sum)) break;
  }
  return ScalarReal((double)sum);
')

## Low extra cost even without non-finites
> system.time(sum5(x, narm=FALSE))
   user  system elapsed
   1.05    0.00    1.06

## ...and can still do early stopping
> system.time(sum5(y, narm=FALSE))
   user  system elapsed
      0       0       0

/Henrik
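A possible refinement of the sum5() idea (a sketch only; sum6 is a name introduced here for illustration, not code from the thread): instead of evaluating `i % 1048576 == 0` for every element, process the vector in fixed-size chunks and test the running sum once per chunk, so the hot inner loop carries no per-element early-stopping check at all:

sum6 <- inline::cfunction(sig=c(x="double", narm="logical"), body='
  #define CHUNK 1048576
  double *x_ = REAL(x);
  int narm_ = asLogical(narm);
  R_xlen_t n = xlength(x);
  long double sum = 0.0;
  for (R_xlen_t i = 0; i < n; ) {
    R_xlen_t end = i + CHUNK;
    if (end > n) end = n;
    /* Hot inner loop: no per-element early-stopping test */
    for (; i < end; i++) {
      if (!narm_ || !ISNAN(x_[i])) sum += x_[i];
    }
    /* One early-stopping test per chunk */
    if (!R_FINITE((double)sum)) break;
  }
  return ScalarReal((double)sum);
')

The chunk size is a trade-off: a larger chunk means less checking overhead in the all-finite case, at the cost of up to one extra chunk of slow long double additions after the sum has already become non-finite.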