Hervé Pagès
2011-Dec-02 01:40 UTC
[Rd] 1.6x speedup for requal() function (in R/src/main/unique.c)
Hi,

FWIW:

/* Taken from R/src/main/unique.c */
static int requal(SEXP x, int i, SEXP y, int j)
{
    if (i < 0 || j < 0) return 0;
    if (!ISNAN(REAL(x)[i]) && !ISNAN(REAL(y)[j]))
        return (REAL(x)[i] == REAL(y)[j]);
    else if (R_IsNA(REAL(x)[i]) && R_IsNA(REAL(y)[j])) return 1;
    else if (R_IsNaN(REAL(x)[i]) && R_IsNaN(REAL(y)[j])) return 1;
    else return 0;
}

/* Between 1.34x and 1.37x faster on my 64-bit Ubuntu laptop */
static int requal2(SEXP x, int i, SEXP y, int j)
{
    double xi, yj;

    if (i < 0 || j < 0) return 0;
    xi = REAL(x)[i];
    yj = REAL(y)[j];
    if (!ISNAN(xi) && !ISNAN(yj)) return xi == yj;
    if (R_IsNA(xi) && R_IsNA(yj)) return 1;
    if (R_IsNaN(xi) && R_IsNaN(yj)) return 1;
    return 0;
}

/* A further 1.18x speedup, so overall requal3() is about 1.6x
   faster than requal() for me. requal3() uses simpler logic than
   requal(), but this logic should be equivalent to the logic used
   by requal(), based on the following facts:
   (a) If *one* of xi or yj is a number (i.e. not NA or NaN),
       then xi and yj can be compared with xi == yj. They don't
       need to *both* be numbers for this comparison to be valid.
   (b) Otherwise (i.e. if neither of them is a number), each
       of them is either NA or NaN (only 2 possible values for
       each), so comparing them with R_IsNA(xi) == R_IsNA(yj)
       should do the trick. */
static int requal3(SEXP x, int i, SEXP y, int j)
{
    double xi, yj;

    if (i < 0 || j < 0) return 0;
    xi = REAL(x)[i];
    yj = REAL(y)[j];
    if (!ISNAN(xi) || !ISNAN(yj)) return xi == yj;
    return R_IsNA(xi) == R_IsNA(yj);
}

The logic of the cequal() function (in the same file) could also be
cleaned up in a similar way, probably for an even greater speedup.

This will benefit duplicated(), anyDuplicated() and unique() on numeric
and complex vectors.

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
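The equivalence argument for requal3() can be checked mechanically. Both versions depend on their inputs only through "is it a number, NA, or some other NaN?" plus `==` between numbers, so comparing them on every pair from a small set covering each class exercises every branch. The sketch below strips out the SEXP indexing and works on plain doubles. `make_na()`, `is_na()` and `is_nan_not_na()` are hypothetical stand-ins for R's `NA_real_`, `R_IsNA()` and `R_IsNaN()`, assuming R's representation of NA as a NaN whose low 32-bit word is 1954; real code would use R's own definitions from its headers. It checks logic only and says nothing about speed.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL stand-in for NA_real_: assumed to be a NaN whose low
   32-bit word is 1954 (0x7A2). Not taken from R's headers. */
static double make_na(void)
{
    uint64_t bits = UINT64_C(0x7FF00000000007A2); /* all-ones exponent, low word 1954 */
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* HYPOTHETICAL stand-in for R_IsNA(): a NaN whose low word is 1954. */
static int is_na(double x)
{
    uint64_t bits;
    if (!isnan(x)) return 0;
    memcpy(&bits, &x, sizeof bits);
    return (bits & UINT64_C(0xFFFFFFFF)) == 1954;
}

/* HYPOTHETICAL stand-in for R_IsNaN(): any NaN that is not NA. */
static int is_nan_not_na(double x)
{
    return isnan(x) && !is_na(x);
}

/* requal2() logic on plain doubles (index checks omitted). */
static int eq_requal2(double xi, double yj)
{
    if (!isnan(xi) && !isnan(yj)) return xi == yj;
    if (is_na(xi) && is_na(yj)) return 1;
    if (is_nan_not_na(xi) && is_nan_not_na(yj)) return 1;
    return 0;
}

/* requal3() logic on plain doubles (index checks omitted). */
static int eq_requal3(double xi, double yj)
{
    if (!isnan(xi) || !isnan(yj)) return xi == yj;
    return is_na(xi) == is_na(yj);
}

/* Runs both versions on every ordered pair drawn from ordinary numbers,
   signed zeros, infinities, NaN and NA; prints each disagreement to
   stderr and returns how many there were. */
static int count_mismatches(void)
{
    const double vals[] = { 0.0, -0.0, 1.0, -2.5, INFINITY, -INFINITY, NAN, make_na() };
    const size_t n = sizeof vals / sizeof vals[0];
    int mismatches = 0;

    for (size_t a = 0; a < n; a++) {
        for (size_t b = 0; b < n; b++) {
            if (eq_requal2(vals[a], vals[b]) != eq_requal3(vals[a], vals[b])) {
                fprintf(stderr, "mismatch at vals[%zu], vals[%zu]\n", a, b);
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

Calling `assert(count_mismatches() == 0)` from a test driver should pass if facts (a) and (b) hold. The pair (NA, NaN) is the interesting case: both versions return 0, while (NA, NA) and (NaN, NaN) both return 1.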
Duncan Murdoch
2011-Dec-02 03:13 UTC
[Rd] 1.6x speedup for requal() function (in R/src/main/unique.c)
On 11-12-01 8:40 PM, Hervé Pagès wrote:
> Hi,
>
> FWIW:
>
> /* Taken from R/src/main/unique.c */
> static int requal(SEXP x, int i, SEXP y, int j)
> {
>     if (i < 0 || j < 0) return 0;
>     if (!ISNAN(REAL(x)[i]) && !ISNAN(REAL(y)[j]))
>         return (REAL(x)[i] == REAL(y)[j]);
>     else if (R_IsNA(REAL(x)[i]) && R_IsNA(REAL(y)[j])) return 1;
>     else if (R_IsNaN(REAL(x)[i]) && R_IsNaN(REAL(y)[j])) return 1;
>     else return 0;
> }
>
> /* Between 1.34x and 1.37x faster on my 64-bit Ubuntu laptop */
> static int requal2(SEXP x, int i, SEXP y, int j)
> {
>     double xi, yj;
>
>     if (i < 0 || j < 0) return 0;
>     xi = REAL(x)[i];
>     yj = REAL(y)[j];
>     if (!ISNAN(xi) && !ISNAN(yj)) return xi == yj;
>     if (R_IsNA(xi) && R_IsNA(yj)) return 1;
>     if (R_IsNaN(xi) && R_IsNaN(yj)) return 1;
>     return 0;
> }

That looks like a valid improvement.

>
> /* Another extra 1.18x speedup. So overall requal3() is about 1.6x
>    faster than requal() for me. requal3() uses a simpler logic than
>    requal() but this logic should be equivalent to the logic used
>    by requal(), based on the following facts:
>    (a) If *one* of xi or yj is a number (i.e. not NA or NaN),
>        then xi and yj can be compared with xi == yj. They don't
>        need to *both* be numbers for this comparison to be valid.
>    (b) Otherwise (i.e. if each of them is not a number) then each
>        of them is either NA or NaN (only 2 possible values for
>        each), so comparing them with R_IsNA(xi) == R_IsNA(yj)
>        should do the trick. */

I think this one is probably correct, but it's too tricky for my taste.

> static int requal3(SEXP x, int i, SEXP y, int j)
> {
>     double xi, yj;
>
>     if (i < 0 || j < 0) return 0;
>     xi = REAL(x)[i];
>     yj = REAL(y)[j];
>     if (!ISNAN(xi) || !ISNAN(yj)) return xi == yj;
>     return R_IsNA(xi) == R_IsNA(yj);
> }

Duncan Murdoch

>
> The logic of the cequal() function (in the same file) could also be
> cleaned up in a similar way, probably for an even greater speedup.
>
> This will benefit duplicated(), anyDuplicated() and unique() on numeric
> and complex vectors.
>
> Cheers,
> H.
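Why would a faster requal() help duplicated(), anyDuplicated() and unique()? Roughly speaking, those functions hash each element and then call the type-specific equality test (requal() for doubles) to confirm candidate matches, so that test runs in the inner loop. The sketch below is a deliberately naive quadratic version that makes the role of the equality test visible; R itself uses a hash table, and `duplicated_naive()`, `any_duplicated_naive()`, `make_na()` and `is_na()` are all hypothetical names, with NA again assumed to be a NaN whose low 32-bit word is 1954.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL stand-in for NA_real_ (NaN with low word 1954). */
static double make_na(void)
{
    uint64_t bits = UINT64_C(0x7FF00000000007A2);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* HYPOTHETICAL stand-in for R_IsNA(). */
static int is_na(double x)
{
    uint64_t bits;
    if (!isnan(x)) return 0;
    memcpy(&bits, &x, sizeof bits);
    return (bits & UINT64_C(0xFFFFFFFF)) == 1954;
}

/* requal3() logic on plain doubles: NA matches NA, NaN matches NaN,
   and NA never matches NaN. */
static int eq_double(double xi, double yj)
{
    if (!isnan(xi) || !isnan(yj)) return xi == yj;
    return is_na(xi) == is_na(yj);
}

/* Naive O(n^2) duplicated(): out[k] = 1 if x[k] equals some earlier
   element, else 0. Illustration only; R uses hashing instead. */
static void duplicated_naive(const double *x, size_t n, int *out)
{
    for (size_t k = 0; k < n; k++) {
        out[k] = 0;
        for (size_t m = 0; m < k; m++) {
            if (eq_double(x[m], x[k])) {
                out[k] = 1;
                break;
            }
        }
    }
}

/* Naive anyDuplicated(): 1-based index of the first duplicate, 0 if none. */
static size_t any_duplicated_naive(const double *x, size_t n)
{
    for (size_t k = 0; k < n; k++)
        for (size_t m = 0; m < k; m++)
            if (eq_double(x[m], x[k])) return k + 1;
    return 0;
}
```

For the input {1, NA, NaN, 1, NA, NaN, 2}, `duplicated_naive()` flags only the second 1, the second NA and the second NaN, and `any_duplicated_naive()` returns 4, which mirrors how R keeps NA and NaN distinct in duplicated().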