Hello folks! Consider the expression (x % d) == c where d and c are constants. For simplicity let us assume that x is unsigned and 0 <= c < d. Let us further assume that d = a * (1 << b) and a is odd. Then our expression can be transformed to rotate_right(x-c, b) * inverse_mul(a) <= (high_value(x) - c) / d . Example [(x % 250) == 3]: sub eax,3 ror eax,1 imul eax,eax,0x26e978d5 // multiplicative inverse of 125 cmp eax,17179869 // 0xffffffff / 250 jbe OK A range check for x can be embedded as well with no additional code. For signed values a similar transformation is possible. For more details see my comment on Hacker's Delight (http://hackersdelight.org/corres.txt) and/or our paper about hashing (http://programming.sirrida.de/hashsuper.pdf). The current version of Clang / LLVM (clang -O3 -S) translates it to the following (GCC produces similar code): movl %edi, %eax imulq $274877907, %rax, %rax # imm = 0x10624DD3 shrq $36, %rax imull $250, %eax, %eax movl %edi, %ecx subl %eax, %ecx cmpl $3, %ecx je OK Please note that there are 2 multiply operations and one of them produces a double width result (multiply high). Best regards Jasper
Benjamin Kramer
2014-Feb-27 17:01 UTC
[LLVMdev] Idea for optimization (test for remainder)
On 27.02.2014, at 17:07, Jasper Neumann <jn at sirrida.de> wrote:> Hello folks! > > Consider the expression (x % d) == c where d and c are constants. > For simplicity let us assume that x is unsigned and 0 <= c < d. > Let us further assume that d = a * (1 << b) and a is odd. > Then our expression can be transformed to > rotate_right(x-c, b) * inverse_mul(a) <= (high_value(x) - c) / d . > Example [(x % 250) == 3]: > sub eax,3 > ror eax,1 > imul eax,eax,0x26e978d5 // multiplicative inverse of 125 > cmp eax,17179869 // 0xffffffff / 250 > jbe OK > > A range check for x can be embedded as well with no additional code. > For signed values a similar transformation is possible. > > For more details see my comment on Hacker's Delight (http://hackersdelight.org/corres.txt) and/or our paper about hashing (http://programming.sirrida.de/hashsuper.pdf). > > The current version of Clang / LLVM (clang -O3 -S) translates it to the following (GCC produces similar code): > movl %edi, %eax > imulq $274877907, %rax, %rax # imm = 0x10624DD3 > shrq $36, %rax > imull $250, %eax, %eax > movl %edi, %ecx > subl %eax, %ecx > cmpl $3, %ecx > je OK > Please note that there are 2 multiply operations and one of them produces a double width result (multiply high).Yep, this is a long-standing issue in the peephole optimizer. It's not easily fixed because 1. We don't want to do it early (i.e. before codegen) because the resulting expression is harder to analyze wrt. value range. 2. We can't do it late (in DAGCombiner) because it works top-down and has already expanded the operation into the code you posted above by the time it sees the compare. - Ben