suyog sarda
2014-Aug-07 16:46 UTC
[LLVMdev] Efficient Pattern matching in Instruction Combine
Hi, All, Duncan, Rafael, David, Nick. This is regarding pattern matching in InstructionCombine pass. We use 'match' functions many times, but it doesn't do the pattern matching effectively. e.x. Lets take pattern : (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C Both the patterns above are same, since ^ is commutative in Op0. But, 'match' pattern has to be written for both the patterns separately since 'match' identifies pattern as it (like regular expression) and doesn't have the logic to determine that 'A^B' and 'B^A' are same. I propose to improve 'match' functions where we can include the logic to determine if an operation is commutative or not. If it is commutative, then a single 'match' call should be able to detect both 'A^B' and 'B^A'. So, basically we will modify the commutative 'match' functions. There will be various permutations of it where one of the operand might be a constant (I guess this is handled already as constant are re-associated to RHS). I will try to dig more on this. Inputs/suggestions/comments on improving match functions are most awaited. :) Regards, Suyog -- With regards, Suyog Sarda -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140807/18c3e6ec/attachment.html>
Duncan P. N. Exon Smith
2014-Aug-07 18:34 UTC
[LLVMdev] Efficient Pattern matching in Instruction Combine
> On 2014-Aug-07, at 09:46, suyog sarda <sardask01 at gmail.com> wrote: > > Hi, > > All, Duncan, Rafael, David, Nick. > > This is regarding pattern matching in InstructionCombine pass. > > We use 'match' functions many times, but it doesn't do the pattern matching > effectively. > > e.x. Lets take pattern : > > (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C > > (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C > > Both the patterns above are same, since ^ is commutative in Op0.It'd be interesting if you could find a design that also treated these the same: (B ^ A) | ((A ^ B) ^ C) -> (A ^ B) | C (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C (B ^ A) | ((C ^ A) ^ B) -> (A ^ B) | C I.e., `^` is also associative.> > But, 'match' pattern has to be written for both the patterns separately > since 'match' identifies pattern as it (like regular expression) and doesn't > have the logic to determine that 'A^B' and 'B^A' are same. > > I propose to improve 'match' functions where we can include the logic > to determine if an operation is commutative or not. If it is commutative, > then a single 'match' call should be able to detect both 'A^B' and 'B^A'. > So, basically we will modify the commutative 'match' functions. > > There will be various permutations of it where one of the operand might be > a constant (I guess this is handled already as constant are re-associated to > RHS). > > I will try to dig more on this. > > Inputs/suggestions/comments on improving match functions are most awaited. :) > > Regards, > Suyog > > -- > With regards, > Suyog Sarda
Sean Silva
2014-Aug-07 21:01 UTC
[LLVMdev] Efficient Pattern matching in Instruction Combine
Can we handle this by just having a canonical ordering? Or is that too difficult to maintain through various instcombines? On Thu, Aug 7, 2014 at 9:46 AM, suyog sarda <sardask01 at gmail.com> wrote:> Hi, > > All, Duncan, Rafael, David, Nick. > > This is regarding pattern matching in InstructionCombine pass. > > We use 'match' functions many times, but it doesn't do the pattern matching > effectively. > > e.x. Lets take pattern : > > (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C > > (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C > > Both the patterns above are same, since ^ is commutative in Op0. > > But, 'match' pattern has to be written for both the patterns separately > since 'match' identifies pattern as it (like regular expression) and > doesn't > have the logic to determine that 'A^B' and 'B^A' are same. > > I propose to improve 'match' functions where we can include the logic > to determine if an operation is commutative or not. If it is commutative, > then a single 'match' call should be able to detect both 'A^B' and 'B^A'. > So, basically we will modify the commutative 'match' functions. > > There will be various permutations of it where one of the operand might be > a constant (I guess this is handled already as constant are re-associated > to > RHS). > > I will try to dig more on this. > > Inputs/suggestions/comments on improving match functions are most awaited. > :) > > Regards, > Suyog > > -- > With regards, > Suyog Sarda > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140807/27a07013/attachment.html>
David Majnemer
2014-Aug-08 06:10 UTC
[LLVMdev] Efficient Pattern matching in Instruction Combine
On Thu, Aug 7, 2014 at 9:46 AM, suyog sarda <sardask01 at gmail.com> wrote:> Hi, > > All, Duncan, Rafael, David, Nick. > > This is regarding pattern matching in InstructionCombine pass. > > We use 'match' functions many times, but it doesn't do the pattern matching > effectively. > > e.x. Lets take pattern : > > (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C > > (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C > > Both the patterns above are same, since ^ is commutative in Op0. > > But, 'match' pattern has to be written for both the patterns separately > since 'match' identifies pattern as it (like regular expression) and > doesn't > have the logic to determine that 'A^B' and 'B^A' are same. > > I propose to improve 'match' functions where we can include the logic > to determine if an operation is commutative or not. If it is commutative, > then a single 'match' call should be able to detect both 'A^B' and 'B^A'. > So, basically we will modify the commutative 'match' functions. > > There will be various permutations of it where one of the operand might be > a constant (I guess this is handled already as constant are re-associated > to > RHS). > > I will try to dig more on this. > > Inputs/suggestions/comments on improving match functions are most awaited. > :) > > Regards, > Suyog > > -- > With regards, > Suyog Sarda >It seems to me that we could rejigger Reassociate to reduce the number of possibilities that InstCombine could expect. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140807/bc667898/attachment.html>
suyog sarda
2014-Aug-08 20:34 UTC
[LLVMdev] Efficient Pattern matching in Instruction Combine
Hi Duncan, David, Sean. Thanks for your reply.> It'd be interesting if you could find a design that also treated these > the same: > > (B ^ A) | ((A ^ B) ^ C) -> (A ^ B) | C > (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C > (B ^ A) | ((C ^ A) ^ B) -> (A ^ B) | C > > I.e., `^` is also associative.Agree with Duncan on including associative operation too.> Can we handle this by just having a canonical ordering? Or is that toodifficult to maintain through various instcombines? Yes, its the easiest way to do that. If i am not wrong, what Sean is suggesting is that if we get something like (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C and we have written pass for pattern (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C we could just swap 'B' and 'A' before matching the pattern i.e. we do canonical re-ordering (Sean, Correct me if i am wrong on my understanding of what you meant by canonical ordering). I have seen in the "instcombine" module that many times std::swap was used for the same. But it becomes too much repetitive and somewhat irritating to keep doing this for every pattern. Shouldn't we do this in more generic way where we do not have to worry about swapping operands? (How to do it - i don't know for now. I looked at the matchers implementation for modification/improvement, but didn't find any clue on it).> It seems to me that we could rejigger Reassociate to reduce the number ofpossibilities that InstCombine could expect.>Agree with David that infact it should be reassociate pass which should handle this. But following is something interesting and rather worrying things i have found : (I have omitted unimportant code and highlighted important one in following example) *e.x. * ((~A & B) | A) --> (A | B) ; Code is implemented for this already *C code :* suyog at suyog-Inspiron-N5010:~$ cat 1.c #include<stdio.h> int cal(int a, int b) { *return ((~a & b) | a);* } int main(){ int a, b; scanf("%d %d", &a, &b); return cal(a,b); } LLVM IR at O0 : suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O0 -emit-llvm 1.c ; Function Attrs: nounwind *define i32 @cal(i32 %a, i32 %b) #0 { %1 = xor i32 %a, -1 %2 = and i32 %1, %b %3 = or i32 %2, %a ret i32 %3* } ; Function Attrs: nounwind define i32 @main() #0 { .. .. %4 = call i32 @cal(i32 %2, i32 %3) ret i32 %4 } *LLVM IR at instcombine :* suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -instcombine 1.ll *; Function Attrs: nounwinddefine i32 @cal(i32 %a, i32 %b) #0 { %1 = or i32 %a, %b ret i32 %1}* .. .. .. which is OK as per expected transform. *LLVM IR at reassociate and instcombine :*suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -reassociate -instcombine 2.ll *; Function Attrs: nounwinddefine i32 @cal(i32 %a, i32 %b) #0 { %1 = xor i32 %a, -1 %2 = and i32 %b, %1 %3 = or i32 %2, %a ret i32 %3}* .. .. Reassociate converted (~a&b) to (b&~a) and instruction combine failed to optimize. As evident, sometimes, reassociate pass actually harms the optimization opportunity !! Some more interesting findings on how can this affect the final machine code: *Test case :* 1.c : #include<stdio.h> int cal(int a, int b) { *return ((~a & b) | a);* } int main(){ int a, b; scanf("%d %d", &a, &b); return cal(a,b); } *X86 .s file with clang at O2 for above program :* suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c main: # @main # BB#0: subl $28, %esp leal 20(%esp), %eax movl %eax, 8(%esp) leal 24(%esp), %eax movl %eax, 4(%esp) movl $.L.str, (%esp) calll __isoc99_scanf movl 20(%esp), %eax * orl 24(%esp), %eax* addl $28, %esp retl As seen, optimization happened at IR level itself reflected in .s file. *GCC output for the same:* suyog at suyog-Inspiron-N5010:~$ gcc -S -O2 1.c main: .LFB23: .cfi_startproc pushl %ebp .cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl %esp, %ebp .cfi_def_cfa_register 5 andl $-16, %esp subl $32, %esp leal 28(%esp), %eax movl %eax, 8(%esp) leal 24(%esp), %eax movl %eax, 4(%esp) movl $.LC0, (%esp) call __isoc99_scanf movl 24(%esp), %eax * orl 28(%esp), %eax* leave .cfi_restore 5 .cfi_def_cfa 4, 4 ret .cfi_endproc GCC also did the optimization. Now we just *slightly flip* the test case : *1.c Test case:*#include<stdio.h> int cal(int a, int b) { *return ((b & ~a) | a);* } int main(){ int a, b; scanf("%d %d", &a, &b); return cal(a,b); } *X86 .s file with clang at O2 for above program :*suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c main: # @main # BB#0: subl $28, %esp leal 20(%esp), %eax movl %eax, 8(%esp) leal 24(%esp), %eax movl %eax, 4(%esp) movl $.L.str, (%esp) calll __isoc99_scanf movl 24(%esp), %ecx movl %ecx, %eax *notl %eax andl 20(%esp), %eax orl %ecx, %eax* addl $28, %esp retl We lost the Optimization opportunity here !! *GCC output for the same:* suyog at suyog-Inspiron-N5010:~$ gcc -S -O2 1.c main: .LFB23: .cfi_startproc pushl %ebp .cfi_def_cfa_offset 8 .cfi_offset 5, -8 movl %esp, %ebp .cfi_def_cfa_register 5 andl $-16, %esp subl $32, %esp leal 28(%esp), %eax movl %eax, 8(%esp) leal 24(%esp), %eax movl %eax, 4(%esp) movl $.LC0, (%esp) call __isoc99_scanf movl 24(%esp), %eax * orl 28(%esp), %eax* leave .cfi_restore 5 .cfi_def_cfa 4, 4 ret .cfi_endproc GCC still optimized it !! Clearly evident that llvm is losing because of naive approach of pattern matching. How can we improve this so that LLVM also recognizes commutative/associative pattern efficiently with single 'matchers' call? I am not having any idea so far. I will think more on it. Any help would be really helpful. Any suggestions/comments/rectification are most awaited :) -- With regards, Suyog Sarda -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140809/418a1055/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Efficient Pattern matching in Instruction Combine
- [LLVMdev] Efficient Pattern matching in Instruction Combine
- [LLVMdev] Efficient Pattern matching in Instruction Combine
- [LLVMdev] Can LLVM vectorize <2 x i32> type
- [LLVMdev] Can LLVM vectorize <2 x i32> type