thr3ads.net - llvm dev - [LLVMdev] Efficient Pattern matching in Instruction Combine [Aug 2014]

If this information is useful, please help other people find it:
Share via:

suyog sarda

2014-Aug-07 16:46 UTC

[LLVMdev] Efficient Pattern matching in Instruction Combine

Hi,

All, Duncan, Rafael, David, Nick.

This is regarding pattern matching in InstructionCombine pass.

We use 'match' functions many times, but it doesn't do the pattern
matching
effectively.

e.x. Lets take pattern :

(A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C

(B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C

Both the patterns above are same, since ^ is commutative in Op0.

But, 'match' pattern has to be written for both the patterns separately
since 'match' identifies pattern as it (like regular expression) and
doesn't
have the logic to determine that 'A^B' and 'B^A' are same.

I propose to improve 'match' functions where we can include the logic
to determine if an operation is commutative or not. If it is commutative,
then a single 'match' call should be able to detect both 'A^B'
and 'B^A'.
So, basically we will modify the commutative 'match' functions.

There will be various permutations of it where one of the operand might be
a constant (I guess this is handled already as constant are re-associated to
RHS).

I will try to dig more on this.

Inputs/suggestions/comments on improving match functions are most awaited.
:)

Regards,
Suyog

-- 
With regards,
Suyog Sarda
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140807/18c3e6ec/attachment.html>

Duncan P. N. Exon Smith

2014-Aug-07 18:34 UTC

head link

[LLVMdev] Efficient Pattern matching in Instruction Combine

> On 2014-Aug-07, at 09:46, suyog sarda <sardask01 at gmail.com> wrote:
> 
> Hi,
> 
> All, Duncan, Rafael, David, Nick.
> 
> This is regarding pattern matching in InstructionCombine pass.
> 
> We use 'match' functions many times, but it doesn't do the
pattern matching
> effectively.
> 
> e.x. Lets take pattern :
> 
> (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C
> 
> (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
> 
> Both the patterns above are same, since ^ is commutative in Op0. 
It'd be interesting if you could find a design that also treated these
the same:

    (B ^ A) | ((A ^ B) ^ C) -> (A ^ B) | C
    (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
    (B ^ A) | ((C ^ A) ^ B) -> (A ^ B) | C

I.e., `^` is also associative.
> 
> But, 'match' pattern has to be written for both the patterns
separately
> since 'match' identifies pattern as it (like regular expression)
and doesn't
> have the logic to determine that 'A^B' and 'B^A' are same.
> 
> I propose to improve 'match' functions where we can include the
logic
> to determine if an operation is commutative or not. If it is commutative,
> then a single 'match' call should be able to detect both
'A^B' and 'B^A'.
> So, basically we will modify the commutative 'match' functions.
> 
> There will be various permutations of it where one of the operand might be
> a constant (I guess this is handled already as constant are re-associated
to
> RHS). 
> 
> I will try to dig more on this.
> 
> Inputs/suggestions/comments on improving match functions are most awaited.
:)
> 
> Regards,
> Suyog
> 
> -- 
> With regards,
> Suyog Sarda

Sean Silva

2014-Aug-07 21:01 UTC

head link

[LLVMdev] Efficient Pattern matching in Instruction Combine

Can we handle this by just having a canonical ordering? Or is that too
difficult to maintain through various instcombines?


On Thu, Aug 7, 2014 at 9:46 AM, suyog sarda <sardask01 at gmail.com>
wrote:
> Hi,
>
> All, Duncan, Rafael, David, Nick.
>
> This is regarding pattern matching in InstructionCombine pass.
>
> We use 'match' functions many times, but it doesn't do the
pattern matching
> effectively.
>
> e.x. Lets take pattern :
>
> (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C
>
> (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
>
> Both the patterns above are same, since ^ is commutative in Op0.
>
> But, 'match' pattern has to be written for both the patterns
separately
> since 'match' identifies pattern as it (like regular expression)
and
> doesn't
> have the logic to determine that 'A^B' and 'B^A' are same.
>
> I propose to improve 'match' functions where we can include the
logic
> to determine if an operation is commutative or not. If it is commutative,
> then a single 'match' call should be able to detect both
'A^B' and 'B^A'.
> So, basically we will modify the commutative 'match' functions.
>
> There will be various permutations of it where one of the operand might be
> a constant (I guess this is handled already as constant are re-associated
> to
> RHS).
>
> I will try to dig more on this.
>
> Inputs/suggestions/comments on improving match functions are most awaited.
> :)
>
> Regards,
> Suyog
>
> --
> With regards,
> Suyog Sarda
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140807/27a07013/attachment.html>

David Majnemer

2014-Aug-08 06:10 UTC

head link

[LLVMdev] Efficient Pattern matching in Instruction Combine

On Thu, Aug 7, 2014 at 9:46 AM, suyog sarda <sardask01 at gmail.com>
wrote:
> Hi,
>
> All, Duncan, Rafael, David, Nick.
>
> This is regarding pattern matching in InstructionCombine pass.
>
> We use 'match' functions many times, but it doesn't do the
pattern matching
> effectively.
>
> e.x. Lets take pattern :
>
> (A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C
>
> (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
>
> Both the patterns above are same, since ^ is commutative in Op0.
>
> But, 'match' pattern has to be written for both the patterns
separately
> since 'match' identifies pattern as it (like regular expression)
and
> doesn't
> have the logic to determine that 'A^B' and 'B^A' are same.
>
> I propose to improve 'match' functions where we can include the
logic
> to determine if an operation is commutative or not. If it is commutative,
> then a single 'match' call should be able to detect both
'A^B' and 'B^A'.
> So, basically we will modify the commutative 'match' functions.
>
> There will be various permutations of it where one of the operand might be
> a constant (I guess this is handled already as constant are re-associated
> to
> RHS).
>
> I will try to dig more on this.
>
> Inputs/suggestions/comments on improving match functions are most awaited.
> :)
>
> Regards,
> Suyog
>
> --
> With regards,
> Suyog Sarda
>

It seems to me that we could rejigger Reassociate to reduce the number of
possibilities that InstCombine could expect.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140807/bc667898/attachment.html>

suyog sarda

2014-Aug-08 20:34 UTC

head link

[LLVMdev] Efficient Pattern matching in Instruction Combine

Hi Duncan, David, Sean.

Thanks for your reply.
> It'd be interesting if you could find a design that also treated these
> the same:
>
>   (B ^ A) | ((A ^ B) ^ C) -> (A ^ B) | C
>   (B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C
>   (B ^ A) | ((C ^ A) ^ B) -> (A ^ B) | C
>
> I.e., `^` is also associative.
Agree with Duncan on including associative operation too.
> Can we handle this by just having a canonical ordering? Or is that toodifficult to maintain through various instcombines?

Yes, its the easiest way to do that. If i am not wrong, what Sean is
suggesting is that if we get

something like

(B ^ A) | ((B ^ C) ^ A) -> (A ^ B) | C

and we have written pass for pattern

(A ^ B) | ((B ^ C) ^ A) -> (A ^ B) | C

we could just swap 'B' and 'A' before matching the pattern i.e.
we do
canonical re-ordering
(Sean, Correct me if i am wrong on my understanding of what you meant by
canonical ordering).

I have seen in the "instcombine" module that many times std::swap was
used
for the same.

But it becomes too much repetitive and somewhat irritating to keep doing
this for every pattern.
Shouldn't we do this in more generic way where we do not have to worry
about swapping operands?
(How to do it - i don't know for now. I looked at the matchers
implementation for modification/improvement,
but didn't find any clue on it).
> It seems to me that we could rejigger Reassociate to reduce the number ofpossibilities that InstCombine could expect.
>Agree with David that infact it should be reassociate pass which should
handle this.

But following is something interesting and rather worrying things i have
found :
(I have omitted unimportant code and highlighted important one in following
example)
*e.x. *

((~A & B) | A)  -->  (A | B) ; Code is implemented for this already

*C code :*

suyog at suyog-Inspiron-N5010:~$ cat 1.c
#include<stdio.h>
int cal(int a, int b) {
*return ((~a & b) | a);*
}

int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}

LLVM IR at O0 :

suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O0 -emit-llvm 1.c

; Function Attrs: nounwind




*define i32 @cal(i32 %a, i32 %b) #0 {  %1 = xor i32 %a, -1  %2 = and i32
%1, %b  %3 = or i32 %2, %a  ret i32 %3*
}

; Function Attrs: nounwind
define i32 @main() #0 {
  ..
  ..
  %4 = call i32 @cal(i32 %2, i32 %3)
  ret i32 %4
}

*LLVM IR at instcombine :*

suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/opt -S -instcombine 1.ll





*; Function Attrs: nounwinddefine i32 @cal(i32 %a, i32 %b) #0 {  %1 = or
i32 %a, %b  ret i32 %1}*
..
..
..

which is OK as per expected transform.



*LLVM IR at reassociate and instcombine :*suyog at suyog-Inspiron-N5010:~$
Open/rbuild/bin/opt -S -reassociate -instcombine 2.ll







*; Function Attrs: nounwinddefine i32 @cal(i32 %a, i32 %b) #0 {  %1 = xor
i32 %a, -1  %2 = and i32 %b, %1  %3 = or i32 %2, %a  ret i32 %3}*

..
..

Reassociate converted (~a&b) to (b&~a) and instruction combine failed to
optimize.

As evident, sometimes, reassociate pass actually harms the optimization
opportunity !!

Some more interesting findings on how can this affect the final machine
code:

*Test case :*
1.c :

#include<stdio.h>
int cal(int a, int b) {
*return ((~a & b) | a);*
}

int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}

*X86 .s file with clang at O2 for above program :*

suyog at suyog-Inspiron-N5010:~$ Open/rbuild/bin/clang -S -O2 1.c

main:                                   # @main
# BB#0:
    subl    $28, %esp
    leal    20(%esp), %eax
    movl    %eax, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $.L.str, (%esp)
    calll    __isoc99_scanf
    movl    20(%esp), %eax
*    orl    24(%esp), %eax*
    addl    $28, %esp
    retl

As seen, optimization happened at IR level itself reflected in .s file.

*GCC output for the same:*

suyog at suyog-Inspiron-N5010:~$ gcc -S -O2 1.c

main:
.LFB23:
    .cfi_startproc
    pushl    %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    leal    28(%esp), %eax
    movl    %eax, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
    call    __isoc99_scanf
    movl    24(%esp), %eax
*    orl    28(%esp), %eax*
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc

GCC also did the optimization.

Now we just *slightly flip* the test case :



*1.c Test case:*#include<stdio.h>
int cal(int a, int b) {
*return ((b & ~a) | a);*
}

int main(){
int a, b;
scanf("%d %d", &a, &b);
return cal(a,b);
}



*X86 .s file with clang at O2 for above program :*suyog at
suyog-Inspiron-N5010:~$
Open/rbuild/bin/clang -S -O2 1.c

main:                                   # @main
# BB#0:
    subl    $28, %esp
    leal    20(%esp), %eax
    movl    %eax, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $.L.str, (%esp)
    calll    __isoc99_scanf
    movl    24(%esp), %ecx
    movl    %ecx, %eax


*notl    %eax    andl    20(%esp), %eax    orl    %ecx, %eax*
    addl    $28, %esp
    retl

We lost the Optimization opportunity here !!

*GCC output for the same:*

suyog at suyog-Inspiron-N5010:~$ gcc -S -O2 1.c

main:
.LFB23:
    .cfi_startproc
    pushl    %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    andl    $-16, %esp
    subl    $32, %esp
    leal    28(%esp), %eax
    movl    %eax, 8(%esp)
    leal    24(%esp), %eax
    movl    %eax, 4(%esp)
    movl    $.LC0, (%esp)
    call    __isoc99_scanf
    movl    24(%esp), %eax
*    orl    28(%esp), %eax*
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc

GCC still optimized it !!

Clearly evident that llvm is losing because of naive approach of pattern
matching.

How can we improve this so that LLVM also recognizes
commutative/associative
pattern  efficiently with single 'matchers' call? I am not having any
idea
so far.
I will think more on it. Any help would be really helpful.

Any suggestions/comments/rectification are most awaited :)

-- 
With regards,
Suyog Sarda
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140809/418a1055/attachment.html>

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Aug 2014 - [LLVMdev] Efficient Pattern matching in Instruction Combine

[LLVMdev] Efficient Pattern matching in Instruction Combine

[LLVMdev] Efficient Pattern matching in Instruction Combine

[LLVMdev] Efficient Pattern matching in Instruction Combine

[LLVMdev] Efficient Pattern matching in Instruction Combine

[LLVMdev] Efficient Pattern matching in Instruction Combine

Seemingly Similar Threads