thr3ads.net - llvm dev - [llvm-dev] [RFC] Optimizing Comparisons Chains [Jun 2017]

If this information is useful, please help other people find it:
Share via:

Clement Courbet via llvm-dev

2017-Jun-07 16:00 UTC

[llvm-dev] [RFC] Optimizing Comparisons Chains

Hi all,


I'm working on a new pass to optimize comparison chains.


Motivation



Clang currently generates inefficient code when dealing with contiguous
member-by-member structural equality. Consider:



struct A {

 bool operator==(const A& o) const { return i == o.i && j == o.j; }

 uint32 i;

 uint32 j;

};



This generates:



 mov     eax, dword ptr [rdi]

 cmp     eax, dword ptr [rsi]

 jne     .LBB0_1

 mov     eax, dword ptr [rdi + 4]

 cmp     eax, dword ptr [rsi + 4]

 sete    al

 ret

.LBB0_1:

 xor     eax, eax

 ret



I’ve been working on an LLVM pass that detects this pattern at IR level and
turns it into a memcmp() call. This generates more efficient code:



 mov     rax, qword ptr [rdi]

 cmp     rax, qword ptr [rsi]

 sete    al

 ret



And thanks to recent improvements <https://reviews.llvm.org/D28637> in the
memcmp codegen, this can be made to work for all sizes.



Impact of the change



I’ve measured the change on std:pair/std::tuple. The pass typically makes
the code 2-3 times faster with code that’s typically 2-3x times smaller.



A more detailed description can be found here
<https://docs.google.com/document/d/1CKp8cIfURXbPLSap0jFio7LW4suzR10u5gX4RBV0k7c/edit#>
and a proof of concept can be seen here <https://reviews.llvm.org/D33987>.



Do you see any aspect of this that I may have missed?

For now I’ve implemented this as a separate pass. Would there be a better
way to integrate it?


Thanks !
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170607/ce3bb305/attachment.html>

Sanjay Patel via llvm-dev

2017-Jun-07 17:03 UTC

head link

[llvm-dev] [RFC] Optimizing Comparisons Chains

Hi Clement -

I started looking at CGP memcmp expansion for x86 more closely yesterday
with:
https://reviews.llvm.org/D33963

And just made another change here:
https://reviews.llvm.org/rL304923

This is part of solving:
https://bugs.llvm.org/show_bug.cgi?id=33325
https://bugs.llvm.org/show_bug.cgi?id=33329

So we want to enable the CGP expansion without regressing the optimal x86
memcmp codegen for the power-of-2 cases that are currently handled by SDAG
builder. If this works out, we'll abandon the memcmp SDAG transforms for
x86 (and hopefully other targets too) because we'll take care of all memcmp
expansion in CGP.

I didn't look closely at your new pass proposal, but I think you'll see
bigger improvements once we have the optimal x86 memcmp expansion in place
for all sizes.


On Wed, Jun 7, 2017 at 10:00 AM, Clement Courbet via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
>
> I'm working on a new pass to optimize comparison chains.
>
>
> Motivation
>
>
>
> Clang currently generates inefficient code when dealing with contiguous
> member-by-member structural equality. Consider:
>
>
>
> struct A {
>
>  bool operator==(const A& o) const { return i == o.i && j ==
o.j; }
>
>  uint32 i;
>
>  uint32 j;
>
> };
>
>
>
> This generates:
>
>
>
>  mov     eax, dword ptr [rdi]
>
>  cmp     eax, dword ptr [rsi]
>
>  jne     .LBB0_1
>
>  mov     eax, dword ptr [rdi + 4]
>
>  cmp     eax, dword ptr [rsi + 4]
>
>  sete    al
>
>  ret
>
> .LBB0_1:
>
>  xor     eax, eax
>
>  ret
>
>
>
> I’ve been working on an LLVM pass that detects this pattern at IR level
> and turns it into a memcmp() call. This generates more efficient code:
>
>
>
>  mov     rax, qword ptr [rdi]
>
>  cmp     rax, qword ptr [rsi]
>
>  sete    al
>
>  ret
>
>
>
> And thanks to recent improvements <https://reviews.llvm.org/D28637>
in
> the memcmp codegen, this can be made to work for all sizes.
>
>
>
> Impact of the change
>
>
>
> I’ve measured the change on std:pair/std::tuple. The pass typically makes
> the code 2-3 times faster with code that’s typically 2-3x times smaller.
>
>
>
> A more detailed description can be found here
>
<https://docs.google.com/document/d/1CKp8cIfURXbPLSap0jFio7LW4suzR10u5gX4RBV0k7c/edit#>
> and a proof of concept can be seen here
<https://reviews.llvm.org/D33987>.
>
>
>
> Do you see any aspect of this that I may have missed?
>
> For now I’ve implemented this as a separate pass. Would there be a better
> way to integrate it?
>
>
> Thanks !
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170607/3c227656/attachment.html>

Clement Courbet via llvm-dev

2017-Jun-08 06:42 UTC

head link

[llvm-dev] [RFC] Optimizing Comparisons Chains

That sounds perfect, thanks. Indeed my pass currently improves performance
only for small powers of two, and I'm waiting for the CGP approach to be
enabled to make it work for all sizes !

On Wed, Jun 7, 2017 at 7:03 PM, Sanjay Patel <spatel at rotateright.com>
wrote:
> Hi Clement -
>
> I started looking at CGP memcmp expansion for x86 more closely yesterday
> with:
> https://reviews.llvm.org/D33963
>
> And just made another change here:
> https://reviews.llvm.org/rL304923
>
> This is part of solving:
> https://bugs.llvm.org/show_bug.cgi?id=33325
> https://bugs.llvm.org/show_bug.cgi?id=33329
>
> So we want to enable the CGP expansion without regressing the optimal x86
> memcmp codegen for the power-of-2 cases that are currently handled by SDAG
> builder. If this works out, we'll abandon the memcmp SDAG transforms
for
> x86 (and hopefully other targets too) because we'll take care of all
memcmp
> expansion in CGP.
>
> I didn't look closely at your new pass proposal, but I think you'll
see
> bigger improvements once we have the optimal x86 memcmp expansion in place
> for all sizes.
>
>
> On Wed, Jun 7, 2017 at 10:00 AM, Clement Courbet via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>>
>> I'm working on a new pass to optimize comparison chains.
>>
>>
>> Motivation
>>
>>
>>
>> Clang currently generates inefficient code when dealing with contiguous
>> member-by-member structural equality. Consider:
>>
>>
>>
>> struct A {
>>
>>  bool operator==(const A& o) const { return i == o.i && j
== o.j; }
>>
>>  uint32 i;
>>
>>  uint32 j;
>>
>> };
>>
>>
>>
>> This generates:
>>
>>
>>
>>  mov     eax, dword ptr [rdi]
>>
>>  cmp     eax, dword ptr [rsi]
>>
>>  jne     .LBB0_1
>>
>>  mov     eax, dword ptr [rdi + 4]
>>
>>  cmp     eax, dword ptr [rsi + 4]
>>
>>  sete    al
>>
>>  ret
>>
>> .LBB0_1:
>>
>>  xor     eax, eax
>>
>>  ret
>>
>>
>>
>> I’ve been working on an LLVM pass that detects this pattern at IR level
>> and turns it into a memcmp() call. This generates more efficient code:
>>
>>
>>
>>  mov     rax, qword ptr [rdi]
>>
>>  cmp     rax, qword ptr [rsi]
>>
>>  sete    al
>>
>>  ret
>>
>>
>>
>> And thanks to recent improvements
<https://reviews.llvm.org/D28637> in
>> the memcmp codegen, this can be made to work for all sizes.
>>
>>
>>
>> Impact of the change
>>
>>
>>
>> I’ve measured the change on std:pair/std::tuple. The pass typically
makes
>> the code 2-3 times faster with code that’s typically 2-3x times
smaller.
>>
>>
>>
>> A more detailed description can be found here
>>
<https://docs.google.com/document/d/1CKp8cIfURXbPLSap0jFio7LW4suzR10u5gX4RBV0k7c/edit#>
>> and a proof of concept can be seen here
<https://reviews.llvm.org/D33987>
>> .
>>
>>
>>
>> Do you see any aspect of this that I may have missed?
>>
>> For now I’ve implemented this as a separate pass. Would there be a
better
>> way to integrate it?
>>
>>
>> Thanks !
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170608/75cfde0f/attachment.html>

llvm dev - Jun 2017 - [RFC] Optimizing Comparisons Chains

[llvm-dev] [RFC] Optimizing Comparisons Chains

[llvm-dev] [RFC] Optimizing Comparisons Chains

[llvm-dev] [RFC] Optimizing Comparisons Chains