thr3ads.net - llvm dev - [llvm-dev] llvm emits unoptimized code [Oct 2019]

David Blaikie via llvm-dev

2019-Oct-31 18:26 UTC

[llvm-dev] llvm emits unoptimized code

On Thu, Oct 31, 2019 at 11:17 AM Jorg Brown via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> On Thu, Oct 31, 2019 at 8:50 AM kamlesh kumar via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi Devs,
>> Consider the test case here:
>> https://godbolt.org/z/qHZzqw
>> At -O1 or above, LLVM produces unoptimized code because it calls
>> __tls_get_addr inside the loop, while with optimization disabled it
>> produces a single call to __tls_get_addr outside of the loop.
>> Is this a missed optimization in LLVM?
>>
>
> It's interesting to me that there's a big difference between -fpie and
> -fpic.
>
> https://godbolt.org/z/klX3q3
>
> In particular, with -fpie, no call to __tls_get_addr is needed, so the
> underlying considerations for optimization change.  This feels like the
> optimizer isn't taking into account the overhead of -fpic when
> determining whether to hoist the address calculation out of the loop.
>
> On Thu, Oct 31, 2019 at 10:36 AM David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Looks pretty similar to the GCC generated code
>
>
> Challenge accepted => https://godbolt.org/z/8PX2La
>
Which challenge? Sorry, could've linked to the godbolt I was looking at
when I said that: https://godbolt.org/z/_07tOk - comparing GCC and Clang
trunk on the code linked in the original post. Looked/looks fairly similar
to me. But yeah, I don't know much beyond that.

>
> -- Jorg
>

Jorg Brown via llvm-dev

2019-Oct-31 18:36 UTC

[llvm-dev] llvm emits unoptimized code

On Thu, Oct 31, 2019 at 11:26 AM David Blaikie <dblaikie at gmail.com>
wrote:
> On Thu, Oct 31, 2019 at 11:17 AM Jorg Brown via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Thu, Oct 31, 2019 at 8:50 AM kamlesh kumar via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi Devs,
>>> Consider the test case here:
>>> https://godbolt.org/z/qHZzqw
>>> At -O1 or above, LLVM produces unoptimized code because it calls
>>> __tls_get_addr inside the loop, while with optimization disabled it
>>> produces a single call to __tls_get_addr outside of the loop.
>>> Is this a missed optimization in LLVM?
>>>
>>
>> It's interesting to me that there's a big difference between -fpie
>> and -fpic.
>>
>> https://godbolt.org/z/klX3q3
>>
>> In particular, with -fpie, no call to __tls_get_addr is needed, so the
>> underlying considerations for optimization change.  This feels like the
>> optimizer isn't taking into account the overhead of -fpic when
>> determining whether to hoist the address calculation out of the loop.
>>
>> On Thu, Oct 31, 2019 at 10:36 AM David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Looks pretty similar to the GCC generated code
>>
>>
>> Challenge accepted => https://godbolt.org/z/8PX2La
>>
>
> Which challenge? Sorry, could've linked to the godbolt I was looking at
> when I said that: https://godbolt.org/z/_07tOk - comparing GCC and Clang
> trunk on the code linked in the original post.
>
Right, your example showed where gcc and clang were similar.

My example https://godbolt.org/z/8PX2La showed where gcc produced code that
was possibly twice as fast as clang's code.

-- Jorg

kamlesh kumar via llvm-dev

2019-Nov-01 09:11 UTC

[llvm-dev] llvm emits unoptimized code

It looks like CodeGenPrepare::optimizeMemoryInst is sinking the address
computation into the user's basic block, which puts it back inside the loop.
If we disable this (-mllvm -disable-cgp), we get the same code as GCC.
See https://godbolt.org/z/bMvIsx
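For readers who cannot open the godbolt links, here is a minimal sketch of the kind of code under discussion. It is not the original test case: the variable tls_total, both function names, and the file name in the comments are hypothetical. The first function touches the thread-local variable on every iteration, which is where a per-iteration __tls_get_addr call can show up under -fpic. The second is a source-level workaround that was not proposed in this thread: accumulate in a local and store to the thread-local once.

```cpp
#include <cstddef>

// Hypothetical stand-in for the test case; the real one is only
// available through the godbolt link in the original post.
thread_local int tls_total = 0;

// Reads and writes the thread-local variable inside the loop.
// To compare code generation (flags taken from the thread, file name assumed):
//   clang++ -O2 -fpic -S example.cpp
//   clang++ -O2 -fpic -mllvm -disable-cgp -S example.cpp
int accumulate_direct(const int* data, std::size_t n) {
    tls_total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] > 0) {
            tls_total += data[i];  // TLS access on every positive element
        }
    }
    return tls_total;
}

// Workaround sketch: keep the running sum in a local variable and
// write the thread-local exactly once after the loop.
int accumulate_local(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] > 0) {
            total += data[i];
        }
    }
    tls_total = total;  // single TLS access, outside the loop
    return total;
}
```

Both functions return the same value and leave tls_total in the same state. Whether the first one actually emits a call inside the loop depends on the compiler version, the target, and the flags, so it is worth checking the generated assembly rather than assuming.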
