On Thu, Oct 31, 2019 at 11:17 AM Jorg Brown via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Thu, Oct 31, 2019 at 8:50 AM kamlesh kumar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Hi Devs,
>> Consider the testcase here:
>> https://godbolt.org/z/qHZzqw
>> When optimization is O1 or above, it produces unoptimized code
>> because it calls __tls_get_addr inside loops,
>> while with optimization disabled
>> it produces a single call to __tls_get_addr outside of the loop.
>> Is this a missed optimization by LLVM?
>
> It's interesting to me that there's a big difference between -fpie and -fpic.
>
> https://godbolt.org/z/klX3q3
>
> In particular, with -fpie, no call to __tls_get_addr is needed, so the
> underlying considerations for optimization change. This feels like the
> optimizer isn't taking into account the overhead of -fpic when
> determining whether to hoist the address calculation out of the loop.
>
> On Thu, Oct 31, 2019 at 10:36 AM David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> Looks pretty similar to the GCC generated code
>
> Challenge accepted => https://godbolt.org/z/8PX2La

Which challenge? Sorry, could've linked to the godbolt I was looking at when I said that: https://godbolt.org/z/_07tOk - comparing GCC and Clang trunk on the code linked in the original post. Looked/looks fairly similar to me. But yeah, I don't know much beyond that.

> -- Jorg
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
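For anyone reading without the godbolt links open, the kind of code under discussion can be pictured roughly as below. This is only a minimal sketch, not the testcase from the original post: the names `tls_counter` and `bump_counter` are made up for illustration. The point is simply that a thread-local variable is touched on every loop iteration, so where the compiler places the address computation matters.

```cpp
#include <cassert>

// Hypothetical thread-local variable; NOT the one from the linked testcase.
thread_local int tls_counter = 0;

// Each iteration reads and writes the thread-local variable. When this is
// built as position-independent code for a shared library (-fpic), finding
// the variable's address can involve a call into the TLS runtime, which is
// why the placement of that address computation is the topic of the thread.
int bump_counter(int iterations) {
    assert(iterations >= 0);
    for (int i = 0; i < iterations; ++i) {
        tls_counter += i;
    }
    return tls_counter;
}
```

One way to compare the two modes mentioned above would be to compile such a file once with `-O2 -fpic -S` and once with `-O2 -fpie -S`, then search the assembly for `__tls_get_addr`. A loop this small may well be simplified by the optimizer, so the linked testcases remain the real reference.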
On Thu, Oct 31, 2019 at 11:26 AM David Blaikie <dblaikie at gmail.com> wrote:

>> Challenge accepted => https://godbolt.org/z/8PX2La
>
> Which challenge? Sorry, could've linked to the godbolt I was looking at
> when I said that: https://godbolt.org/z/_07tOk - comparing GCC and Clang
> trunk on the code linked in the original post.

Right, your example showed where GCC and Clang were similar.

My example https://godbolt.org/z/8PX2La showed where GCC produced code that was possibly twice as fast as Clang's code.

-- Jorg
It looks like CodeGenPrepare::optimizeMemoryInst is sinking the address computation into the user's basic block, so if we disable CodeGenPrepare (-mllvm -disable-cgp) we get the same code as GCC.
See here: https://godbolt.org/z/bMvIsx

On Fri, Nov 1, 2019 at 12:06 AM Jorg Brown <jorg.brown at gmail.com> wrote:

> Right, your example showed where gcc and clang were similar.
>
> My example https://godbolt.org/z/8PX2La showed where gcc produced code that was possibly twice as fast as clang's code.
>
> -- Jorg
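To make the cost of sinking the address computation into the loop concrete, here is a small source-level analogy. It is a sketch under loud assumptions: `lookup_slot` is a hypothetical stand-in that merely counts how often an address is requested. It is not `__tls_get_addr`, has a different signature, and says nothing about what CodeGenPrepare does internally. It only contrasts the "address computed once before the loop" shape with the "address recomputed at each use" shape.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins: a single storage slot and a call counter.
static int g_slot = 0;
static std::size_t g_lookup_calls = 0;

// Pretend address lookup; counts calls so the two shapes can be compared.
int* lookup_slot() {
    ++g_lookup_calls;
    return &g_slot;
}

// Hoisted shape: the address is obtained once, before the loop.
void add_hoisted(const int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    int* slot = lookup_slot();
    for (std::size_t i = 0; i < count; ++i) {
        *slot += values[i];
    }
}

// Sunk shape: the address is obtained again next to every use.
void add_sunk(const int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        *lookup_slot() += values[i];
    }
}
```

Both functions produce the same sum, but for `count` elements the hoisted shape performs one lookup while the sunk shape performs `count` lookups. When the lookup is a real function call, as with `__tls_get_addr` under -fpic, that difference is what shows up inside the loop.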