Rob via llvm-dev
2019-May-15 15:27 UTC
[llvm-dev] AARCH64 Code Size regression between 6/7
I am developing in C for an extremely memory-constrained AARCH64 embedded environment. Somewhere between LLVM 6 and 7, I started seeing a code size regression when I make multiple accesses into a global struct. Specifically, I have functions that perform several reads/writes into this global struct.

In older versions (5/6):
- a single ADRP/ADD combo is issued at the beginning of a function to get my structure address into a register
- that register is preserved throughout the function
- subsequent accesses into this structure are done as LDR/STR with offset from the preserved register

In later versions (7/8):
- the ADRP/ADD combo is performed every time I try to access something inside the struct.

The net result is slightly larger code that has the potential to cause me issues. There are plenty of unused registers that could hold the address of my struct instead of it being constantly reloaded. My current suspicion is that later versions presume fewer registers are preserved across function calls, and therefore don't rely on a register to keep holding the address of my struct. Assuming this is right, is there some way to encourage the behavior of the older versions?

Thanks,
Robert M
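For readers following along, the access pattern described above can be sketched as a small C translation unit. This is a hypothetical illustration, not the poster's code: the struct, its field names, and the function `record_event` are all assumptions, and it has not been verified to reproduce the regression. The comments only restate the two instruction shapes reported in the message.

```c
#include <stdint.h>

/* Hypothetical global state; the field names are illustrative only. */
struct device_state {
    uint32_t status;
    uint32_t count;
    uint32_t limit;
    uint32_t errors;
};

struct device_state g_state = { 0, 0, 4, 0 };

/*
 * Several reads and writes into the same global struct in one function.
 *
 * Shape reported for clang 5/6: one ADRP/ADD pair forms &g_state in a
 * register, and each field access is an LDR/STR at a fixed offset from it.
 *
 * Shape reported for clang 7/8: the address is formed again (extra ADRPs)
 * for individual accesses instead of reusing the register.
 */
uint32_t record_event(uint32_t value)
{
    g_state.count += 1;
    if (g_state.count > g_state.limit) {
        g_state.errors += 1;   /* too many events: note it and reset */
        g_state.count = 0;
    }
    g_state.status = value;
    return g_state.errors;
}
```

Compiling a file like this with the same options for each compiler version and comparing the disassembly of `record_event` is one way to check whether the number of ADRP instructions differs.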
Florian Hahn via llvm-dev
2019-May-15 15:38 UTC
[llvm-dev] AARCH64 Code Size regression between 6/7
Hi,

Is the IR that gets fed into the backend equivalent between 5/6 and 7/8? This sounds like something could be going wrong earlier, e.g. failing to eliminate congruent address computations in GVN.

In any case, to get to the bottom of this, a reproducer would be helpful.

Cheers,
Florian
Rob via llvm-dev
2019-May-15 16:21 UTC
[llvm-dev] AARCH64 Code Size regression between 6/7
I can't provide my exact problematic code, but here is a trivial example of a .c file that produces different results in clang 6 vs 7. The compiled output from v7 contains two extra adrp instructions. I compiled with just "clang -Oz -c" for both versions.

Rob M

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.c
Type: application/octet-stream
Size: 580 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190515/8f06b996/attachment.obj>
Rob via llvm-dev
2019-May-15 17:57 UTC
[llvm-dev] AARCH64 Code Size regression between 6/7
I did a bit more poking, using llvm/clang versions 6-8. The IR in all cases appears fundamentally identical. I ran the IR generated by version 6 through llc on all three versions. llc-7/8 produced the extra ADRPs; llc-6 did not. So (to my untrained eyes) the IR is generated the same, and it is in the IR->AARCH64 asm pass that the extra instructions are being generated.