Rob via llvm-dev
2019-May-15 17:57 UTC
[llvm-dev] AARCH64 Code Size regression between 6/7
I did a bit more poking, using llvm/clang versions 6-8. The IR in all cases appears fundamentally identical. I ran the IR generated by version 6 through llc on all three versions. llc-7/8 produced the extra ADRPs, llc-6 did not. So (to my untrained eyes), the IR is generated the same; it is in the IR->AARCH64 asm pass that the extra instructions are being generated.

On Wed, May 15, 2019 at 11:39 AM Florian Hahn <florian_hahn at apple.com> wrote:
> Hi,
>
> > On May 15, 2019, at 16:27, Rob via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > I am developing in C for an extremely memory constrained AARCH64 embedded environment. Sometime between llvm 6 and 7, I'm seeing a code size regression when I make multiple accesses into a global struct. Specifically, I have functions that perform several reads/writes into this global struct.
> >
> > In older versions (5/6)
> > - a single ADRP/ADD combo is issued at the beginning of a function to get my structure address into a register
> > - that register is preserved throughout the function
> > - subsequent accesses into this structure are done as LDR/STR with offset from the preserved register
> >
> > In later versions (7/8)
> > - the ADRP/ADD combo is performed every time I try to access something inside the struct.
> >
> > The net result is slightly larger code that has the potential to cause me issues. There are plenty of unused registers that could be used for the purpose of not constantly re-loading the address of my struct. My current suspicion is that later versions are presuming fewer registers are not being preserved by other function calls, and therefore can't be relied upon to hold the address of my struct. Assuming this is right, is there some way to encourage the behavior of the older versions?
>
> Is the IR that gets fed into the backend equivalent between 5/6 and 7/8? This sounds like something could go wrong earlier, e.g. failing to eliminate congruent address computations in GVN.
>
> In any case, to get to the bottom of this, a reproducer would be helpful.
>
> Cheers,
> Florian
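For readers following along, the access pattern described in the original post can be sketched roughly as below. This is only an illustration, not the reproducer from the thread: the struct layout and the names `device_state`, `g_state` and `update_state` are hypothetical, and whether this particular function shows the extra ADRPs will depend on the compiler version and flags used.

```c
#include <stdint.h>

/* Hypothetical layout: the real struct from the thread was not posted. */
struct device_state {
    uint32_t status;
    uint32_t control;
    uint32_t count;
    uint32_t last_error;
};

/* One global instance, accessed several times from a single function. */
struct device_state g_state;

/*
 * Several reads and writes into the same global. The behavior reported for
 * llvm 5/6 is one ADRP/ADD at the top, then LDR/STR at offsets from that
 * base register. The behavior reported for 7/8 is an address
 * materialization per access.
 */
void update_state(uint32_t value)
{
    g_state.control = value;
    g_state.count += 1;
    if (g_state.status & 1u)
        g_state.last_error = g_state.count;
}
```

One possible way to compare versions, assuming a clang build with the AArch64 target enabled, is to compile with something like `clang --target=aarch64-linux-gnu -Os -S repro.c -o repro.s` under each release and count the `adrp` instructions in the output.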
Sjoerd Meijer via llvm-dev
2019-May-15 19:29 UTC
[llvm-dev] AARCH64 Code Size regression between 6/7
Bit of a drive-by comment as I haven't looked at the test case, but could the tiny code model be helpful here (or is it perhaps related to that)? Option -mcmodel=tiny was added not that long ago, possibly around that time.

Cheers,
Sjoerd.
Rob via llvm-dev
2019-May-15 19:45 UTC
[llvm-dev] AARCH64 Code Size regression between 6/7
A good thought. While the tiny model (which I think originates in llvm8?) does compile for this trivial test case, I am unable to use it in my actual case because some of my addresses are too far away for ADRs. Additionally, just trying it on this test case still seems to generate more ADRPs than I would normally think necessary.
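As background on the "too far away for ADRs" remark: ADR encodes a signed 21-bit byte offset from the instruction's own address (roughly +/-1 MiB), while ADRP encodes a signed 21-bit offset counted in 4 KiB pages (roughly +/-4 GiB). The helpers below are a minimal sketch of those two range checks; the function names are made up for illustration and are not part of any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ADR: signed 21-bit byte offset relative to the instruction address,
 * i.e. target - pc must lie in [-2^20, 2^20 - 1].
 */
bool adr_reaches(uint64_t pc, uint64_t target)
{
    int64_t delta = (int64_t)(target - pc);
    return delta >= -(INT64_C(1) << 20) && delta < (INT64_C(1) << 20);
}

/*
 * ADRP: signed 21-bit offset in units of 4 KiB pages, relative to the
 * page containing the instruction. The low 12 bits of the target are
 * supplied separately (for example by an ADD or a load/store offset).
 */
bool adrp_reaches(uint64_t pc, uint64_t target)
{
    int64_t page_delta = (int64_t)((target >> 12) - (pc >> 12));
    return page_delta >= -(INT64_C(1) << 20) && page_delta < (INT64_C(1) << 20);
}
```

Under these limits, a global placed 2 MiB away from the code that uses it is out of reach for a single ADR but comfortably within reach of ADRP, which is consistent with the tiny model not being usable for every address in a larger image.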