On Fri, Nov 20, 2015 at 5:06 PM, James Molloy <james at jamesmolloy.co.uk> wrote:
>
> Hi,
>
> We'd need to look precisely at what's causing the code size bloat. The
> midend commit pointed out by Steve shouldn't cause bloat in and of itself -
> it should reduce code size. It removes a load of stores and branches.
>
> I know a backend change I made to ARM isn't behaving as well as it could,
> and I have patches to fix that. Speculatively reverting midend patches isn't
> the best way to approach this, in my opinion! :)
>

For i586, the effect of r252152 seems to cause cmoves instead of branches.
The code size increase is +35% for i586.

Unfortunately, the object files are wildly different in a way that does not seem to occur in other workloads. I tried to clip a concise before-and-after case.

Before:
As a reference point, I found OR $0x408 and OR $0x810 in close proximity.

     278:   81 ca 10 08 00 00       or     $0x810,%edx
     27e:   89 10                   mov    %edx,(%eax)
     280:   f6 c1 40                test   $0x40,%cl
     283:   74 08                   je     28d <t_run_test+0x28d>
     285:   81 ca 08 04 00 00       or     $0x408,%edx
     28b:   89 10                   mov    %edx,(%eax)
     28d:   84 c9                   test   %cl,%cl
     28f:   0f 89 34 01 00 00       jns    3c9 <t_run_test+0x3c9>

After r252152:
Note that OR $0x408 and OR $0x810 now come in reverse order.

     35d:   81 c9 08 04 00 00       or     $0x408,%ecx
     363:   89 4c 24 28             mov    %ecx,0x28(%esp)
     367:   89 df                   mov    %ebx,%edi
     369:   83 e7 10                and    $0x10,%edi
     36c:   89 7c 24 20             mov    %edi,0x20(%esp)
     370:   0f 45 d1                cmovne %ecx,%edx
     373:   89 d7                   mov    %edx,%edi
     375:   81 cf 10 08 00 00       or     $0x810,%edi
     37b:   89 7c 24 14             mov    %edi,0x14(%esp)
     37f:   89 d9                   mov    %ebx,%ecx
     381:   83 e1 20                and    $0x20,%ecx
     384:   89 4c 24 1c             mov    %ecx,0x1c(%esp)
     388:   0f 45 d7                cmovne %edi,%edx
     38b:   89 d7                   mov    %edx,%edi

HTH,
-steve
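As a rough cross-check on the two excerpts, the sketch below (Python, used only for illustration) totals the encoded bytes of each listing, copied from the disassembly, and splits them into stack-spill stores, register-to-register moves, and everything else. The classification rule is a hypothetical heuristic that only handles the instruction forms appearing here. The two windows also do not cover exactly the same operations (the "before" window includes the test/jns for a further flag bit), so the totals are illustrative, not a per-operation comparison.

```python
# (hex encoding, instruction) pairs copied from the two excerpts above.
BEFORE = [
    ("81 ca 10 08 00 00", "or $0x810,%edx"),
    ("89 10",             "mov %edx,(%eax)"),
    ("f6 c1 40",          "test $0x40,%cl"),
    ("74 08",             "je 28d"),
    ("81 ca 08 04 00 00", "or $0x408,%edx"),
    ("89 10",             "mov %edx,(%eax)"),
    ("84 c9",             "test %cl,%cl"),
    ("0f 89 34 01 00 00", "jns 3c9"),
]

AFTER = [
    ("81 c9 08 04 00 00", "or $0x408,%ecx"),
    ("89 4c 24 28",       "mov %ecx,0x28(%esp)"),
    ("89 df",             "mov %ebx,%edi"),
    ("83 e7 10",          "and $0x10,%edi"),
    ("89 7c 24 20",       "mov %edi,0x20(%esp)"),
    ("0f 45 d1",          "cmovne %ecx,%edx"),
    ("89 d7",             "mov %edx,%edi"),
    ("81 cf 10 08 00 00", "or $0x810,%edi"),
    ("89 7c 24 14",       "mov %edi,0x14(%esp)"),
    ("89 d9",             "mov %ebx,%ecx"),
    ("83 e1 20",          "and $0x20,%ecx"),
    ("89 4c 24 1c",       "mov %ecx,0x1c(%esp)"),
    ("0f 45 d7",          "cmovne %edi,%edx"),
    ("89 d7",             "mov %edx,%edi"),
]


def size(encoding: str) -> int:
    """Number of bytes in a space-separated hex encoding."""
    return len(encoding.split())


def classify(asm: str) -> str:
    """Hypothetical bucket rule: a mov into an (%esp) slot is a spill store,
    a mov between two registers is a copy, anything else is 'other'."""
    op, _, operands = asm.partition(" ")
    if op == "mov" and operands.endswith("(%esp)"):
        return "spill"
    if op == "mov" and operands.count("%") == 2 and "(" not in operands:
        return "reg-move"
    return "other"


def tally(listing):
    """Sum encoded bytes per bucket for one listing."""
    totals = {"spill": 0, "reg-move": 0, "other": 0}
    for encoding, asm in listing:
        totals[classify(asm)] += size(encoding)
    return totals


for name, listing in (("before", BEFORE), ("after", AFTER)):
    t = tally(listing)
    print(name, sum(t.values()), t)
```

Run as-is, it reports 29 bytes for the "before" window and 48 for the "after" window, of which 16 bytes are spill stores and 8 are register copies.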
On Fri, Nov 20, 2015 at 5:15 PM, Steve King <steve at metrokings.com> wrote:
> For i586, the effect of r252152 seems to cause cmoves instead of branches.
> Code size increase is +35% for i586.

And the .ll source for this snippet:

    %or105 = or i32 %.or83.or94, 1032
    %.or83.or94.or105 = select i1 %tobool98, i32 %.or83.or94, i32 %or105
    %and108 = and i32 %12, 32
    %tobool109 = icmp eq i32 %and108, 0
    %or116 = or i32 %.or83.or94.or105, 2064
    %.or83.or94.or105.or116 = select i1 %tobool109, i32 %.or83.or94.or105, i32 %or116
    %and119 = and i32 %12, 64
    %tobool120 = icmp eq i32 %and119, 0
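The IR above is a chain of if-converted conditional ORs: each `or` is computed unconditionally and a `select` picks either the old or the OR'd value. The Python model below, a hypothetical reconstruction rather than the actual bitmnp01 source, checks that this select form computes the same value as a plain branchy form. The pairing of flag bits to masks (0x10 to 0x408, 0x20 to 0x810) is read off the "after" listing and the `%and108` line; the mask behind `%tobool98` is not shown in the excerpt, so 0x10 is an assumption. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def set_flags_branchy(v: int, bits: int) -> int:
    """Branchy form: one conditional OR per flag bit."""
    if bits & 0x10:
        v |= 0x408
    if bits & 0x20:
        v |= 0x810
    return v & MASK32


def set_flags_select(v: int, bits: int) -> int:
    """If-converted form mirroring the IR: compute the OR, then select."""
    or105 = v | 0x408                  # %or105 = or i32 ..., 1032 (0x408)
    tobool98 = (bits & 0x10) == 0      # assumption: tests bit 0x10
    v = v if tobool98 else or105       # select i1 %tobool98, old, %or105
    or116 = v | 0x810                  # %or116 = or i32 ..., 2064 (0x810)
    tobool109 = (bits & 0x20) == 0     # %and108 = and %12, 32; icmp eq 0
    v = v if tobool109 else or116      # select i1 %tobool109, old, %or116
    return v & MASK32


# Both forms agree for every combination of the low six flag bits.
for start in (0, 0x1, 0x400, MASK32):
    for bits in range(0x40):
        assert set_flags_branchy(start, bits) == set_flags_select(start, bits)
```

Since the two forms are equivalent, whether each `select` ends up as a CMOV (as in the "after" listing) or as a branch is a backend choice.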
James Molloy via llvm-dev
2015-Nov-21 01:32 UTC
[llvm-dev] Recent -Os code size regressions
Hi Steve,

That bitmnp01 is strongly affected by this commit is no accident: it was the main target of the patch. By if-converting, many more opportunities for good codegen are exposed, because the benchmark is essentially just moving and copying bit masks around.

The biggest uplift comes from identifying a bit-reversal idiom, which I have patches for and will submit next week. The second biggest comes from identifying bit-manipulation tricks and emitting good code for them; the ARM backend uses the BFI instruction for this.

Your snippet appears to show a lot more spills and moves, but not worse code once those are excluded. In fact, a store, test, jump sequence has become an AND + CMOV. So it looks like the x86 backend is doing a poor job here.

James

On Sat, 21 Nov 2015 at 01:23, Steve King <steve at metrokings.com> wrote:
> For i586, the effect of r252152 seems to cause cmoves instead of branches.
> Code size increase is +35% for i586.
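James mentions two idioms. The sketch below gives reference semantics for both in plain Python: a naive shift-and-OR bit-reversal loop, one plausible shape of source code that a bit-reversal idiom recognizer would target, and a model of ARM's BFI (bitfield insert) instruction, which replaces bits [lsb, lsb+width) of the destination with the low `width` bits of the source. The function names are hypothetical, and the thread does not say which exact source patterns the pending patches match.

```python
MASK32 = 0xFFFFFFFF


def bit_reverse32_loop(x: int) -> int:
    """Reverse the 32 low bits of x one bit at a time (naive loop form)."""
    r = 0
    for _ in range(32):
        r = (r << 1) | (x & 1)  # shift the next low bit of x into r
        x >>= 1
    return r & MASK32


def bfi(rd: int, rn: int, lsb: int, width: int) -> int:
    """Model of ARM `BFI Rd, Rn, #lsb, #width`: replace bits
    [lsb, lsb+width) of Rd with the low `width` bits of Rn."""
    if not (0 <= lsb < 32 and 1 <= width <= 32 - lsb):
        raise ValueError("field must lie within a 32-bit register")
    field = (1 << width) - 1
    cleared = rd & ~(field << lsb) & MASK32  # clear the destination field
    return cleared | ((rn & field) << lsb)   # insert the source bits


# Self-checks: reversal is its own inverse; BFI only touches the field.
for x in (0, 1, 0x80000000, 0x12345678, MASK32):
    assert bit_reverse32_loop(bit_reverse32_loop(x)) == x
assert bfi(0xFFFFFFFF, 0, 4, 8) == 0xFFFFF00F
```

Open-coded, the BFI model is the familiar `(rd & ~mask) | ((rn << lsb) & mask)` pattern, which is the kind of bit trickery a backend can fold into a single instruction.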