thr3ads.net - llvm dev - [llvm-dev] Recent -Os code size regressions [Nov 2015]

Steve King via llvm-dev

2015-Nov-21 01:15 UTC

[llvm-dev] Recent -Os code size regressions

On Fri, Nov 20, 2015 at 5:06 PM, James Molloy <james at jamesmolloy.co.uk> wrote:
>
> Hi,
>
> We'd need to look precisely at what's causing the code size bloat. The
> midend commit pointed out by Steve shouldn't cause bloat in and of itself -
> it should reduce code size. It removes a load of stores and branches.
>
> I know a backend change I made to ARM isn't behaving as well as it could,
> and I have patches to fix that. Speculatively reverting midend patches
> isn't the best way to approach this, in my opinion! :)
>

For i586, r252152 seems to produce cmovs instead of branches. The code size
increase is +35%.
Unfortunately the object files are wildly different, in a way that does not
seem to occur in other workloads.  I tried to clip a concise before-and-after
case.

Before:
As a reference point, I found OR $0x408 and OR $0x810 in close proximity.


 278: 81 ca 10 08 00 00     or     $0x810,%edx
 27e: 89 10                 mov    %edx,(%eax)
 280: f6 c1 40              test   $0x40,%cl
 283: 74 08                 je     28d <t_run_test+0x28d>
 285: 81 ca 08 04 00 00     or     $0x408,%edx
 28b: 89 10                 mov    %edx,(%eax)
 28d: 84 c9                 test   %cl,%cl
 28f: 0f 89 34 01 00 00     jns    3c9 <t_run_test+0x3c9>


After r252152:

Note that the OR $0x408 and OR $0x810 now come in reverse order.


 35d: 81 c9 08 04 00 00     or     $0x408,%ecx
 363: 89 4c 24 28           mov    %ecx,0x28(%esp)
 367: 89 df                 mov    %ebx,%edi
 369: 83 e7 10              and    $0x10,%edi
 36c: 89 7c 24 20           mov    %edi,0x20(%esp)
 370: 0f 45 d1              cmovne %ecx,%edx
 373: 89 d7                 mov    %edx,%edi
 375: 81 cf 10 08 00 00     or     $0x810,%edi
 37b: 89 7c 24 14           mov    %edi,0x14(%esp)
 37f: 89 d9                 mov    %ebx,%ecx
 381: 83 e1 20              and    $0x20,%ecx
 384: 89 4c 24 1c           mov    %ecx,0x1c(%esp)
 388: 0f 45 d7              cmovne %edi,%edx
 38b: 89 d7                 mov    %edx,%edi


HTH,
-steve
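
The two excerpts in this message can be sized directly from their encodings. The sketch below is not from the thread: it copies the byte strings from the before and after listings into two illustrative Python lists (`BEFORE`, `AFTER`, names assumed here) and sums them. The excerpts are not claimed to cover the same source range, so the totals only illustrate the trend and are not a reproduction of the +35% figure.

```python
# Instruction encodings transcribed from the two objdump excerpts in this message.
# BEFORE, AFTER and snippet_size are illustrative names, not part of any tool.
BEFORE = [
    "81 ca 10 08 00 00",  # or     $0x810,%edx
    "89 10",              # mov    %edx,(%eax)
    "f6 c1 40",           # test   $0x40,%cl
    "74 08",              # je
    "81 ca 08 04 00 00",  # or     $0x408,%edx
    "89 10",              # mov    %edx,(%eax)
    "84 c9",              # test   %cl,%cl
    "0f 89 34 01 00 00",  # jns
]

AFTER = [
    "81 c9 08 04 00 00",  # or     $0x408,%ecx
    "89 4c 24 28",        # mov    %ecx,0x28(%esp)
    "89 df",              # mov    %ebx,%edi
    "83 e7 10",           # and    $0x10,%edi
    "89 7c 24 20",        # mov    %edi,0x20(%esp)
    "0f 45 d1",           # cmovne %ecx,%edx
    "89 d7",              # mov    %edx,%edi
    "81 cf 10 08 00 00",  # or     $0x810,%edi
    "89 7c 24 14",        # mov    %edi,0x14(%esp)
    "89 d9",              # mov    %ebx,%ecx
    "83 e1 20",           # and    $0x20,%ecx
    "89 4c 24 1c",        # mov    %ecx,0x1c(%esp)
    "0f 45 d7",           # cmovne %edi,%edx
    "89 d7",              # mov    %edx,%edi
]


def snippet_size(encodings):
    """Return the total number of bytes across a list of hex encodings."""
    return sum(len(enc.split()) for enc in encodings)


if __name__ == "__main__":
    for name, encodings in (("before", BEFORE), ("after", AFTER)):
        size = snippet_size(encodings)
        print(f"{name}: {len(encodings)} instructions, {size} bytes")
```

The before excerpt comes to 8 instructions in 29 bytes (0x278 up to 0x295) and the after excerpt to 14 instructions in 48 bytes (0x35d up to 0x38d), which matches the address ranges in the listings.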

Steve King via llvm-dev

2015-Nov-21 01:22 UTC

[llvm-dev] Recent -Os code size regressions

And the .ll (LLVM IR) source for this snippet:

  %or105 = or i32 %.or83.or94, 1032
  %.or83.or94.or105 = select i1 %tobool98, i32 %.or83.or94, i32 %or105
  %and108 = and i32 %12, 32
  %tobool109 = icmp eq i32 %and108, 0
  %or116 = or i32 %.or83.or94.or105, 2064
  %.or83.or94.or105.or116 = select i1 %tobool109, i32 %.or83.or94.or105, i32 %or116
  %and119 = and i32 %12, 64
  %tobool120 = icmp eq i32 %and119, 0
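
Read as ordinary logic, each `or`/`select` pair in this IR means "set these mask bits if a flag bit is set". The Python sketch below is a paraphrase under stated assumptions, not code from the benchmark: `flags` stands in for `%12`, `acc` for `%.or83.or94.or105`, and only the step guarded by `and i32 %12, 32` is modeled, because the definition of `%tobool98` is not shown in the excerpt. It checks that a branchy form and the select form compute the same value.

```python
MASK_0X810 = 2064  # the constant in `or i32 ..., 2064`
FLAG_BIT = 32      # the constant in `and i32 %12, 32`


def step_branch(acc: int, flags: int) -> int:
    """Branchy shape: OR the mask in only when the flag bit is set."""
    if flags & FLAG_BIT:
        acc |= MASK_0X810
    return acc


def step_select(acc: int, flags: int) -> int:
    """Shape of the IR excerpt: compute the OR unconditionally, then select."""
    and108 = flags & FLAG_BIT
    tobool109 = and108 == 0          # icmp eq i32 %and108, 0
    or116 = acc | MASK_0X810         # or i32 ..., 2064
    # select i1 %tobool109, i32 <old value>, i32 %or116
    return acc if tobool109 else or116


if __name__ == "__main__":
    for acc in (0, 0x408, 0xFFFF):
        for flags in range(128):
            assert step_branch(acc, flags) == step_select(acc, flags)
    print("branch and select forms agree")
```

The select form always evaluates the `or`, which is what lets a backend lower it to a conditional move instead of a compare-and-branch.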

James Molloy via llvm-dev

2015-Nov-21 01:32 UTC

[llvm-dev] Recent -Os code size regressions

Hi Steve,

That bitmnp01 is strongly affected by this commit is no accident; it was
the main target of the patch. If-converting exposes many more opportunities
for good codegen, because the benchmark is essentially just moving and
copying bit masks around.

The biggest uplift comes from identifying a bit reversal idiom which I have
patches for and will submit next week. The second biggest uplift comes from
identifying bit trickery and emitting good codegen for it - the ARM backend
uses the BFI instruction for this.

Your snippet appears to show a lot more spills and moves but not worse code
excluding those - in fact a store, test, jump sequence has become an and +
cmov. So it looks like the x86 backend is doing a poor job here.

James
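
One way to check the "spills and moves" observation against the after-r252152 excerpt is to tally its bytes by instruction kind. The sketch below is illustrative only: the `AFTER` table is transcribed from the listing earlier in this thread, and the three category labels are an assumed classification (a `mov` to an `(%esp)` slot is treated as a likely spill), not objdump or LLVM output.

```python
from collections import Counter

# (encoding, disassembly) pairs transcribed from the after-r252152 excerpt.
AFTER = [
    ("81 c9 08 04 00 00", "or $0x408,%ecx"),
    ("89 4c 24 28", "mov %ecx,0x28(%esp)"),
    ("89 df", "mov %ebx,%edi"),
    ("83 e7 10", "and $0x10,%edi"),
    ("89 7c 24 20", "mov %edi,0x20(%esp)"),
    ("0f 45 d1", "cmovne %ecx,%edx"),
    ("89 d7", "mov %edx,%edi"),
    ("81 cf 10 08 00 00", "or $0x810,%edi"),
    ("89 7c 24 14", "mov %edi,0x14(%esp)"),
    ("89 d9", "mov %ebx,%ecx"),
    ("83 e1 20", "and $0x20,%ecx"),
    ("89 4c 24 1c", "mov %ecx,0x1c(%esp)"),
    ("0f 45 d7", "cmovne %edi,%edx"),
    ("89 d7", "mov %edx,%edi"),
]


def classify(disasm: str) -> str:
    """Assumed classification: stack stores, register copies, everything else."""
    mnemonic, operands = disasm.split(maxsplit=1)
    if mnemonic == "mov":
        return "stack store" if operands.endswith("(%esp)") else "reg-reg move"
    return "compute"


def bytes_by_kind(listing) -> dict:
    """Sum encoded bytes per category for a list of (encoding, disasm) pairs."""
    totals = Counter()
    for encoding, disasm in listing:
        totals[classify(disasm)] += len(encoding.split())
    return dict(totals)


if __name__ == "__main__":
    for kind, size in bytes_by_kind(AFTER).items():
        print(f"{kind}: {size} bytes")
```

Under this classification, 16 of the excerpt's 48 bytes are stores to stack slots and 8 are register-to-register copies, leaving 24 bytes for the `or`, `and` and `cmovne` instructions that do the actual work.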
