thr3ads.net - llvm dev - [LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC [Feb 2009]

Roman Levenstein

2009-Feb-05 16:08 UTC

[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC

Hi,

While testing my new register allocators on some test cases, I've
noticed that LLVM sometimes misses optimization opportunities:

1) LocalSpiller::RewriteMBB does not seem to propagate information
about e.g. Spills between MBBs. In many cases where MBB B1 has only
one predecessor MBB B2, B1 could reuse the information about the
physical registers that are in the live-out set of B2. This could help
to e.g. eliminate some useless reloads from spill slots, if the value
is already available in the required physical register. For example,
in the code below, the marked "movl    12(%esp), %ecx" instruction
could be eliminated.

.LBB2_2:        # bb31
        movl    12(%esp), %ecx
        movl    8(%esp), %eax
        cmpl    $0, up+28(%eax,%ecx,4)
        je      .LBB2_9 # bb569
.LBB2_3:        # bb41         ; <--- bb31 is the only predecessor of bb41
        movl    12(%esp), %ecx # <--- This could be eliminated!!!
        movl    4(%esp), %eax
        cmpl    $0, down(%eax,%ecx,4)
        je      .LBB2_9 # bb569


It is also worth mentioning that, as far as I can see, reloads from
spill slots are currently not recorded in the Spills set using the
addAvailable method. Wouldn't it make sense to do so?
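
The effect of recording reloads can be illustrated with a toy model. The
sketch below is not LocalSpiller's code: Op, Inst, Availability and
rewriteBlock are hypothetical names, and only the name addAvailable is
borrowed from the message above. It assumes a simple map from stack slot to
the register that currently holds that slot's value, and drops a reload when
the map says the value is already in the requested register.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model only: Op, Inst, Availability and rewriteBlock are hypothetical
// names, not LLVM's LocalSpiller data structures.
enum class Op { Reload, Store, Clobber };

struct Inst {
    Op op;
    int reg;   // physical register number
    int slot;  // stack slot; ignored for Clobber
};

class Availability {
public:
    // Record that `reg` holds the current value of `slot`.
    void addAvailable(int slot, int reg) { slotToReg_[slot] = reg; }

    // `reg` was overwritten: forget every slot value it was holding.
    void clobber(int reg) {
        for (auto it = slotToReg_.begin(); it != slotToReg_.end();) {
            if (it->second == reg) it = slotToReg_.erase(it);
            else ++it;
        }
    }

    bool holds(int slot, int reg) const {
        auto it = slotToReg_.find(slot);
        return it != slotToReg_.end() && it->second == reg;
    }

private:
    std::map<int, int> slotToReg_;  // stack slot -> register holding its value
};

// Returns the block with redundant reloads removed, updating `avail` as it goes.
std::vector<Inst> rewriteBlock(const std::vector<Inst>& block, Availability& avail) {
    std::vector<Inst> out;
    for (const Inst& inst : block) {
        assert(inst.reg >= 0);
        switch (inst.op) {
        case Op::Reload:
            if (avail.holds(inst.slot, inst.reg)) continue;  // redundant reload
            avail.clobber(inst.reg);                  // old contents are gone
            avail.addAvailable(inst.slot, inst.reg);  // record the reload itself
            break;
        case Op::Store:
            avail.addAvailable(inst.slot, inst.reg);  // slot now mirrors reg
            break;
        case Op::Clobber:
            avail.clobber(inst.reg);
            break;
        }
        out.push_back(inst);
    }
    return out;
}
```

Feeding the four loads from the listing above through rewriteBlock as one
straight-line sequence drops the second load of 12(%esp), because the first
reload was recorded and %ecx is not overwritten in between.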

I have the feeling that these improvements are rather easy to achieve
and would not require too many changes to the LocalSpiller. Probably,
we just need to keep the live-out set of an MBB around after
rewriting it, so that its successors can in some cases use it as the
initial value for the Spills set.
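
A minimal sketch of the seeding decision, under stated assumptions: Block,
SlotToReg and initialAvailability are hypothetical names, the saved state is
modelled as a plain slot-to-register map, and blocks are assumed to be
rewritten in an order where a predecessor may or may not have been processed
yet. The saved state is reused only when the block has exactly one
predecessor that is not the block itself and has already been rewritten;
otherwise the block starts with nothing available, as it does today.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical model: stack slot -> register holding that slot's value.
using SlotToReg = std::map<int, int>;

struct Block {
    std::vector<std::size_t> preds;  // indices of predecessor blocks
};

// Picks the availability state a block starts with.
// liveOut[p]   : state saved after rewriting block p
// rewritten[p] : whether block p has been rewritten already
SlotToReg initialAvailability(const std::vector<Block>& cfg, std::size_t b,
                              const std::vector<SlotToReg>& liveOut,
                              const std::vector<bool>& rewritten) {
    assert(b < cfg.size());
    const Block& blk = cfg[b];
    if (blk.preds.size() == 1) {
        std::size_t p = blk.preds[0];
        // A self-loop or a not-yet-rewritten predecessor gives no usable state.
        if (p != b && rewritten[p]) return liveOut[p];
    }
    return {};  // zero or several predecessors: start empty
}
```

With several predecessors the states would have to be intersected, which this
sketch deliberately leaves out.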

Any opinions?

2) Hoisting of sub-expressions out of loops and replacement of array
accesses with pointer-based induction variables is also not optimal in
some situations.
   In the example above, both blocks are executed inside a loop
enclosing them, and they keep evaluating e.g. the down(%eax,%ecx,4)
expression on every iteration. GCC, by contrast, hoists this
expression out of the loop and replaces it with a simple pointer, as
you can see below:

.LBB2_2:
        movl    -32(%ebp), %edx
        movl    28(%edx), %eax
        testl   %eax, %eax
        je      .L5

.LBB2_3:
        movl    -48(%ebp), %eax
        movl    (%eax), %edi
        testl   %edi, %edi
        je      .L5
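
At source level, the difference between the two listings roughly corresponds
to the two loops below. This is only an illustration of the general
transformation, not the attached test case: the function names, the table
parameter and the loop shape are assumptions. The first form builds the
address from a base and an index inside the loop; the second computes the
start address once before the loop and then only steps a pointer.

```cpp
#include <cassert>

// Indexed form: each iteration forms the address table + (base + i).
int countZerosIndexed(const int* table, int base, int n) {
    assert(table != nullptr);
    int zeros = 0;
    for (int i = 0; i < n; ++i) {
        if (table[base + i] == 0) ++zeros;
    }
    return zeros;
}

// Pointer form: the start address is computed once, outside the loop,
// and the loop body only dereferences and increments the pointer.
int countZerosPointer(const int* table, int base, int n) {
    assert(table != nullptr);
    if (n <= 0) return 0;  // match the indexed loop, which runs zero times
    int zeros = 0;
    const int* p = table + base;
    for (const int* end = p + n; p != end; ++p) {
        if (*p == 0) ++zeros;
    }
    return zeros;
}
```

Both functions return the same count for the same arguments; only the address
arithmetic inside the loop differs.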


To make it possible for you to analyze this test case, I attach the
source file, the BC file, and the assembly output produced by LLVM
and by "GCC -O6".

-Roman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8q_speed.c.s
Type: application/octet-stream
Size: 10448 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8q_speed.s.gcc
Type: application/octet-stream
Size: 12532 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8q_speed.c.bc
Type: application/octet-stream
Size: 4720 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 8q_speed.c
Type: application/octet-stream
Size: 595 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0003.obj>

Evan Cheng

2009-Feb-06 06:40 UTC

Thanks. Can you file Bugzilla reports? I'll look at the first one soon.

Evan

Roman Levenstein

2009-Feb-06 08:43 UTC

Done.

Please check these Bugzilla entries:

http://llvm.org/bugs/show_bug.cgi?id=3495 (LocalSpiller problems)

http://llvm.org/bugs/show_bug.cgi?id=3496 (Loop optimization problems)

-Roman
