Roman Levenstein
2009-Feb-05 16:08 UTC
[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC
Hi,
While testing my new register allocators on some test cases, I've
noticed that LLVM sometimes misses optimization opportunities:
1) LocalSpiller::RewriteMBB does not seem to propagate information
such as the Spills set between MBBs. In many cases where MBB B1 has only
one predecessor, MBB B2, B1 could reuse the information about the
physical registers that are in the live-out set of B2. This could help
eliminate useless reloads from spill slots when the value is already
available in the required physical register. In the example below,
the marked "movl 12(%esp), %ecx" instruction could be eliminated.
.LBB2_2: # bb31
movl 12(%esp), %ecx
movl 8(%esp), %eax
cmpl $0, up+28(%eax,%ecx,4)
je .LBB2_9 # bb569
.LBB2_3: # bb41 ; <--- bb31 is the only predecessor of bb41
movl 12(%esp), %ecx ; <--- This could be eliminated!!!
movl 4(%esp), %eax
cmpl $0, down(%eax,%ecx,4)
je .LBB2_9 # bb569
It is also worth mentioning that, as far as I can see, reloads from
spill slots are currently not recorded in the Spills set via the
addAvailable method. Wouldn't that make sense?
I have the feeling that these improvements are rather easy to achieve
and would not require many changes to the LocalSpiller. Probably
we just need to keep the live-out set of an MBB around after
rewriting it, so that its successors can use it in some cases as the
initial value for the Spills set.
Any opinions?
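To make the proposal concrete, here is a minimal sketch of the idea on a toy model. It is not LLVM's LocalSpiller code: the types Inst, Block and Avail and the functions killReg, rewriteBlock and rewriteFunction are all hypothetical names made up for illustration. The sketch assumes that a block may inherit the availability state only from a sole predecessor that has already been rewritten, and that reloads are recorded as available in the same way stores are.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy model only: these types are hypothetical, not LLVM's classes.
enum class Kind { Reload, Store, Clobber };

struct Inst {
  Kind kind;
  int slot; // stack slot; ignored for Clobber
  int reg;  // physical register read (Store) or written (Reload, Clobber)
};

struct Block {
  std::vector<Inst> insts;
  std::vector<std::size_t> preds; // indices of predecessor blocks
};

// Maps a stack slot to a physical register known to hold the same value.
using Avail = std::map<int, int>;

// Forget every slot whose value was tracked in `reg`.
static void killReg(Avail &avail, int reg) {
  for (auto it = avail.begin(); it != avail.end();)
    it = (it->second == reg) ? avail.erase(it) : ++it;
}

// Rewrite one block in place. `avail` is the entry state on input and the
// exit state on output. Returns the number of reloads removed.
static int rewriteBlock(Block &b, Avail &avail) {
  std::vector<Inst> kept;
  int removed = 0;
  for (const Inst &i : b.insts) {
    if (i.kind == Kind::Reload) {
      auto it = avail.find(i.slot);
      if (it != avail.end() && it->second == i.reg) {
        ++removed; // value is already in the required register
        continue;
      }
      killReg(avail, i.reg);
      avail[i.slot] = i.reg; // record the reload as available
    } else if (i.kind == Kind::Store) {
      avail[i.slot] = i.reg; // slot and register now agree
    } else {
      killReg(avail, i.reg); // register overwritten by something else
    }
    kept.push_back(i);
  }
  b.insts = kept;
  return removed;
}

// Rewrite all blocks in layout order, keeping each block's exit state so
// that a single-predecessor successor can start from it.
static int rewriteFunction(std::vector<Block> &blocks) {
  std::vector<Avail> exitState(blocks.size());
  int removed = 0;
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    const std::vector<std::size_t> &preds = blocks[i].preds;
    for (std::size_t p : preds)
      assert(p < blocks.size());
    Avail avail;
    // Seed only from a sole predecessor that was already rewritten.
    if (preds.size() == 1 && preds[0] < i)
      avail = exitState[preds[0]];
    removed += rewriteBlock(blocks[i], avail);
    exitState[i] = avail;
  }
  return removed;
}
```

Modelling bb31 and bb41 from the example as two blocks, with bb31 as the only predecessor of bb41, this sketch drops the second reload of slot 12 into %ecx and keeps the reload of slot 4 into %eax. Blocks with several predecessors, or whose only predecessor has not been rewritten yet, start from an empty state, which is the conservative choice.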
2) Hoisting sub-expressions out of loops and replacing array
accesses with pointer-based induction variables is also not optimal in
some situations.
In the example above, both blocks are executed inside an enclosing
loop, and they re-evaluate e.g. the down(%eax,%ecx,4) expression on
every iteration. GCC, by contrast, hoists this expression out of the
loop and replaces it with a simple pointer, as you can see below:
.LBB2_2:
movl -32(%ebp), %edx
movl 28(%edx), %eax
testl %eax, %eax
je .L5
.LBB2_3:
movl -48(%ebp), %eax
movl (%eax), %edi
testl %edi, %edi
je .L5
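At the source level, the difference between the two code shapes can be sketched roughly as follows. This is not the code from 8q_speed.c: the functions toggleIndexed and toggleHoisted and their parameters are hypothetical, chosen only so that the address is loop-invariant while the loaded value is not.

```cpp
#include <cstddef>

// Indexed form: the address of table[a + b] is formed on every iteration,
// which corresponds to an addressing mode like down(%eax,%ecx,4).
int toggleIndexed(int *table, std::size_t a, std::size_t b, int iters) {
  int zeros = 0;
  for (int i = 0; i < iters; ++i) {
    if (table[a + b] == 0)
      ++zeros;
    table[a + b] ^= 1; // the value changes, so the load itself must stay
  }
  return zeros;
}

// Hoisted form: the loop-invariant address is computed once before the
// loop and the body only dereferences a plain pointer.
int toggleHoisted(int *table, std::size_t a, std::size_t b, int iters) {
  int *p = &table[a + b];
  int zeros = 0;
  for (int i = 0; i < iters; ++i) {
    if (*p == 0)
      ++zeros;
    *p ^= 1;
  }
  return zeros;
}
```

Both functions compute the same result. The second form needs one register for the pointer instead of two for base and index, which may also reduce the number of reloads inside the loop.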
So that you can analyze this test case, I attach the source file,
the BC file, and the code produced by LLVM and by "GCC -O6".
-Roman
Attachments (application/octet-stream):
- 8q_speed.c.s, 10448 bytes
  <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment.obj>
- 8q_speed.s.gcc, 12532 bytes
  <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0001.obj>
- 8q_speed.c.bc, 4720 bytes
  <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0002.obj>
- 8q_speed.c, 595 bytes
  <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0003.obj>
Evan Cheng
2009-Feb-06 06:40 UTC
[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC
Thanks. Can you file bugzilla reports? I'll look at the first one soon.
Evan
Roman Levenstein
2009-Feb-06 08:43 UTC
[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC
Done. Please check these Bugzilla entries:
http://llvm.org/bugs/show_bug.cgi?id=3495 (LocalSpiller problems)
http://llvm.org/bugs/show_bug.cgi?id=3496 (Loop optimization problems)
-Roman