Roman Levenstein
2009-Feb-05 16:08 UTC
[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC
Hi, While testing my new register allocators on some test-cases, I've noticed that LLVM misses sometimes some optimization opportunities: 1) LocalSpiller::RewriteMBB seems not to propagate the information about e.g. Spills between MBBs.In many cases, where MBB B1 has only one predecessor MBB B2, B1 could reuse the information about the physical registers that are in the live-out set of B2. This could help to e.g. eliminate some useless reloads from spill slots, if the value is available on the required physical register already. For example, in the example below, the marked "movl 12(%esp), %ecx" instruction could be eliminated. .LBB2_2: # bb31 movl 12(%esp), %ecx movl 8(%esp), %eax cmpl $0, up+28(%eax,%ecx,4) je .LBB2_9 # bb569 .LBB2_3: # bb41 ; <--- bb31 is the only predecessor of bb41 movl 12(%esp), %ecx ; <--- This could be eliminated!!! movl 4(%esp), %eax cmpl $0, down(%eax,%ecx,4) je .LBB2_9 # bb569 It is also worth mentioning, that currently reloads from spill slots are not recorded in the Spills set using the addAvailable method, as far as I can see. Wouldn't it make sense? I have the feeling that these improvements are rather easy to achieve and would not require too much changes to the LocalSpiller. Probably, we just need to keep the live-out set of the MBB around after rewriting it, so that its successors can use it in some cases as initial value for the Spills set. Any opinions? 2) Moving of sub-expressions from loops and replacement of array accesses via pointer-based induction variables is also not optimal in some situations. In the example mentioned above, both blocks are executed inside a loop enclosing them. And they keep evaluating e.g. the down(%eax,%ecx,4) expression on every iteration. GCC at the same time hoists this expression outside of the loop and replaces it with a simple pointer, as you can see below: .LBB2_2: movl -32(%ebp), %edx movl 28(%edx), %eax testl %eax, %eax je .L5 .LBB2_3: movl -48(%ebp), %eax movl (%eax), %edi testl %edi, %edi je .L5 To make it possible for you to analyze this test-case, I attach the source file, the BC file and the output of the code produced by LLVM and by "GCC -O6". -Roman -------------- next part -------------- A non-text attachment was scrubbed... Name: 8q_speed.c.s Type: application/octet-stream Size: 10448 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 8q_speed.s.gcc Type: application/octet-stream Size: 12532 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0001.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 8q_speed.c.bc Type: application/octet-stream Size: 4720 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0002.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: 8q_speed.c Type: application/octet-stream Size: 595 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090205/ae02bf82/attachment-0003.obj>
Evan Cheng
2009-Feb-06 06:40 UTC
[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC
Thanks. Can you file bugzilla reports? I'll look at the first one soon. Evan On Feb 5, 2009, at 8:08 AM, Roman Levenstein wrote:> Hi, > > While testing my new register allocators on some test-cases, I've > noticed that LLVM misses sometimes some optimization opportunities: > > 1) LocalSpiller::RewriteMBB seems not to propagate the information > about e.g. Spills between MBBs.In many cases, where MBB B1 has only > one predecessor MBB B2, B1 could reuse the information about the > physical registers that are in the live-out set of B2. This could help > to e.g. eliminate some useless reloads from spill slots, if the value > is available on the required physical register already. For example, > in the example below, the marked "movl 12(%esp), %ecx" instruction > could be eliminated. > > .LBB2_2: # bb31 > movl 12(%esp), %ecx > movl 8(%esp), %eax > cmpl $0, up+28(%eax,%ecx,4) > je .LBB2_9 # bb569 > .LBB2_3: # bb41 ; <--- bb31 is the only predecessor > of bb41 > movl 12(%esp), %ecx ; <--- This could be eliminated!!! > movl 4(%esp), %eax > cmpl $0, down(%eax,%ecx,4) > je .LBB2_9 # bb569 > > > It is also worth mentioning, that currently reloads from spill slots > are not recorded in the Spills set using the addAvailable method, as > far as I can see. Wouldn't it make sense? > > I have the feeling that these improvements are rather easy to achieve > and would not require too much changes to the LocalSpiller. Probably, > we just need to keep the live-out set of the MBB around after > rewriting it, so that its successors can use it in some cases as > initial value for the Spills set. > > Any opinions? > > 2) Moving of sub-expressions from loops and replacement of array > accesses via pointer-based induction variables is also not optimal in > some situations. > In the example mentioned above, both blocks are executed inside a > loop enclosing them. And they keep evaluating e.g. the > down(%eax,%ecx,4) expression on every iteration. GCC at the same time > hoists this expression outside of the loop and replaces it with a > simple pointer, as you can see below: > > .LBB2_2: > movl -32(%ebp), %edx > movl 28(%edx), %eax > testl %eax, %eax > je .L5 > > .LBB2_3: > movl -48(%ebp), %eax > movl (%eax), %edi > testl %edi, %edi > je .L5 > > > To make it possible for you to analyze this test-case, I attach the > source file, the BC file and the output of the code produced by LLVM > and by "GCC -O6". > > -Roman > <8q_speed.c.s><8q_speed.s.gcc><8q_speed.c.bc><8q_speed.c>
Roman Levenstein
2009-Feb-06 08:43 UTC
[LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC
Done. Please check these Bugzilla entries: http://llvm.org/bugs/show_bug.cgi?id=3495 (LocalSpiller problems) http://llvm.org/bugs/show_bug.cgi?id=3496 (Loop optimization problems) -Roman 2009/2/6 Evan Cheng <echeng at apple.com>:> Thanks. Can you file bugzilla reports? I'll look at the first one soon. > > Evan > On Feb 5, 2009, at 8:08 AM, Roman Levenstein wrote: > >> Hi, >> >> While testing my new register allocators on some test-cases, I've >> noticed that LLVM misses sometimes some optimization opportunities: >> >> 1) LocalSpiller::RewriteMBB seems not to propagate the information >> about e.g. Spills between MBBs.In many cases, where MBB B1 has only >> one predecessor MBB B2, B1 could reuse the information about the >> physical registers that are in the live-out set of B2. This could help >> to e.g. eliminate some useless reloads from spill slots, if the value >> is available on the required physical register already. For example, >> in the example below, the marked "movl 12(%esp), %ecx" instruction >> could be eliminated. >> >> .LBB2_2: # bb31 >> movl 12(%esp), %ecx >> movl 8(%esp), %eax >> cmpl $0, up+28(%eax,%ecx,4) >> je .LBB2_9 # bb569 >> .LBB2_3: # bb41 ; <--- bb31 is the only predecessor of bb41 >> movl 12(%esp), %ecx ; <--- This could be eliminated!!! >> movl 4(%esp), %eax >> cmpl $0, down(%eax,%ecx,4) >> je .LBB2_9 # bb569 >> >> >> It is also worth mentioning, that currently reloads from spill slots >> are not recorded in the Spills set using the addAvailable method, as >> far as I can see. Wouldn't it make sense? >> >> I have the feeling that these improvements are rather easy to achieve >> and would not require too much changes to the LocalSpiller. Probably, >> we just need to keep the live-out set of the MBB around after >> rewriting it, so that its successors can use it in some cases as >> initial value for the Spills set. >> >> Any opinions? >> >> 2) Moving of sub-expressions from loops and replacement of array >> accesses via pointer-based induction variables is also not optimal in >> some situations. >> In the example mentioned above, both blocks are executed inside a >> loop enclosing them. And they keep evaluating e.g. the >> down(%eax,%ecx,4) expression on every iteration. GCC at the same time >> hoists this expression outside of the loop and replaces it with a >> simple pointer, as you can see below: >> >> .LBB2_2: >> movl -32(%ebp), %edx >> movl 28(%edx), %eax >> testl %eax, %eax >> je .L5 >> >> .LBB2_3: >> movl -48(%ebp), %eax >> movl (%eax), %edi >> testl %edi, %edi >> je .L5 >> >> >> To make it possible for you to analyze this test-case, I attach the >> source file, the BC file and the output of the code produced by LLVM >> and by "GCC -O6". >> >> -Roman >> <8q_speed.c.s><8q_speed.s.gcc><8q_speed.c.bc><8q_speed.c> > >
Possibly Parallel Threads
- [LLVMdev] LLVM misses some cross-MBB and loop optimizations compared to GCC
- [LLVMdev] Problems compiling llvm-gcc4 frontend on x86_64
- [LLVMdev] Problems compiling llvm-gcc4 frontend on x86_64
- [LLVMdev] Problems compiling llvm-gcc4 frontend on x86_64
- [LLVMdev] Complicated Remat Question