Hi,

Here is an update on our proposal to improve the uses of LEA on Atom processors.

1. Disable current generation of LEAs

Due to a 3-cycle stall between the ALU and the AGU, any address generation done using math instructions will cause a stall on loads and stores that fall within 3 cycles of the address generation. Consequently, the heuristics for using LEAs efficiently must know how many cycles pass between the address generation and its use. However, LEAs are currently inserted before this information is known (i.e. before register allocation). Part of the attached patch disables the current generation of LEAs.

2. Identify loads and stores in an X86PassConfig::addPreEmitPass() pass

We will add a pass in addPreEmitPass(), similar to the VZeroUpper pass. For each load/store found, we will identify its address and index and examine the preceding instructions to find where they are generated, in order to identify opportunities for LEAs.

3. Replace instructions with LEAs

Instructions such as add/{reg,imm}, add/{reg,imm}+shift/{reg,imm}, or sub/imm will be replaced with a single LEA. This will potentially reduce the number of registers in use; however, because this pass follows register allocation, it will not affect instruction scheduling.

Attached is an incomplete patch with test cases that disables current LEA generation and includes an empty pre-emit pass that will contain the LEA selection heuristics.

Any feedback you may have on this updated plan is welcome.

Sincerely,

Tyler Nowicki
Intel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: UpdatedProposalPatch-svn.patch
Type: application/octet-stream
Size: 19061 bytes
Desc: UpdatedProposalPatch-svn.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120928/cfbf8bf3/attachment.obj>
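To illustrate steps 2 and 3 of the proposal, here is a minimal sketch of the backward scan from each load/store. It is a toy model and not the attached patch: Inst, Op, kAguStallWindow, isLeaCandidate and fixupLeas are all hypothetical names, none of them are LLVM types or APIs, and the sketch assumes one instruction per cycle as a stand-in for real cycle counts. Fusing add+shift pairs into one LEA is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Toy stand-ins for machine instructions; these are NOT LLVM types.
enum class Op { AddImm, AddReg, SubImm, Lea, Load, Store, Other };

struct Inst {
    Op op;
    int dst;    // register written by this instruction (-1 if none)
    int base;   // Load/Store only: address register (-1 if none)
    int index;  // Load/Store only: index register (-1 if none)
};

// ALU -> AGU stall window from the proposal, counted here in
// instructions (assumption: one instruction issues per cycle).
constexpr std::size_t kAguStallWindow = 3;

// ALU opcodes that this sketch treats as expressible by a single LEA.
bool isLeaCandidate(Op op) {
    return op == Op::AddImm || op == Op::AddReg || op == Op::SubImm;
}

// Rewrite one candidate instruction in place as an LEA.
void toLea(Inst& inst) {
    assert(isLeaCandidate(inst.op) && "only add/sub forms are handled");
    inst.op = Op::Lea;
}

// For each load/store, look back at most kAguStallWindow instructions for
// the nearest definition of its address or index register; if that
// definition is an ALU add/sub, turn it into an LEA.
// Returns the number of instructions rewritten.
std::size_t fixupLeas(std::vector<Inst>& block) {
    std::size_t rewritten = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (block[i].op != Op::Load && block[i].op != Op::Store)
            continue;
        const std::size_t lo = i > kAguStallWindow ? i - kAguStallWindow : 0;
        for (int reg : {block[i].base, block[i].index}) {
            if (reg < 0)
                continue;
            for (std::size_t j = i; j-- > lo;) {
                if (block[j].dst != reg)
                    continue;
                if (isLeaCandidate(block[j].op)) {
                    toLea(block[j]);
                    ++rewritten;
                }
                break;  // nearest definition found; stop scanning
            }
        }
    }
    return rewritten;
}
```

An add that feeds a load two instructions later would be rewritten, while the same add four or more instructions before its use would be left alone, since by then the stall window has passed in this model. A real pass would need actual latency information rather than an instruction count.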
Was there any development on this? I noticed that clang still produces a lea for the testcase in llvm.org/pr13320.

On 28 September 2012 11:36, Nowicki, Tyler <tyler.nowicki at intel.com> wrote:
> Here is an update on our proposal to improve the uses of LEA on Atom
> processors.
> [...]
Thanks for the reminder!

The work which we did on fixing up LEAs focused on converting instructions to LEAs after register allocation on Atom. Given the way that the X86 code generator generates LEA instructions, the performance improvement requested by PR13320 might best be done as a peephole optimization after register allocation.

We have now added this issue to our backlog of work to do, but I cannot hazard a guess as to when the issue would be addressed.

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Rafael Espíndola
Sent: Monday, September 30, 2013 12:17 PM
To: Nowicki, Tyler
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

Was there any development on this? I noticed that clang still produces a lea for the testcase in llvm.org/pr13320.