thr3ads.net - llvm dev - [LLVMdev] [PROPOSAL] Improve uses of LEA on Atom [Sep 2013]


Nowicki, Tyler

2012-Sep-28 15:36 UTC

[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

Hi,

Here is an update on our proposal to improve the uses of LEA on Atom processors.

1. Disable current generation of LEAs

Due to a 3-cycle stall between the ALU and the AGU, any address generated with
arithmetic instructions will stall loads and stores that occur within 3 cycles
of the address generation. Consequently, the heuristics for using LEAs
efficiently must know how many cycles pass between the address generation and
its use. However, LEAs are currently inserted before this information is known
(i.e. before register allocation). Part of the attached patch disables the
current generation of LEAs.
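
To make the timing condition above concrete, here is a minimal sketch of the
kind of check such a heuristic might perform. It is not taken from the attached
patch: ScheduledInst, wouldStall and kAluToAguStallCycles are hypothetical
names, the issue cycle of each instruction is assumed to be known already, and
reading "within 3 cycles" as a distance of at most 3 is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record; this is not LLVM's MachineInstr.
struct ScheduledInst {
    int issue_cycle;  // cycle in which the instruction issues (assumed known)
    int def_reg;      // register written, or -1 if none
    int addr_reg;     // register used as a memory address, or -1 if none
    bool on_alu;      // true for add/sub/shift; false for LEA and everything else
};

// Stall window described in the proposal (ALU result consumed by the AGU).
constexpr int kAluToAguStallCycles = 3;

// Returns true if the memory access at block[use] takes its address from an
// ALU instruction that issued at most kAluToAguStallCycles cycles earlier.
// Treating "within 3 cycles" as "distance <= 3" is an assumption.
bool wouldStall(const std::vector<ScheduledInst>& block, std::size_t use) {
    assert(use < block.size());
    const int addr = block[use].addr_reg;
    if (addr < 0) return false;  // not a load or store

    // Walk backward to the nearest instruction that defines the address register.
    for (std::size_t i = use; i-- > 0;) {
        if (block[i].def_reg != addr) continue;
        const int distance = block[use].issue_cycle - block[i].issue_cycle;
        return block[i].on_alu && distance <= kAluToAguStallCycles;
    }
    return false;  // defined outside this block; nothing can be concluded here
}
```

In this model an add at cycle 0 feeding a load at cycle 2 is reported as a
stall, while the same address produced by an LEA, or by an add four cycles
earlier, is not.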

2. Identify loads and stores in an X86PassConfig::addPreEmitPass() pass

We will use an addPreEmitPass pass, similar to the VZeroUpper pass. For each
load/store found, we will identify its address and index and examine the
preceding instructions to find where they are generated, which exposes
opportunities for LEAs.
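
As an illustration of the backward scan described above, the following sketch
locates the instructions that produce the registers feeding a load/store. It
uses a toy instruction model rather than the LLVM API: ToyInst, AddressDefs,
findDef and findAddressDefs are hypothetical names, and I am assuming that
"address and index" map to a base register and an index register.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction model (hypothetical; a real pass would inspect MachineInstrs).
struct ToyInst {
    int def_reg;    // register written, or -1 if none
    bool is_mem;    // true for a load or store
    int base_reg;   // base register of the memory operand, or -1 if none
    int index_reg;  // index register of the memory operand, or -1 if none
};

// Positions of the defining instructions, or -1 if not found in the block.
struct AddressDefs {
    int base_def;
    int index_def;
};

// Scans backward from position `before` for the nearest definition of `reg`.
static int findDef(const std::vector<ToyInst>& block, std::size_t before, int reg) {
    if (reg < 0) return -1;
    for (std::size_t i = before; i-- > 0;) {
        if (block[i].def_reg == reg) return static_cast<int>(i);
    }
    return -1;
}

// For the load/store at block[use], finds where its base and index are generated.
AddressDefs findAddressDefs(const std::vector<ToyInst>& block, std::size_t use) {
    assert(use < block.size() && block[use].is_mem);
    return {findDef(block, use, block[use].base_reg),
            findDef(block, use, block[use].index_reg)};
}
```

The instructions found this way are the candidates that the pass would then
consider rewriting as LEAs.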

3. Replacing instructions with LEAs

Instruction sequences such as add/{reg,imm}, add/{reg,imm}+shift/{reg,imm}, or
sub/imm will be replaced with a single LEA. This will potentially reduce the
number of registers in use; however, because this pass follows register
allocation, it will not affect instruction scheduling.
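
The replacement relies on an LEA computing base + index * scale + displacement
in one instruction. The sketch below checks that equivalence on plain 32-bit
values; it only models the arithmetic and is not the pass itself. LeaForm,
evalLea, evalShiftAdd and foldToLea are hypothetical names, and only the
shift-by-immediate case is covered, since an LEA scale is limited to 1, 2, 4
or 8.

```cpp
#include <cassert>
#include <cstdint>

// Operands of a single LEA: base + index * scale + disp.
struct LeaForm {
    std::uint32_t base;
    std::uint32_t index;
    unsigned scale;     // must be 1, 2, 4 or 8
    std::int32_t disp;  // a negative value models sub/imm
};

// Value a 32-bit LEA would produce (unsigned arithmetic wraps like a register).
std::uint32_t evalLea(const LeaForm& lea) {
    assert(lea.scale == 1 || lea.scale == 2 || lea.scale == 4 || lea.scale == 8);
    return lea.base + lea.index * lea.scale + static_cast<std::uint32_t>(lea.disp);
}

// Value produced by the separate sequence: shift/imm, add/reg, add/imm.
std::uint32_t evalShiftAdd(std::uint32_t base, std::uint32_t index,
                           unsigned shift, std::int32_t imm) {
    std::uint32_t r = index << shift;      // shift/imm
    r += base;                             // add/reg
    r += static_cast<std::uint32_t>(imm);  // add/imm, or sub/imm if negative
    return r;
}

// Fills `out` and returns true when the shift amount fits an LEA scale.
// A real pass would also have to confirm that EFLAGS are dead, because
// add/sub/shift set flags and LEA does not.
bool foldToLea(std::uint32_t base, std::uint32_t index, unsigned shift,
               std::int32_t imm, LeaForm& out) {
    if (shift > 3) return false;  // scale would exceed 8
    out = {base, index, 1u << shift, imm};
    return true;
}
```

For example, base 0x1000, index 5, shift 2 and immediate -8 fold to a scale of
4, and both evaluations give 0x100C. A shift of 4 is rejected.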

Attached is an incomplete patch with test cases that disables current LEA
generation and includes an empty pre-emit pass that will contain the LEA
selection heuristics.

Any feedback you may have on this updated plan is welcome.

Sincerely,

Tyler Nowicki
Intel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: UpdatedProposalPatch-svn.patch
Type: application/octet-stream
Size: 19061 bytes
Desc: UpdatedProposalPatch-svn.patch
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20120928/cfbf8bf3/attachment.obj>

Rafael Espíndola

2013-Sep-30 16:17 UTC


[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

Was there any development on this? I noticed that clang still produces
a lea for the testcase in llvm.org/pr13320.


Gurd, Preston

2013-Oct-01 21:53 UTC


[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

Thanks for the reminder!

The work which we did on fixing up LEAs focused on converting instructions to
LEAs after register allocation on Atom.

Given the way that the X86 code generator generates LEA instructions, the
performance improvement requested by PR13320 might best be done as a peephole
optimization after register allocation.

We have now added this issue to our backlog of work to do, but I cannot hazard a
guess as to when the issue would be addressed.

