thr3ads.net - search: "pr13320"

Displaying 7 results from an estimated 7 matches for "pr13320".

[LLVMdev] Codegen performance issue: LEA vs. INC.

2013 Oct 05

> The lea->cmp problem is fixed by switching to the MI scheduler. Please run with -mllvm -misched-bench to confirm.

I get the same output in the testcase in pr13320. The leaq is in between the cmp and the jmp, preventing macro-fusion.

Cheers,
Rafael

[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

2013 Sep 30

Was there any development on this? I noticed that clang still produces a lea for the testcase in llvm.org/pr13320.

On 28 September 2012 11:36, Nowicki, Tyler <tyler.nowicki at intel.com> wrote:
> Hi,
>
> Here is an update on our proposal to improve the uses of LEA on Atom processors.
>
> 1. Disable current generation of LEAs
>
> Due to a 3 cycle st...

[LLVMdev] Codegen performance issue: LEA vs. INC.

2013 Oct 03

...ility target hook which knows about lea. We should also consider disabling its dumb pseudo scheduling code when we enable the MI scheduler.

Evan

Sent from my iPad

> On Oct 2, 2013, at 8:38 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>
> This sounds like llvm.org/pr13320.
>
>> On 17 September 2013 18:20, Bader, Aleksey A <aleksey.a.bader at intel.com> wrote:
>> Hi all.
>>
>> I’m looking for advice on how to deal with inefficient code generation for the Intel Nehalem/Westmere architecture on a 64-bit platf...

[LLVMdev] Codegen performance issue: LEA vs. INC.

2013 Oct 05

...esn’t mean that your analysis of the 2-address pass is irrelevant. It’s just that the new pass order happens to work better. MI Scheduler also makes an effort to facilitate macro fusion. But for the record, the 2-address pass heuristics are clearly obsolete. As Rafael pointed out, that’s covered in PR13320. I’m honestly not even sure why we still use inc/dec in x86-64, saving a byte? Long-term plan: ideally, some of the tricks the 2-address pass is doing would be done within the MI scheduler now where we track register pressure precisely and know the final location of instructions. The major hurdle...

[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

2012 Sep 28

Hi,

Here is an update on our proposal to improve the uses of LEA on Atom processors.

1. Disable current generation of LEAs

Due to a 3 cycle stall between the ALU and the AGU, any address generation done using math instructions will cause a stall on loads and stores which are within 3 cycles of the address generation. Consequently, the heuristics for using LEAs efficiently must know how many...

[LLVMdev] Codegen performance issue: LEA vs. INC.

2013 Oct 02

This sounds like llvm.org/pr13320.

On 17 September 2013 18:20, Bader, Aleksey A <aleksey.a.bader at intel.com> wrote:
> Hi all.
>
> I’m looking for advice on how to deal with inefficient code generation for the Intel Nehalem/Westmere architecture on a 64-bit platform for the attached test.cpp (LLVM...

[LLVMdev] Codegen performance issue: LEA vs. INC.

2013 Sep 17

Hi all.

I'm looking for advice on how to deal with inefficient code generation for the Intel Nehalem/Westmere architecture on a 64-bit platform for the attached test.cpp (LLVM IR is in test.cpp.ll). The inner loop has 11 iterations and is eventually unrolled. Test.lea.s is the assembly code of the outer loop. It simply has 11 loads, 11 FP adds, 11 FP muls, 1 FP store and lea+mov for index...
