similar to: [LLVMdev] [PROPOSAL] Improve uses of LEA on Atom

Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom"

2013 Sep 30
0
[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom
Was there any development on this? I noticed that clang still produces a lea for the testcase in llvm.org/pr13320. On 28 September 2012 11:36, Nowicki, Tyler <tyler.nowicki at intel.com> wrote: > Hi, > > > > Here is an update on our proposal to improve the uses of LEA on Atom > processors. > > > > 1. Disable current generation of LEAs > > > > Due to
2012 Aug 10
0
[LLVMdev] RFC: Adding pass in X86PassConfig::addPreEmitPass for LEA optimization on Atom
Hi, We are getting ready to implement several heuristics for correctly using LEAs to avoid stalls in the address generator on Atom. Our plan is to: 1. Disabling LEA generation on Atom in X86ISelDAGToDAG:: SelectLEAAddr() for all but a few pseudo-instructions 2. Identify loads and stores in a X86PassConfig::addPreEmitPass() pass and examine several preceding instructions to
2013 Oct 03
2
[LLVMdev] Codegen performance issue: LEA vs. INC.
The two address pass is only concerned about register pressure. It sounds like it should be taught about profitability. In cases where profitability can only be determined with something machinetracemetric then it probably should live it to more sophisticated pass like regalloc. In this case, we probably need a profitability target hook which knows about lea. We should also consider disabling
2013 Oct 05
1
[LLVMdev] Codegen performance issue: LEA vs. INC.
> The lea->cmp problem is fixed by switching to the MI scheduler. Please run with -mllvm -misched-bench to confirm. I get the same output in the testcase in pr13320. The leaq is in between the cmp and the jmp, preventing macro-fusion. Cheers, Rafael
2013 Sep 17
2
[LLVMdev] Codegen performance issue: LEA vs. INC.
Hi all. I'm looking for an advice on how to deal with inefficient code generation for Intel Nehalem/Westmere architecture on 64-bit platform for the attached test.cpp (LLVM IR is in test.cpp.ll). The inner loop has 11 iterations and eventually unrolled. Test.lea.s is the assembly code of the outer loop. It simply has 11 loads, 11 FP add, 11 FP mull, 1 FP store and lea+mov for index
2013 Oct 02
0
[LLVMdev] Codegen performance issue: LEA vs. INC.
This sounds like llvm.org/pr13320. On 17 September 2013 18:20, Bader, Aleksey A <aleksey.a.bader at intel.com> wrote: > Hi all. > > > > I’m looking for an advice on how to deal with inefficient code generation > for Intel Nehalem/Westmere architecture on 64-bit platform for the attached > test.cpp (LLVM IR is in test.cpp.ll). > > The inner loop has 11 iterations
2013 Oct 05
0
[LLVMdev] Codegen performance issue: LEA vs. INC.
On Oct 2, 2013, at 11:48 PM, Evan Cheng <evan.cheng at apple.com> wrote: > The two address pass is only concerned about register pressure. It sounds like it should be taught about profitability. In cases where profitability can only be determined with something machinetracemetric then it probably should live it to more sophisticated pass like regalloc. > > In this case, we
2013 Jan 07
4
[LLVMdev] instruction scheduling issue
On 1/7/2013 2:15 PM, Xu Liu wrote: > > This would be ideal. How can I do the instrumentation pass after the > instruction scheduling? You could derive your own class from TargetPassConfig, and add the annotation pass in YourDerivedTargetPassConfig::addPreEmitPass. This will add your annotation pass very late, just before the final code is emitted. If you're using the X86 target,
2014 Jan 22
2
[LLVMdev] How to force a MachineFunctionPass to be the last one ?
On Jan 21, 2014, at 3:20 PM, Andrew Trick <atrick at apple.com> wrote: > > On Jan 21, 2014, at 2:20 PM, sebastien riou <matic at nimp.co.uk> wrote: > >> Hi, >> >> I would like to execute a MachineFunctionPass after all other passes >> which modify the machine code. >> In other words, if we call llc to generate assembly file, that pass >>
2012 Jun 24
1
how to find out lea instruction causes skype crash when starting
Hi david, I find you signed off a patch about "x86: emulate lea with two register operands correctly" .In this patch,you described skype does a lea instruction and will crash when starting if it does not get the exception.I have used a tool named mentorKG.exe to make a LICENSE.TXT.but the software crashes when starting.I used your patch,and find it works well.I wonder how can you
2013 Jan 07
2
[LLVMdev] instruction scheduling issue
On 1/7/2013 1:53 PM, Sergei Larin wrote: > > Also, how much performance are you willing to sacrifice to do what you > do? Maybe turning off scheduling all together is an acceptable solution? Or insert the calls after scheduling. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
2013 Jan 07
0
[LLVMdev] instruction scheduling issue
Krzysztof, This would be ideal. How can I do the instrumentation pass after the instruction scheduling? Xu Liu Quoting Krzysztof Parzyszek <kparzysz at codeaurora.org>: > On 1/7/2013 1:53 PM, Sergei Larin wrote: >> >> Also, how much performance are you willing to sacrifice to do what you >> do? Maybe turning off scheduling all together is an acceptable solution?
2014 Jan 21
2
[LLVMdev] How to force a MachineFunctionPass to be the last one ?
Hi, I would like to execute a MachineFunctionPass after all other passes which modify the machine code. In other words, if we call llc to generate assembly file, that pass should run right before the "Assembly Printer" pass. Is there any official way to enforce this ? Best regards, Sebastien
2013 Apr 04
1
[LLVMdev] Packed instructions generaetd by LoopVectorize?
Thanks, that did it! Are there any plans to enable the loop vectorizer by default? From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Wednesday, April 03, 2013 13:33 PM To: Nowicki, Tyler Cc: LLVM Developers Mailing List Subject: Re: Packed instructions generaetd by LoopVectorize? Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating
2013 Feb 21
2
[LLVMdev] Generate scalar SSE instructions instead of packed instructions
On Thu, Feb 21, 2013 at 12:14 PM, Nadav Rotem <nrotem at apple.com> wrote: > You can change the input LLVM-IR. > > On Feb 21, 2013, at 7:16 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> > wrote: > > Hi,**** > > ** ** > > I am interested in evaluating the performance of packed vs scalar > double-precision floating point instructions on
2013 Apr 03
2
[LLVMdev] Packed instructions generaetd by LoopVectorize?
Hi, I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are generated when input arrays are integer, but not when they are float or double. If I modify the float example in http://llvm.org/docs/Vectorizers.html by adding restrict to the input arrays packed instructions are generated. Although it should not be required I tried
2012 Jun 27
2
[LLVMdev] 8-bit DIV IR irregularities
Hi, I noticed that when dividing with signed 8-bit values the IR uses a 32-bit signed divide, however, when unsigned 8-bit values are used the IR uses an 8-bit unsigned divide. Why not use a 8-bit signed divide when using 8-bit signed values? Here is the C code and IR: char idiv8(char a, char b) { char c = a / b; return c; } define signext i8 @idiv8(i8 signext %a, i8 signext %b) nounwind
2013 Apr 03
0
[LLVMdev] Packed instructions generaetd by LoopVectorize?
Hi Tyler, Try adding -ffast-math. We can only vectorize reduction variables if it is safe to reorder floating point operations. Thanks, Nadav On Apr 3, 2013, at 10:29 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> wrote: > Hi, > > I have a question about LoopVectorize. I wrote a simple test case, a dot product loop and found that packed instructions are
2012 Jun 28
2
[LLVMdev] 8-bit DIV IR irregularities
I understand, but this sounds like legalization. Does every architecture trigger an overflow exception, as opposed to setting a bit? Perhaps it makes more sense to do this in the backends that trigger an overflow exception? I'm working on a modification for DIV right now in the x86 backend for Intel Atom that will improve performance, however because the *actual* operation has been replaced
2013 Feb 21
2
[LLVMdev] Generate scalar SSE instructions instead of packed instructions
Hi, I am interested in evaluating the performance of packed vs scalar double-precision floating point instructions on x86-atom and I was wondering if anyone knows more precisely where to modify llvm to use one or the other. I know I probably need to change something in the type legalizer. Could anyone provide more details than that? Thanks, Tyler -------------- next part -------------- An HTML