similar to: [RFC] New pass: LoopExitValues

Displaying 20 results from an estimated 800 matches similar to: "[RFC] New pass: LoopExitValues"

2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote:
> Do you have some specific performance measurements?
Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed:
  -O2 performance: +2.9% faster with the L.E.V. pass
  -Os size: 1.5% smaller with the L.E.V. pass
In the case of Coremark, the benefit comes mainly from the matrix
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla but no one seems to care, so I am sending it to the mailing list. clang 3.7 svn (trunk 229055 at the time I reported this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is an "8 queens puzzle" solver written as an educational example. As
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code introduced by the llvm 3.6 release don't seem to be limited to this "8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a big hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed, as well as others.
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
Using the SciMark 2.0 code from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native", I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benchmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current
2015 Sep 23
3
[RFC] New pass: LoopExitValues
On Wed, Sep 23, 2015 at 12:00 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> Should we try the patch in its current location, namely after LSR?
> Sure; post the patch as you have it so we can look at what's going on.
http://reviews.llvm.org/D12494
One particular point: the algorithm treats SCEVs as equal when their raw pointers are equal. Is
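[For readers following along: ScalarEvolution uniques its SCEV nodes, which is what makes a raw-pointer comparison plausible in the first place. A minimal sketch of the idea; the helper name is made up and this is not code from D12494:

    // Sketch only: SCEV nodes are uniqued by ScalarEvolution, so structurally
    // identical expressions normally share one object.
    #include "llvm/Analysis/ScalarEvolution.h"
    #include "llvm/IR/Value.h"
    using namespace llvm;

    static bool sameSCEV(ScalarEvolution &SE, Value *A, Value *B) {
      const SCEV *SA = SE.getSCEV(A);
      const SCEV *SB = SE.getSCEV(B);
      // Pointer equality implies equality for uniqued nodes, but it can miss
      // expressions that become equal only after further simplification.
      return SA == SB;
    }
]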
2016 Jun 25
3
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello LLVM Community,
To improve Interprocedural Register Allocation (IPRA) we are trying to force caller-saved registers for local functions (those with local linkage). To achieve it I have modified TargetFrameLowering::determineCalleeSaves() to return early for functions which satisfy if (F->hasLocalLinkage() && !F->hasAddressTaken()) and also reflecting the fact that for local
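[A minimal sketch of the modification described above, assuming the early return sits at the top of the in-tree base implementation; exact signatures vary slightly across LLVM versions:

    // Sketch: skip callee-saved bookkeeping for local, non-address-taken
    // functions so every register is effectively caller-saved for them.
    void TargetFrameLowering::determineCalleeSaves(MachineFunction &MF,
                                                   BitVector &SavedRegs,
                                                   RegScavenger *RS) const {
      const Function *F = MF.getFunction();  // returns a reference in newer LLVM
      if (F->hasLocalLinkage() && !F->hasAddressTaken())
        return;  // force caller-saved registers, as described above
      // ... the normal callee-saved computation continues here ...
    }
]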
2015 Sep 11
5
[RFC] New pass: LoopExitValues
Hi Steve, it seems the general consensus is that the patch feels like a work-around for a problem with LSR (and possibly other loop transformations) that introduces redundant instructions. It is probably best to file a bug with a few of your test cases. Thanks, Gerolf
> On Sep 10, 2015, at 4:37 PM, Steve King via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Thu, Sep 10, 2015
2013 Sep 12
1
[LLVMdev] bug in X86 disasm code?
Hi, I found this code in X86DisassemblerDecoder.h:

    #define EA_BASES_32BIT \
      ENTRY(EAX)  \
      ENTRY(ECX)  \
      ENTRY(EDX)  \
      ENTRY(EBX)  \
      ENTRY(sib)  \
      ENTRY(EBP)  \
      ENTRY(ESI)  \
      ENTRY(EDI)  \
      ENTRY(R8D)  \
      ENTRY(R9D)  \
      ENTRY(R10D) \
      ENTRY(R11D) \
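[For context, this is the classic X-macro pattern: the list macro stays fixed while ENTRY is redefined at each expansion site. A self-contained sketch with a shortened, illustrative list, not the decoder's actual definitions:

    // X-macro sketch: one list, two expansions (an enum and a name table).
    #define REG_LIST \
      ENTRY(EAX)     \
      ENTRY(ECX)     \
      ENTRY(EDX)

    #define ENTRY(x) EA_BASE_##x,
    enum EABase { REG_LIST EA_BASE_MAX };  // EA_BASE_EAX, EA_BASE_ECX, ...
    #undef ENTRY

    #define ENTRY(x) #x,
    static const char *EABaseNames[] = { REG_LIST };  // "EAX", "ECX", "EDX"
    #undef ENTRY
]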
2019 Dec 23
2
Register Dataflow Analysis on X86
Hi Scott,
That #1073741833 is a register mask. Register masks are treated as aggregate registers (essentially sets of registers), so if the mask includes R9D and R11D, it will be treated as aliased with both. These separate defs are there because they reach disjoint registers.
--
Krzysztof Parzyszek kparzysz at quicinc.com
AI tools development
From: Scott
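[For reference, a minimal sketch of how a register-mask operand is commonly tested against a physical register using the stock LLVM API; the helper and loop are illustrative:

    // Sketch: in a regmask, a *clear* bit means the register is clobbered;
    // MachineOperand::clobbersPhysReg encapsulates that test.
    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/MachineOperand.h"
    using namespace llvm;

    static bool maskClobbers(const MachineInstr &MI, MCRegister Reg) {
      for (const MachineOperand &MO : MI.operands())
        if (MO.isRegMask() &&
            MachineOperand::clobbersPhysReg(MO.getRegMask(), Reg))
          return true;
      return false;
    }
]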
2020 Jan 10
2
Register Dataflow Analysis on X86
Hi Scott,
Sorry for the late reply, I was out of office during the holidays.
1. A def node can reach either a use node or another def node. In the highlighted phi node (p3224), the def (d3225) reaches another def (1598) in statement (s1597); that's why it's needed.
2. The reason why the def of R11 in s1578 is not connected directly to the use in s1725 is that there may be an intervening
2016 Jun 26
3
Tail call optimization is getting affected due to local function related optimization with IPRA
According to http://llvm.org/docs/CodeGenerator.html#tail-call-section, adding a new CC for the purpose of local function optimization seems like a good idea, because tail call optimization only takes place when both caller and callee have the fastcc, GHC, or HiPE calling convention. -Vivek
On Sun, Jun 26, 2016 at 1:26 AM, vivek pandya <vivekvpandya at gmail.com> wrote: >
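[As a concrete reference for the convention requirement mentioned above, a minimal sketch that tags a function and its direct call sites with fastcc; this uses the modern LLVM API (CallBase) and illustrative names, not the 2016-era code from the thread:

    // Sketch: tail call opt requires matching fastcc/GHC/HiPE conventions on
    // both the function and each call site, so set them together.
    #include "llvm/IR/CallingConv.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/InstrTypes.h"
    using namespace llvm;

    static void makeFastCC(Function &F) {
      F.setCallingConv(CallingConv::Fast);
      for (User *U : F.users())
        if (auto *CB = dyn_cast<CallBase>(U))
          if (CB->getCalledFunction() == &F)
            CB->setCallingConv(CallingConv::Fast);  // keep call sites in sync
    }
]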
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
> Hello,
> To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added the following method to achieve this:
> bool isEligibleForTailCallOptimization(Function *F) {
>   CallingConv::ID CC =
2016 Jun 25
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Sat, Jun 25, 2016 at 11:03 PM, vivek pandya <vivekvpandya at gmail.com> wrote:
> Hello LLVM Community,
> To improve Interprocedural Register Allocation (IPRA) we are trying to force caller-saved registers for local functions (those with local linkage). To achieve it I have modified TargetFrameLowering::determineCalleeSaves() to return early for
2016 Jun 27
0
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello,
To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added the following method to achieve this:

    bool isEligibleForTailCallOptimization(Function *F) {
      CallingConv::ID CC = F->getCallingConv();
      if (CC == CallingConv::Fast || CC == CallingConv::GHC ||
          CC == CallingConv::HiPE)
        return true;
      return false;
    }
2019 Nov 08
2
Register Dataflow Analysis on X86
Do you know whether it has been fixed in the 8.0.1 release? Scott
On Fri, Nov 8, 2019 at 9:45 AM Krzysztof Parzyszek <kparzysz at quicinc.com> wrote:
The one blocking issue that existed in the past has been fixed. I haven't had time to do any work on it lately, but I'm not aware of any fundamental problems that would make it not work on x86. --
2015 Sep 23
2
[RFC] New pass: LoopExitValues
On Mon, Sep 21, 2015 at 11:13 AM, Wei Mi <wmi at google.com> wrote:
> Maybe it can follow the "Delete dead loops" pass, which is after the Induction Variable Simplification pass, so it will not affect the existing exit-value rewriting in Induction Variable Simplification that finds and deletes dead loops?
> Existing pass pipeline:
2015 Sep 03
2
[RFC] New pass: LoopExitValues
On Wed, Sep 2, 2015 at 5:36 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
> Hi,
> Coremark really isn't a good enough test - have you run the LLVM test suite with this patch, and what were the performance differences?
For the test suite SingleSource benchmarks, 235 tests improved performance, 2 regressed, and 705 were unchanged. That seems very optimistic.
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. It means

    vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15]

zmm22 will contain the 64-bit constant values [8,9,10,11,12,13,14,15], which are indexes here, not values loaded from those locations. And zmm2 contains the constant 4000, so

    vpmuludq zmm14, zmm10, zmm2 ; multiplies the index values by 4000, since the stride for array b is 4000

zmm14=
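[A scalar sketch of the per-lane arithmetic described above; the array b and the 4000-byte stride come from the excerpt, everything else is illustrative:

    // Scalar equivalent of the vectorized index * stride computation:
    // each lane turns an element index into a byte offset into array b.
    #include <cstdint>

    int main() {
      std::uint64_t idx[8] = {8, 9, 10, 11, 12, 13, 14, 15};  // zmm22
      std::uint64_t off[8];                                    // zmm14
      for (int i = 0; i < 8; ++i)
        off[i] = idx[i] * 4000;  // vpmuludq with zmm2 = 4000, per lane
      return 0;
    }
]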
2016 Jun 28
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello , > > To solve this bug locally I have given preference to tail call > optimization over local function related optimization in IPRA. I have added > following method to achieve this: > >
2015 Sep 26
2
[RFC] New pass: LoopExitValues
Hi Steve,
Do you primarily find this to help for nested loops? If so, that could be because LSR explicitly bails out of processing them:

    // Skip nested loops until we can model them better with formulae.
    if (!L->empty()) {
      DEBUG(dbgs() << "LSR skipping outer loop " << *L << "\n");
      return;
    }

I don't know how much time you're