thr3ads.net - similar to: "LoopStrengthReduce.cpp"

Displaying 20 results from an estimated 5000 matches similar to: "LoopStrengthReduce.cpp"

2016 Mar 29

LoopStrengthReduce.cpp

Hi Jonas, Are you talking specifically about the induction variable? You might look at what I did for PowerPC's counter-based loops (lib/Target/PowerPC/PPCCTRLoops.cpp, etc.). -Hal ----- Original Message ----- > From: "Jonas Paulsson via llvm-dev" <llvm-dev at lists.llvm.org> > To: "llvm-dev" <llvm-dev at lists.llvm.org> > Sent: Monday, March 28,

2016 Mar 29

LoopStrengthReduce.cpp

Hi Hal, yes, it's all about the induction variable. SystemZ has a late pass (pre-emit) that looks for MI sequences that can be rewritten to 'branch on count'. Currently only about half as many BRCTs are emitted as with gcc on the same benchmarks. One reason for this is that when a loop gets unrolled, the loop gets a greater increment / decrement than 1, which makes the

2016 Mar 29

LoopStrengthReduce.cpp

On 3/29/2016 3:05 AM, Jonas Paulsson via llvm-dev wrote: > Could this be done somehow, or is it really so that all targets have to > have their own passes to do this? In the Hexagon backend we also have a separate pass that converts compare+branch loops into hardware loops. We recognize several different patterns of the controlling induction variable, including cases where the increment

2016 Mar 31

LoopStrengthReduce.cpp

> On that note, I think that in general it would be useful to have some > target-independent (CodeGen) pass that would do the majority of the > work for hardware loop generation. I have thought about it, but I > won't be able to do anything in the short term. > > -Krzysztof > I think a first and useful step would be to let targets optionally have the loop induction

2015 Aug 17

RFC for a design change in LoopStrengthReduce / ScalarEvolution

This is related to an issue in loop strength reduction [1] that I've been trying to fix on and off for a while. [1] has a more detailed description of the issue and an example, but briefly put, I want LSR to consider formulae that have "Zext T" as base and/or scale registers, and to appropriately rate such formulae. My first attempt[2] at fixing this was buggy and had to be

2015 Aug 17

RFC for a design change in LoopStrengthReduce / ScalarEvolution

> I don't understand why you want to factor out the information, > exactly. It seems like what you need is a function like: > > unsigned getMinLeadingZeros(const SCEV *); > > then, if you want to get the non-extended expression, you can just > apply an appropriate truncation. I assume, however, that I'm missing > something. The problem is not about how to codegen

2015 Aug 17

RFC for a design change in LoopStrengthReduce / ScalarEvolution

> To back up for a second, how much of this is self-inflicted damage? > IndVarSimplify likes to preemptively widen induction variables. Is > that why you have the extensions here in the first place? In the specific example I was talking about the zext came from our frontend (our FE used to insert these extensions for reasons that are no longer relevant). But you can easily get the same

2015 Aug 18

RFC for a design change in LoopStrengthReduce / ScalarEvolution

> Of course, and the point is that, for example, on x86_64, the zext here is free. I'm still trying to understand the problem... > > In the example you provided in your previous e-mail, we choose the solution: > > `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64 0,+,1}` > `V` -> `trunc({i64 0,+,1}) + VStart` > > instead of the actually-better

2018 Sep 13

Loop Distribution pass

Hi, With the help of the optimization remarks I found a loop that could not be vectorized, but that could be if loop distribution was enabled, which in fact happened, with a very significant benchmark improvement (~25%). I tried (on SystemZ) to enable this pass, and found that it only affected a handful of files on SPEC. This means I could enable this without worrying about any regressions on

2016 Oct 06

LoopVectorizer -- generating bad and unhandled shufflevector sequence

Hi, I have experimented with enabling the LoopVectorizer for SystemZ. I have come across a loop which, when vectorized, seems to have been poorly generated. In short, there seems to be a completely unnecessary sequence of shufflevector instructions that doesn't get optimized away anywhere. In other words, the shuffles lead back to the original vector: [0 1 2 3

2019 Aug 09

How to best deal with undesirable Induction Variable Simplification?

Hi Hal, I see. So LSR could theoretically counteract undesirable Ind Var transformations but it's not implemented at the moment? I think I've managed to come up with a small reproducer that also exhibits a similar problem on x86. Here it is: https://godbolt.org/z/_wxzut As you can see, when rewriteLoopExitValues is not disabled, Clang generates worse code due to additional spills,

2016 Apr 27

phys reg liveness during foldMemoryOperandImpl()

I would expect that it shouldn't be too hard to pass around a reference to LiveIntervalAnalysis*. Patches welcome :) - Matthias > On Apr 27, 2016, at 11:38 AM, Jonas Paulsson via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > ping. > > Either this can be implemented easily, or the current SystemZ optimization LAY -> AGSI in foldMemoryOperandImpl() should be

2016 Apr 15

phys reg liveness during foldMemoryOperandImpl()

Hi, I wonder if it would be possible to extend foldMemoryOperandImpl() so that targets can check for liveness of a particular phys reg? The case I am thinking of is when the new instruction clobbers the CC reg, while the old one did not. In this case the new instruction can only become a replacement if the CC reg is known to be dead. The idea is that liveness of phys regs should be available
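
The condition described in this post (a replacement that newly clobbers CC is only safe if CC is dead at that point) can be sketched as a small model. This is an illustrative Python sketch, not LLVM code: `Instr`, `reg_dead_after`, and `can_fold` are hypothetical names, and the block-local forward scan is an assumed stand-in for whatever liveness information the backend would actually provide.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Hypothetical instruction record: the registers it reads and writes."""
    name: str
    uses: set = field(default_factory=set)
    defs: set = field(default_factory=set)


def reg_dead_after(block, index, reg, live_out):
    """True if reg is not read after block[index] before being redefined."""
    for instr in block[index + 1:]:
        if reg in instr.uses:
            return False  # a later instruction still reads the old value
        if reg in instr.defs:
            return True   # overwritten before any read, so the old value is dead
    # Reached the end of the block: dead only if not live into a successor.
    return reg not in live_out


def can_fold(block, index, new_clobbers, live_out):
    """Allow the replacement only if every newly clobbered register is dead."""
    return all(reg_dead_after(block, index, r, live_out) for r in new_clobbers)
```

For example, with a block `cmp` (defines CC), `lay`, `br` (uses CC), replacing `lay` with an instruction that clobbers CC is rejected, because the branch still reads the CC value set by the compare.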

2017 Aug 17

callee saved regs list

Hi, It was recently discovered that the SystemZ backend needs to add super-regs to the callee saved regs list, like: def CSR_SystemZ : CalleeSavedRegs<(add (sequence "R%dD", 6, 15), - (sequence "F%dD", 8, 15))>; + [R6Q, R8Q, R10Q, R12Q, R14Q], +

2019 Jun 19

live-in lists during register allocation

Hi, I wonder if live-in lists can be trusted to be accurate during register allocation / foldMemoryOperandImpl(). On SystemZ, a register-register compare which has one of its registers spilled can fold that reload into a register-memory compare instruction. In order to do this also with the first (LHS) register, the operands must be swapped. This can only reasonably be done when all the CC

2020 Aug 07

Branches which return values in SelectionDAG

Hi all, I am working on modeling an instruction similar to SystemZ's 'BRCT', which takes a register, decrements it, and branches if the register is nonzero. I saw that the LLVM backend for SystemZ generates the instruction in a MachineFunctionPass as part of a pass intended to eliminate or combine compares. I then looked at ARM, where it uses the HardwareLoops pass first, and then a
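
The semantics described in this post (decrement a register, then branch if the result is nonzero) can be illustrated with a small model. This is a sketch in Python rather than anything from the LLVM sources: `brct` and `run_counted_loop` are hypothetical helper names, and the default 64-bit register width is an assumption made for the example.

```python
def brct(reg: int, width: int = 64) -> tuple[int, bool]:
    """Model one 'branch on count' step.

    Decrements the register (wrapping at the given bit width) and reports
    whether the branch is taken, i.e. whether the new value is nonzero.
    """
    mask = (1 << width) - 1
    new_reg = (reg - 1) & mask
    return new_reg, new_reg != 0


def run_counted_loop(trip_count: int) -> int:
    """Run a loop body trip_count times with brct as the loop latch."""
    if trip_count < 1:
        raise ValueError("this sketch assumes at least one iteration")
    iterations = 0
    reg = trip_count
    while True:
        iterations += 1            # stand-in for the loop body
        reg, taken = brct(reg)     # decrement and test in one step
        if not taken:
            break
    return iterations
```

The point of the model is that the instruction both produces a value (the decremented register) and decides control flow, which is the combination the post is asking how to represent.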

2018 Apr 03

SCEV and LoopStrengthReduction Formulae

I am attempting to implement a minor loop strength reduction optimization for targets that support compare and jump fusion, specifically TTI::canMacroFuseCmp(). My approach might be wrong; however, I am soliciting the idea for feedback, so that I can implement this correctly. My plan is to add a Supplemental LSR formula to LoopStrengthReduce.cpp that optimizes the following case, but perhaps

2017 Nov 23

mischeduler (pre-RA) experiments

Hi, I have been experimenting for a while with the tryCandidate() method of the pre-RA mischeduler. I have by chance found some parameters that give quite good results on benchmarks on SystemZ (on average 1% improvement, some improvements of several percent and very few regressions). Basically, I add a "latency heuristic boost" just above processor resources checking:

2019 Aug 13

How to best deal with undesirable Induction Variable Simplification?

I've noticed that there was an attempt to mitigate the ExitValues problem in https://reviews.llvm.org/D12494 that went nowhere. Were there particular issues with that approach? -- Danila From: Philip Reames [mailto:listmail at philipreames.com] Sent: Saturday, August 10, 2019 02:05 To: Danila Malyutin <Danila.Malyutin at synopsys.com>; Finkel, Hal J. <hfinkel at anl.gov> Cc: llvm-dev

2018 May 15

[MachineScheduler] Question about IssueWidth / NumMicroOps

Hi Andy, >> Right now it seems that BeginGroup/EndGroup is only used by SystemZ, >> or? I see they are used in checkHazard(), which I actually don't see >> as helpful during pre-RA scheduling for SystemZ. Could this be made >> optional, or perhaps only done post-RA if target does post-RA >> scheduling? SystemZ does post-RA scheduling to manage decoder
