
Displaying 14 results from an estimated 14 matches for "hardwareloop".

2020 May 01
3
LV: predication
...r querying TTI that the backend understands this intrinsic and that it should be emitted for that loop. The vectoriser patch is available in D79100, and we pick this intrinsic up in the ARM backend in D79175. Context: we are working on a form of predication that we call tail-predication: a vector hardwareloop has an implicit form of predication that sets active/inactive lanes for the last iteration of the vector loop. Thus, the scalar epilogue loop (if there is one) is tail-folded and tail-predicated in the main vector body. To support this, we need to know the number of data elements processed by t...
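To make the tail-predication idea concrete, here is a small Python model (the language, the function name and the vector width `vf` are my own choices, not anything from the ARM backend). Each vector iteration enables `min(vf, remaining)` lanes, so the remainder is absorbed by a final, partially active iteration instead of a scalar epilogue. It also shows why the number of data elements `n` is needed: without it the lane masks cannot be computed.

```python
def tail_predicated_lanes(n, vf):
    """Per-iteration lane masks for a tail-folded loop over n elements.

    Toy model: the loop runs ceil(n / vf) vector iterations, and lane j of
    iteration k is active iff k * vf + j < n, so the remainder is handled by
    the last (partially active) iteration rather than a scalar epilogue.
    """
    if vf <= 0 or n < 0:
        raise ValueError("need vf > 0 and n >= 0")
    iterations = (n + vf - 1) // vf  # ceil(n / vf) with integer arithmetic
    masks = []
    for k in range(iterations):
        remaining = n - k * vf       # elements left when iteration k starts
        active = min(vf, remaining)  # lanes enabled in this iteration
        masks.append([j < active for j in range(vf)])
    return masks


# 10 elements, 4 lanes: two full iterations, then one with 2 active lanes.
for mask in tail_predicated_lanes(10, 4):
    print(mask)
```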
2020 Aug 07
2
Branches which return values in SelectionDAG
...temZ's 'BRCT', which takes a register, decrements it, and branches if the register is nonzero. I saw that the LLVM backend for SystemZ generates the instruction in a MachineFunctionPass as part of a pass intended to eliminate or combine compares. I then looked at ARM, where it uses the HardwareLoops pass first, and then a combine that occurs in the ARM ISel stage. It replaces branch instructions with special 'WLS' and 'LE' nodes that are custom selected into t2WhileLoopStart and t2LoopEnd pseudo instructions with isBranch and isTerminator set. These pseudo instructions are fin...
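The decrement-and-branch pattern described above can be modelled in a few lines. This is a deliberately simplified sketch, not the real ISA semantics: it assumes a start operation that skips the loop when the count is zero (the "while" in t2WhileLoopStart) and an end operation that decrements the counter and branches back while it is nonzero (BRCT-style). The function name is hypothetical.

```python
def run_counted_loop(count, body):
    """Toy model of a counted hardware loop (not exact WLS/LE or BRCT behaviour).

    Loop start: if the trip count is zero, branch past the loop entirely.
    Loop end: decrement the counter and branch back while it is nonzero.
    Returns how many times `body` was executed.
    """
    if count < 0:
        raise ValueError("trip count must be non-negative")
    counter = count          # counter register set up before the loop
    executed = 0
    if counter == 0:         # while-style start: skip the body
        return executed
    while True:
        body(executed)       # loop body, given the iteration index
        executed += 1
        counter -= 1         # loop end: decrement ...
        if counter == 0:     # ... and fall through once it reaches zero
            return executed


seen = []
print(run_counted_loop(3, seen.append), seen)  # 3 [0, 1, 2]
```

The zero check at the start is what distinguishes a "while" loop start: in a fixed-width register, a pure decrement-then-test loop entered with a counter of 0 would wrap around instead of exiting immediately.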
2020 May 01
5
LV: predication
...iddle of nowhere. I do see that point. But is that not also the beauty of it? It just sits in the preheader; if it gets removed, then so be it. And if it is not recognised, then no harm done either? > Probably the simplest path to get this working is to derive the number of elements in the backend (in HardwareLoops, or your tail predication pass). You should be able to figure it from the masks used in the llvm.masked.load/store instructions in the loop. This is what we are currently doing, and it works very well for simpler cases. For the more complicated cases that we now want to handle as well, the pattern ma...
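A minimal sketch of "deriving the number of elements from the masks", under the strong assumption that the masks have the regular shape a tail-folded loop produces. Real IR only exposes the mask computations, not runtime mask values; the sketch works on concrete masks purely to show the arithmetic, `vf * (iterations - 1) + popcount(last mask)`, and how quickly it has to give up when the pattern is less regular, which is the limitation mentioned above. The function name is hypothetical.

```python
def elements_from_masks(masks):
    """Recover the total element count from per-iteration lane masks.

    Assumes the simple tail-folded shape: each mask is a contiguous prefix of
    active lanes, all masks have the same width, and only the last one may be
    partially active. Anything else is rejected.
    """
    if not masks:
        return 0
    vf = len(masks[0])
    for i, mask in enumerate(masks):
        if len(mask) != vf:
            raise ValueError("masks must all have the same width")
        active = sum(mask)
        if any(mask[active:]):  # an active lane after an inactive one
            raise ValueError("mask is not a prefix of active lanes")
        if i < len(masks) - 1 and active != vf:
            raise ValueError("only the last iteration may be partial")
    # Earlier iterations are full; the last contributes its active lanes.
    return vf * (len(masks) - 1) + sum(masks[-1])


print(elements_from_masks([[True] * 4, [True] * 4, [True, True, False, False]]))  # 10
```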
2020 May 20
2
LV: predication
...e of nowhere. I do see that point. But is that not also the beauty of it? It just sits in the preheader; if it gets removed, then so be it. And if it is not recognised, then no harm done either? > Probably the simplest path to get this working is to derive the number of elements in the backend (in HardwareLoops, or your tail predication pass). You should be able to figure it from the masks used in the llvm.masked.load/store instructions in the loop. This is what we are currently doing, and it works very well for simpler cases. For the more complicated cases that we now want to handle as well, the pattern...
2020 May 21
2
LV: predication
...e of nowhere. I do see that point. But is that not also the beauty of it? It just sits in the preheader; if it gets removed, then so be it. And if it is not recognised, then no harm done either? > Probably the simplest path to get this working is to derive the number of elements in the backend (in HardwareLoops, or your tail predication pass). You should be able to figure it from the masks used in the llvm.masked.load/store instructions in the loop. This is what we are currently doing, and it works very well for simpler cases. For the more complicated cases that we now want to handle as well, the pattern...
2020 Sep 09
2
[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.
> ...https://reviews.llvm.org/D53005, but it seems to have gone stale. I’m also curious about customizing the expander, since we too have some ways to make the prolog and epilog more efficient. > My current solution hides the PHI and IndVar update instructions added by HardwareLoops inside the branch and rematerializes them after the stock MachinePipeliner runs. Not having to do that would be great, but now that it’s implemented, I think I can stop bothering you guys for historical data! :) > Thanks again! > JB...
2020 Sep 07
2
[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.
...ipeliner was worth it (the PeelingScheduleExpander is much easier to reason about, IMHO). Cheers, James. On Thu, 3 Sep 2020 at 20:41, Nagurne, James <j-nagurne at ti.com> wrote: > We have that behaves similarly to yours in this regard, it seems. Specifically, our target utilizes the HardwareLoop pass with CounterInReg true, and then treats loops augmented by this pass as software pipeline candidates. It seems PPC does this as well, but has CounterInReg false. Our loop body ends up looking like this (in mildly simplified pseudocode): > %body: > *...
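A toy model of the `CounterInReg` distinction mentioned above, as I understand it (hedged; `HardwareLoopInfo` in LLVM's TargetTransformInfo.h and the HardwareLoops pass are the authoritative source). With it false, the pass uses `llvm.loop.decrement`, which only yields a continue/exit condition while the count stays in target-managed state (PPC's CTR). With it true, it uses `llvm.loop.decrement.reg`, which returns the updated count, so the loop carries that value explicitly through a PHI: the kind of extra PHI/IndVar-like code a software pipeliner then has to handle. Both functions are hypothetical and only compare the two shapes.

```python
def trips_counter_hidden(n):
    """CounterInReg == false style: the count is target state (e.g. a
    dedicated count register) and the decrement only says 'loop again?'."""
    if n < 1:
        raise ValueError("model assumes the loop is entered (n >= 1)")
    hidden_ctr = n               # not an IR value in this style
    trips = 0
    while True:
        trips += 1               # loop body
        hidden_ctr -= 1          # decrement ...
        if hidden_ctr == 0:      # ... only the branch condition is visible
            return trips


def trips_counter_in_reg(n):
    """CounterInReg == true style: the decrement returns the new count,
    which the loop carries explicitly (a header PHI in IR)."""
    if n < 1:
        raise ValueError("model assumes the loop is entered (n >= 1)")
    counter = n                  # incoming value of the header PHI
    trips = 0
    carried = []                 # values flowing around the back edge
    while True:
        trips += 1               # loop body
        counter = counter - 1    # decrement produces the updated count
        carried.append(counter)
        if counter == 0:         # explicit compare feeds the latch branch
            return trips, carried


print(trips_counter_hidden(3))   # 3
print(trips_counter_in_reg(3))   # (3, [2, 1, 0])
```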
2020 May 19
2
LV: predication
...The vectoriser tail-folds the loop and creates masked loads/stores; this is existing functionality, and nothing has changed here. The generic hardware loop codegen pass inserts hardware loop intrinsics. Very late in the pipeline, e.g. in the PPC and ARM backends, this is picked up and turned into an actual hardwareloop, in our case possibly predicated, or it is reverted. Thanks for explaining it (possibly once again); I wasn't aware that this would also be used for PPC. Point 3) still stands. > What will you do if there are no masked intrinsics in the hwloop body? Nothing. I.e., it can become a hardware lo...
2019 Jul 11
4
llvm.set.loop.iterations
After playing a bit with the newly introduced hardware loop framework, I realized that the llvm.set.loop.iterations intrinsic takes as its argument the number of iterations the loop will execute. In fact, it goes as far as inserting, in IR, an addition of the constant 1 to the backedge-taken count returned by SCEV. If the machine instruction realizing the loop is interested in the number of
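A worked example of the point above: SCEV reports the backedge-taken count (BTC), and the value handed to llvm.set.loop.iterations is BTC + 1. For `for (i = 0; i < n; ++i)` with n ≥ 1, the backedge is taken n − 1 times, so the intrinsic receives n. The sketch computes the +1 in a fixed-width integer (32 bits by default, an assumption) to show the one edge case a consumer of that value has to keep in mind: when the BTC is the maximum value, the trip count wraps to 0. Function names are hypothetical.

```python
def iteration_count(backedge_taken, bits=32):
    """Trip count = backedge-taken count + 1, computed modulo 2**bits
    the way a fixed-width IR integer would be."""
    mask = (1 << bits) - 1
    if not 0 <= backedge_taken <= mask:
        raise ValueError("backedge-taken count must fit in the integer width")
    return (backedge_taken + 1) & mask


def backedges_for_upcount_loop(n):
    """Backedge-taken count of `for (i = 0; i < n; ++i)`, assuming n >= 1."""
    if n < 1:
        raise ValueError("loop body must execute at least once")
    return n - 1


print(iteration_count(backedges_for_upcount_loop(100)))  # 100
print(iteration_count(0xFFFFFFFF))                        # 0: the +1 wrapped
```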
2020 May 18
2
LV: predication
...The vectoriser tail-folds the loop and creates masked loads/stores; this is existing functionality, and nothing has changed here. The generic hardware loop codegen pass inserts hardware loop intrinsics. Very late in the pipeline, e.g. in the PPC and ARM backends, this is picked up and turned into an actual hardwareloop, in our case possibly predicated, or it is reverted. > What will you do if there are no masked intrinsics in the hwloop body? Nothing. I.e., it can become a hardware loop, but not one with implicit predication. > And I am curious why you couldn't use the %evl parameter of VP intrinsics...
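The possible outcomes described above, including the answer to the "no masked intrinsics" question, can be summarised as a small decision function. This is a hypothetical model of the flow as stated in the thread, not code from the ARM or PPC backends: the field names are invented, and the many legality checks a real backend performs are collapsed into a single `can_lower` flag (in particular, having masked loads/stores does not by itself make tail-predication valid).

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    NO_HW_LOOP = "ordinary loop (intrinsics absent or reverted)"
    HW_LOOP = "hardware loop"
    TAIL_PREDICATED_HW_LOOP = "tail-predicated hardware loop"


@dataclass
class LoopFacts:
    # Hypothetical summary of what a backend knows about one loop.
    has_hwloop_intrinsics: bool  # inserted by the generic hardware loop pass
    can_lower: bool              # the target can really form the hardware loop
    has_masked_memops: bool      # tail-folded masked loads/stores in the body


def classify(f: LoopFacts) -> Outcome:
    if not (f.has_hwloop_intrinsics and f.can_lower):
        return Outcome.NO_HW_LOOP  # nothing to do, or revert the intrinsics
    if f.has_masked_memops:
        return Outcome.TAIL_PREDICATED_HW_LOOP
    # No masked intrinsics: still a hardware loop, just not implicitly predicated.
    return Outcome.HW_LOOP


print(classify(LoopFacts(True, True, False)).value)  # hardware loop
```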
2020 May 04
3
LV: predication
...preheader; if it gets removed, then so be it. And if it is not recognised, then no harm done either? The harm comes if the intrinsic ends up with the wrong value, or attached to the wrong loop. > Probably the simplest path to get this working is to derive the number of elements in the backend (in HardwareLoops, or your tail predication pass). You should be able to figure it from the masks used in the llvm.masked.load/store instructions in the loop. This is what we are currently doing, and it works very well for simpler cases. For the more complicated cases that we now want to handle as well, the pattern...
2020 May 19
3
LV: predication
...The vectoriser tail-folds the loop and creates masked loads/stores; this is existing functionality, and nothing has changed here. The generic hardware loop codegen pass inserts hardware loop intrinsics. Very late in the pipeline, e.g. in the PPC and ARM backends, this is picked up and turned into an actual hardwareloop, in our case possibly predicated, or it is reverted. Thanks for explaining it (possibly once again); I wasn't aware that this would also be used for PPC. Point 3) still stands. > What will you do if there are no masked intrinsics in the hwloop body? Nothing. I.e., it can become a hardware lo...
2020 May 18
2
LV: predication
Hi, I abandoned that approach and followed Eli's suggestion (see earlier in this thread) to emit an intrinsic that represents/calculates the active mask. I've just uploaded a new revision of D79100 that implements this. Cheers. ________________________________ From: Simon Moll <Simon.Moll at EMEA.NEC.COM> Sent: 18 May 2020 13:32 To: Sjoerd Meijer <Sjoerd.Meijer at
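For reference, the intrinsic LLVM provides for computing an active mask is, as far as I know, `llvm.get.active.lane.mask(%base, %n)`; this message does not name it, so treat the link as an assumption. Its documented semantics are that lane i is active iff `base + i < n` (unsigned), with the addition treated as if it cannot overflow. The sketch models those semantics and contrasts them with a naive wrapped add near the top of a 32-bit range, where wrapping would wrongly re-enable lanes. Function names are hypothetical.

```python
def active_lane_mask(base, n, vf):
    """Lane i is active iff base + i < n, compared as unbounded integers
    (no wrap-around), mirroring the documented intrinsic semantics."""
    return [base + i < n for i in range(vf)]


def wrapped_lane_mask(base, n, vf, bits=32):
    """The same comparison with base + i wrapped to `bits` bits, which is
    the behaviour the intrinsic's definition rules out."""
    mask = (1 << bits) - 1
    return [((base + i) & mask) < n for i in range(vf)]


print(active_lane_mask(8, 10, 4))                    # [True, True, False, False]
print(active_lane_mask(0xFFFFFFFE, 0xFFFFFFFF, 4))   # [True, False, False, False]
print(wrapped_lane_mask(0xFFFFFFFE, 0xFFFFFFFF, 4))  # [True, False, True, True]
```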
2020 Sep 03
1
[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.
Hi James, Adding Hendrik, who has taken over ownership of the downstream code involved. I can also add background about the rationale, if that helps? It was added to ignore induction variable update code (scalar code) that is rewritten when we unroll/peel the prolog and epilog anyway. Targets like Hexagon or PPC, with dedicated loop control instructions for pipelined loops, don't need this, but