Karol Herbst
2017-Apr-03 15:58 UTC
[Nouveau] [PATCH v2 0/3] nv50/ir: Prepare for running Opts inside a loop
We are slowly getting to the point where we miss a significant number of optimization opportunities that only arise as a result of our own passes. To get there, AlgebraicOpt has to handle mods on sources without creating new issues. The last patch then runs some of the optimization passes in a loop.

v2: update commit author

Karol Herbst (3):
  nv50/ir: fix AlgebraicOpt for slcts with mods
  nv50/ir: handle logops with NOT in AlgebraicOpt
  nv50/ir: run some passes multiple times

 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 29 +++++++++++++++-------
 1 file changed, 20 insertions(+), 9 deletions(-)

--
2.12.2
Karol Herbst
2017-Apr-03 15:58 UTC
[Nouveau] [PATCH v2 1/3] nv50/ir: fix AlgebraicOpt for slcts with mods
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 4c92a1efb5..bd60a84998 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1797,10 +1797,10 @@ AlgebraicOpt::handleSLCT(Instruction *slct)
       if (slct->getSrc(2)->asImm()->compare(slct->asCmp()->setCond, 0.0f))
          slct->setSrc(0, slct->getSrc(1));
    } else
-   if (slct->getSrc(0) != slct->getSrc(1)) {
+   if (slct->getSrc(0) != slct->getSrc(1) || slct->src(0).mod != slct->src(1).mod)
       return;
-   }
-   slct->op = OP_MOV;
+   slct->op = slct->src(0).mod.getOp();
+   slct->src(0).mod = slct->src(0).mod ^ Modifier(slct->op);
    slct->setSrc(1, NULL);
    slct->setSrc(2, NULL);
 }
--
2.12.2
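A minimal sketch of the idea behind this patch, using a toy IR rather than nv50_ir (ToyInsn, ToyOp, ToyMod, modToOp, opToMod and foldSelectOfEqualSources are all hypothetical names). A select whose two data sources are the same value with the same modifiers always yields that source, so it can collapse into a single-source op; if the source carries a single modifier, the op that applies it replaces the modifier, loosely mirroring the patch's use of mod.getOp() and Modifier(op). Note that the review below points out SLCT cannot actually carry mods on src0/src1, so this models the patch as posted, not hardware behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Toy IR for this sketch only; none of these names are nv50_ir's.
enum ToyMod : uint8_t { MOD_NONE = 0, MOD_NEG = 1, MOD_ABS = 2, MOD_NOT = 4 };
enum ToyOp { OP_SLCT, OP_MOV, OP_NEG, OP_ABS, OP_NOT, OP_CVT };

struct ToyInsn {
   ToyOp op;
   const void *src[3];   // source values, compared by identity
   uint8_t mod[3];       // modifier bits per source
};

// Plays the role of mod.getOp() in the patch: one modifier -> matching unary op.
static ToyOp modToOp(uint8_t mod)
{
   switch (mod) {
   case MOD_NONE: return OP_MOV;
   case MOD_NEG:  return OP_NEG;
   case MOD_ABS:  return OP_ABS;
   case MOD_NOT:  return OP_NOT;
   default:       return OP_CVT;  // several mods combined: keep them on a cvt
   }
}

// Plays the role of Modifier(op): the modifier an op already applies, if any.
static uint8_t opToMod(ToyOp op)
{
   switch (op) {
   case OP_NEG: return MOD_NEG;
   case OP_ABS: return MOD_ABS;
   case OP_NOT: return MOD_NOT;
   default:     return MOD_NONE;
   }
}

// slct(a, a, cond) yields a regardless of cond, provided both data sources
// match in value *and* modifiers. Returns true if the insn was rewritten.
static bool foldSelectOfEqualSources(ToyInsn &i)
{
   if (i.op != OP_SLCT)
      return false;
   if (i.src[0] != i.src[1] || i.mod[0] != i.mod[1])
      return false;
   i.op = modToOp(i.mod[0]);
   i.mod[0] ^= opToMod(i.op);    // the op now applies that modifier itself
   i.src[1] = i.src[2] = nullptr;
   i.mod[1] = i.mod[2] = MOD_NONE;
   return true;
}
```

For example, slct(-a, -a, c) becomes neg(a) with no modifier left on src0, while slct(-a, a, c) is left alone.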
Karol Herbst
2017-Apr-03 15:58 UTC
[Nouveau] [PATCH v2 2/3] nv50/ir: handle logops with NOT in AlgebraicOpt
Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index bd60a84998..0de84fe9fc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -1856,6 +1856,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
 
       set0 = cloneForward(func, set0);
       set1 = cloneShallow(func, set1);
+
+      if (logop->src(0).mod == Modifier(NV50_IR_MOD_NOT))
+         set0->asCmp()->setCond = inverseCondCode(set0->asCmp()->setCond);
+      if (logop->src(1).mod == Modifier(NV50_IR_MOD_NOT))
+         set1->asCmp()->setCond = inverseCondCode(set1->asCmp()->setCond);
+
       logop->bb->insertAfter(logop, set1);
       logop->bb->insertAfter(logop, set0);
 
--
2.12.2
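The patch relies on the identity NOT(a cc b) == (a inverse(cc) b), so a NOT modifier on a logop source can be absorbed into the SET that produces it. The sketch below checks that identity on a toy condition-code encoding (ToyCond, compareOutcome, evalCond, inverseCond and condForSource are hypothetical; this is not nv50_ir's CondCode layout). Each bit stands for one possible outcome of the comparison, so the inverse is the complement of the set. Whether nv50_ir's inverseCondCode() also flips the unordered bit is not shown in the patch; the sketch flips it explicitly so the rewrite stays exact when an operand is NaN.

```cpp
#include <cassert>
#include <cmath>

// Toy encoding (an assumption of this sketch): one bit per comparison outcome.
enum ToyCond : unsigned {
   C_LT = 1,            // a < b
   C_EQ = 2,            // a == b
   C_GT = 4,            // a > b
   C_U  = 8,            // unordered: at least one operand is NaN
   C_LE = C_LT | C_EQ,
   C_GE = C_GT | C_EQ,
   C_NE = C_LT | C_GT,  // ordered not-equal
   C_ALL = C_LT | C_EQ | C_GT | C_U,
};

// Exactly one outcome bit describes any pair of floats.
static unsigned compareOutcome(float a, float b)
{
   if (std::isnan(a) || std::isnan(b))
      return C_U;
   if (a < b)
      return C_LT;
   if (a > b)
      return C_GT;
   return C_EQ;
}

// A condition holds when it contains the actual outcome.
static bool evalCond(unsigned cc, float a, float b)
{
   return (cc & compareOutcome(a, b)) != 0;
}

// The complement of the outcome set, including the unordered bit.
static unsigned inverseCond(unsigned cc)
{
   return cc ^ C_ALL;
}

// Mirrors the patch: a NOT on the logop source is folded into the SET's cond.
static unsigned condForSource(unsigned cc, bool srcHasNot)
{
   return srcHasNot ? inverseCond(cc) : cc;
}
```

This is only valid when the SET feeding the logop is a plain SET; the review below explains why combined SET_AND/SET_OR sources need more than a condition flip.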
Karol Herbst
2017-Apr-03 15:58 UTC
[Nouveau] [PATCH v2 3/3] nv50/ir: run some passes multiple times
With the shader cache, compilation time matters less. As a result, we can afford to add more optimizations and produce better-optimized code.

total instructions in shared programs : 3931743 -> 3917512 (-0.36%)
total gprs used in shared programs    : 481460 -> 481680 (0.05%)
total local used in shared programs   : 27481 -> 26761 (-2.62%)
total bytes used in shared programs   : 36032672 -> 35902648 (-0.36%)

           local      gpr     inst    bytes
helped        48      133     3843     3843
  hurt         1      295       75       75

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 0de84fe9fc..505de08573 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -3729,12 +3729,17 @@ Program::optimizeSSA(int level)
    RUN_PASS(1, CopyPropagation, run);
    RUN_PASS(1, MergeSplits, run);
    RUN_PASS(2, GlobalCSE, run);
-   RUN_PASS(1, LocalCSE, run);
-   RUN_PASS(2, AlgebraicOpt, run);
-   RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
-   RUN_PASS(1, ConstantFolding, foldAll);
-   RUN_PASS(2, LateAlgebraicOpt, run);
-   RUN_PASS(1, Split64BitOpPreRA, run);
+   for (int i = 0; i < 2; ++i) {
+      RUN_PASS(1, LocalCSE, run);
+      RUN_PASS(2, AlgebraicOpt, run);
+      RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
+      RUN_PASS(1, ConstantFolding, foldAll);
+      RUN_PASS(2, LateAlgebraicOpt, run);
+      // only once
+      if (i == 0)
+         RUN_PASS(1, Split64BitOpPreRA, run);
+      RUN_PASS(1, DeadCodeElim, buryAll);
+   }
    RUN_PASS(1, LoadPropagation, run);
    RUN_PASS(1, IndirectPropagation, run);
    RUN_PASS(2, MemoryOpt, run);
--
2.12.2
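A minimal sketch of the loop structure this patch introduces, assuming a simplified pass descriptor instead of the RUN_PASS macro (Pass, minLevel, onlyOnce and runPipeline are hypothetical names, not nv50_ir code). Every pass whose optimization level requirement is met runs on each iteration, except passes marked run-once, which, like Split64BitOpPreRA in the patch, only run on the first iteration. Stopping at the first failing pass is a choice made for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical pass descriptor for this sketch.
struct Pass {
   std::string name;
   int minLevel;                // like RUN_PASS's first argument
   bool onlyOnce;               // run only on the first iteration
   std::function<bool()> run;   // returns false if the pass failed
};

// Runs the pass list `iterations` times at optimization `level` and
// returns the names of the passes that were executed, in order.
static std::vector<std::string>
runPipeline(const std::vector<Pass> &passes, int level, int iterations, bool &ok)
{
   std::vector<std::string> trace;
   ok = true;
   for (int i = 0; i < iterations; ++i) {
      for (const Pass &p : passes) {
         if (level < p.minLevel)
            continue;                 // pass disabled at this level
         if (p.onlyOnce && i != 0)
            continue;                 // e.g. a lowering pass that must not rerun
         trace.push_back(p.name);
         if (!p.run()) {
            ok = false;
            return trace;
         }
      }
   }
   return trace;
}
```

The patch uses a fixed count of two iterations. A different design, not what the patch does, would repeat until no pass reports a change, trading predictable compile time for possibly more folding.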
Ilia Mirkin
2017-Apr-09 17:34 UTC
[Nouveau] [PATCH v2 1/3] nv50/ir: fix AlgebraicOpt for slcts with mods
On Mon, Apr 3, 2017 at 11:58 AM, Karol Herbst <karolherbst at gmail.com> wrote:
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 4c92a1efb5..bd60a84998 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1797,10 +1797,10 @@ AlgebraicOpt::handleSLCT(Instruction *slct)
>        if (slct->getSrc(2)->asImm()->compare(slct->asCmp()->setCond, 0.0f))
>           slct->setSrc(0, slct->getSrc(1));
>     } else
> -   if (slct->getSrc(0) != slct->getSrc(1)) {
> +   if (slct->getSrc(0) != slct->getSrc(1) || slct->src(0).mod != slct->src(1).mod)

SLCT can't have mods on src0/src1. Only on src2. I'd be just as happy to
assert that they're both == 0 here.

You can also add a helper to ValueRef to see if it's == to another
ValueRef, which compares both the Value ptr as well as any modifiers,
indirects, etc. But it again doesn't ultimately need to be used here.

>        return;
> -   }
> -   slct->op = OP_MOV;
> +   slct->op = slct->src(0).mod.getOp();
> +   slct->src(0).mod = slct->src(0).mod ^ Modifier(slct->op);
>     slct->setSrc(1, NULL);
>     slct->setSrc(2, NULL);
>  }
> --
> 2.12.2
>
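The ValueRef helper suggested above could look roughly like this. It is a hypothetical sketch on a toy struct (ToyValue, ToyValueRef and sameAs are made-up names, and the real nv50_ir ValueRef layout may differ): two source references are only interchangeable when the underlying value, the modifiers and any indirect addressing all match.

```cpp
#include <cassert>

struct ToyValue { int id; };

// Hypothetical stand-in for a source reference: the value plus everything
// that changes what an instruction actually reads through it.
struct ToyValueRef {
   const ToyValue *value = nullptr;
   unsigned mod = 0;             // modifier bits (neg/abs/not/...)
   int indirect[2] = {-1, -1};   // indices of indirect-address sources, -1 = none

   // Equal only if value pointer, modifiers and indirects all match.
   bool sameAs(const ToyValueRef &o) const
   {
      return value == o.value && mod == o.mod &&
             indirect[0] == o.indirect[0] && indirect[1] == o.indirect[1];
   }
};
```

For SLCT specifically, the review's simpler option is to assert that src0/src1 carry no modifiers at all, since the hardware cannot encode them.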
Ilia Mirkin
2017-Apr-09 17:36 UTC
[Nouveau] [PATCH v2 2/3] nv50/ir: handle logops with NOT in AlgebraicOpt
On Mon, Apr 3, 2017 at 11:58 AM, Karol Herbst <karolherbst at gmail.com> wrote:
> Signed-off-by: Karol Herbst <karolherbst at gmail.com>
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index bd60a84998..0de84fe9fc 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1856,6 +1856,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
>
>        set0 = cloneForward(func, set0);
>        set1 = cloneShallow(func, set1);
> +
> +      if (logop->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> +         set0->asCmp()->setCond = inverseCondCode(set0->asCmp()->setCond);
> +      if (logop->src(1).mod == Modifier(NV50_IR_MOD_NOT))
> +         set1->asCmp()->setCond = inverseCondCode(set1->asCmp()->setCond);

set0/set1 may have been swapped further up, so you need to keep track of
that.

Also, I don't think this will work if one of the sets is a SET_AND -- the
condcode applies to the set bit, not to the AND bit. I think you'd also
have to flip AND <-> OR and flip the neg.

  -ilia
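The SET_AND objection follows from De Morgan's law: NOT((a cc b) AND q) equals (a !cc b) OR NOT q, so negating a combined set needs the inverted condition, the opposite combining op, and a flipped negation on the combined source. The toy model below (ToyCmp, ToyComb, ToyCombinedSet, evalSet, negateSet and invertCondOnly are hypothetical; integer compares avoid NaN issues) checks that exhaustively, and shows that flipping only the condition, as the patch does, gives a wrong result for a SET_AND. Whether nv50_ir exposes the "neg" on the combined source exactly like negP here is an assumption.

```cpp
#include <cassert>

// Toy model, not nv50_ir: a combined set computes (a cc b) COMB q,
// where q is the predicate source p, optionally negated.
enum ToyCmp { CMP_LT, CMP_GE, CMP_EQ, CMP_NE, CMP_GT, CMP_LE };
enum ToyComb { COMB_AND, COMB_OR };

struct ToyCombinedSet {
   ToyCmp cc;
   ToyComb comb;
   bool negP;   // negation applied to the combined predicate source
};

static bool compareInts(ToyCmp cc, int a, int b)
{
   switch (cc) {
   case CMP_LT: return a < b;
   case CMP_GE: return a >= b;
   case CMP_EQ: return a == b;
   case CMP_NE: return a != b;
   case CMP_GT: return a > b;
   case CMP_LE: return a <= b;
   }
   return false;
}

static ToyCmp inverseCmp(ToyCmp cc)
{
   switch (cc) {
   case CMP_LT: return CMP_GE;
   case CMP_GE: return CMP_LT;
   case CMP_EQ: return CMP_NE;
   case CMP_NE: return CMP_EQ;
   case CMP_GT: return CMP_LE;
   case CMP_LE: return CMP_GT;
   }
   return cc;
}

static bool evalSet(const ToyCombinedSet &s, int a, int b, bool p)
{
   bool c = compareInts(s.cc, a, b);
   bool q = s.negP ? !p : p;
   return s.comb == COMB_AND ? (c && q) : (c || q);
}

// De Morgan: invert cc, swap AND <-> OR, and flip the neg on the source.
static ToyCombinedSet negateSet(const ToyCombinedSet &s)
{
   ToyCombinedSet r;
   r.cc = inverseCmp(s.cc);
   r.comb = s.comb == COMB_AND ? COMB_OR : COMB_AND;
   r.negP = !s.negP;
   return r;
}

// What inverting only the condition code would produce.
static ToyCombinedSet invertCondOnly(const ToyCombinedSet &s)
{
   ToyCombinedSet r = s;
   r.cc = inverseCmp(s.cc);
   return r;
}
```

The swap concern is separate: whatever index the NOT was found on must follow set0/set1 if they were exchanged earlier in handleLOGOP.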