search for: trycollapsechainedmul

Displaying 10 results from an estimated 10 matches for "trycollapsechainedmul".

2016 Sep 30
2
[PATCH v2] nv50/ir: constant fold OP_SPLIT
...= 0; i->defExists(d); ++d) { + bld.mkMov(i->getDef(d), bld.mkImm(val & ((1 << shift) - 1)), type); + val >>= shift; + } + delete_Instruction(prog, i); + } + } + break; case OP_MUL: if (i->dType == TYPE_F32) tryCollapseChainedMULs(i, s, imm0); -- 2.10.0
2016 Sep 27
2
[PATCH] nv50/ir: constant fold OP_SPLIT
..._NONE) { + bld.mkMov(i->getDef(0), bld.mkImm(imm0.reg.data.u64 >> shift), type); + bld.mkMov(i->getDef(1), bld.mkImm(imm0.reg.data.u64), type); + delete_Instruction(prog, i); + } + } + break; case OP_MUL: if (i->dType == TYPE_F32) tryCollapseChainedMULs(i, s, imm0); -- 2.10.0
2016 Sep 30
2
[PATCH] nv50/ir: constant fold OP_SPLIT
...U32. Not sure if that is what we want... other than that, shortening it like this would be nice! > >> + delete_Instruction(prog, i); >> + } >> + } >> + break; >> case OP_MUL: >> if (i->dType == TYPE_F32) >> tryCollapseChainedMULs(i, s, imm0); >> -- >> 2.10.0 >> >> _______________________________________________ >> Nouveau mailing list >> Nouveau at lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/nouveau
2014 Jul 08
1
[PATCH] nv50/ir: use unordered_set instead of list to keep our instructions in uses
...um/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index c162ac4..8d052c5 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -686,7 +686,7 @@ ConstantFolding::tryCollapseChainedMULs(Instruction *mul2, // b = mul a, imm // d = mul b, c -> d = mul_x_imm a, c int s2, t2; - insn = mul2->getDef(0)->uses.front()->getInsn(); + insn = (*mul2->getDef(0)->uses.begin())->getInsn(); if (!insn) return; mul1 = mu...
2016 Sep 28
0
[PATCH] nv50/ir: constant fold OP_SPLIT
...hift) - 1)); val >>= shift; } I think this will account for every case, and with a lot less special-casing. What do you think? > + delete_Instruction(prog, i); > + } > + } > + break; > case OP_MUL: > if (i->dType == TYPE_F32) > tryCollapseChainedMULs(i, s, imm0); > -- > 2.10.0
2016 Sep 30
0
[PATCH] nv50/ir: constant fold OP_SPLIT
...>dType), isSignedType(i->dType)) How's that :p > > >> >>> + delete_Instruction(prog, i); >>> + } >>> + } >>> + break; >>> case OP_MUL: >>> if (i->dType == TYPE_F32) >>> tryCollapseChainedMULs(i, s, imm0); >>> -- >>> 2.10.0
2016 Sep 30
0
[PATCH v2] nv50/ir: constant fold OP_SPLIT
...ov(i->getDef(d), bld.mkImm(val & ((1 << shift) - 1)), type); 1ULL > + val >>= shift; > + } > + delete_Instruction(prog, i); > + } > + } > + break; > case OP_MUL: > if (i->dType == TYPE_F32) > tryCollapseChainedMULs(i, s, imm0); > -- > 2.10.0
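The "1ULL" review remark in the snippet above concerns the width of the mask computation. The following is only a minimal standalone sketch of that folding loop, not the nouveau code: `splitImmediate` and its parameters are hypothetical names, and the codegen builder calls (`bld.mkMov`, `bld.mkImm`) are replaced by a plain vector so the arithmetic can be checked in isolation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the folding loop: split the immediate `val`
// into `count` pieces of `shift` bits each, lowest piece first.
std::vector<uint64_t> splitImmediate(uint64_t val, unsigned shift, unsigned count)
{
    std::vector<uint64_t> parts;
    // 1ULL keeps the shift in 64-bit arithmetic. With a plain int `1`,
    // `1 << 32` would be undefined behaviour on a 32-bit int.
    // Assumption: shift < 64.
    const uint64_t mask = (1ULL << shift) - 1;
    for (unsigned d = 0; d < count; ++d) {
        parts.push_back(val & mask); // value for the d-th destination
        val >>= shift;               // move on to the next piece
    }
    return parts;
}
```

Under these assumptions, splitting a 64-bit immediate with `shift = 32` and `count = 2` yields the low word followed by the high word.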
2015 Mar 25
0
[PATCH] nv50/ir: take postFactor into account when doing peephole optimizations
...9,6 +581,7 @@ ConstantFolding::expr(Instruction *i, i->src(0).mod = Modifier(0); i->src(1).mod = Modifier(0); + i->postFactor = 0; i->setSrc(0, new_ImmediateValue(i->bb->getProgram(), res.data.u32)); i->setSrc(1, NULL); @@ -682,7 +685,7 @@ ConstantFolding::tryCollapseChainedMULs(Instruction *mul2, Instruction *insn; Instruction *mul1 = NULL; // mul1 before mul2 int e = 0; - float f = imm2.reg.data.f32; + float f = imm2.reg.data.f32 * exp2f(mul2->postFactor); ImmediateValue imm1; assert(mul2->op == OP_MUL && mul2->dType == TYPE_F3...
2015 May 09
5
[PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Pretty sure there's nothing wrong with it, but it looks odd in the code. src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++ src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +++++-- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 ++ 3 files changed, 9 insertions(+), 2 deletions(-)
2014 May 18
1
[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic
Retrieving the high 32 bits of a signed multiply is rather annoying. It appears that the simplest way to do this is to compute the absolute value of the arguments, and perform a u32 x u32 -> u64 operation. If the arguments' signs differ, then negate the result. Since there is no u64 support in the cvt instruction, we have to perform the 2's complement negation "by hand".
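The approach described in that commit message can be sketched on the host side as follows. This is an illustration of the arithmetic only, not the code from the patch: `mulHighS32` is a hypothetical helper, and the 64-bit product is held as two 32-bit halves to mirror the "by hand" negation.

```cpp
#include <cstdint>

// Hypothetical helper (not from the nouveau source): returns the high
// 32 bits of the signed 64-bit product a * b using unsigned arithmetic.
int32_t mulHighS32(int32_t a, int32_t b)
{
    // Absolute values as u32. Negating in unsigned arithmetic is well
    // defined, and INT32_MIN maps to 0x80000000 as intended.
    uint32_t ua = a < 0 ? 0u - static_cast<uint32_t>(a) : static_cast<uint32_t>(a);
    uint32_t ub = b < 0 ? 0u - static_cast<uint32_t>(b) : static_cast<uint32_t>(b);

    // u32 x u32 -> u64, kept as two 32-bit halves.
    uint64_t prod = static_cast<uint64_t>(ua) * ub;
    uint32_t lo = static_cast<uint32_t>(prod);
    uint32_t hi = static_cast<uint32_t>(prod >> 32);

    if ((a < 0) != (b < 0)) {
        // 2's complement negation across both halves: invert each half,
        // add 1 to the low half, and carry into the high half. The carry
        // occurs exactly when the low half wraps around to 0.
        lo = ~lo + 1u;
        hi = ~hi + (lo == 0 ? 1u : 0u);
    }
    // Conversion of values above INT32_MAX is modular on common compilers
    // (and guaranteed to be so from C++20 on).
    return static_cast<int32_t>(hi);
}
```

For example, `mulHighS32(-1, 1)` produces -1, since the full product -1 is all ones in both halves.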