Ilia Mirkin
2014-May-13 15:33 UTC
[Nouveau] [PATCH 1/2] nv50/ir: make sure that texprep/texquerylod's args get coalesced
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: "10.2" <mesa-stable at lists.freedesktop.org>
---
I'm not 100% sure of the significance of this code, but this seems like the
correct thing to do. I will definitely run it through a full piglit run before
pushing it out.
src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 60a6a3f..b284081 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -998,7 +998,9 @@ GCRA::doCoalesce(ArrayList& insns, unsigned int mask)
case OP_TXQ:
case OP_TXD:
case OP_TXG:
+ case OP_TXLQ:
case OP_TEXCSAA:
+ case OP_TEXPREP:
if (!(mask & JOIN_MASK_TEX))
break;
for (c = 0; insn->srcExists(c) && c != insn->predSrc; ++c)
--
1.8.5.5
Ilia Mirkin
2014-May-13 15:33 UTC
[Nouveau] [PATCH 2/2] nv50/ir: fix integer mul lowering for u32 x u32 -> high u32
UNION appears to expect that all of its defines are conditional.
Otherwise it inserts a mov instruction which overwrites the desired
result. This fixes tests that use UMUL_HI, and much less directly,
unsigned integer division by a constant, which uses this functionality
in its lowering.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable at lists.freedesktop.org>
---
The S32 version of this logic is busted too (thankfully there is no IMUL
instruction, so it normally doesn't come up). However, division by a signed
integer is still broken as a result. Fixing it will require more changes, but
those will rely on the unsigned case working.
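
For readers following along, here is a rough standalone sketch of what
"u32 x u32 -> high u32" has to compute when only 16-bit halves can be
multiplied at a time. It is plain C++, not nv50 IR. The function name
umul_hi_halves and all local names are made up for illustration. The choice
between r0 and r1 is only loosely modelled on the r[0] / r[1] / UNION sequence
in the diff below; everything else is an assumption about the intent, not a
transcription of expandIntegerMUL.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Mesa): high 32 bits of a 32x32-bit
// unsigned product, computed from 16-bit halves with explicit carries.
uint32_t umul_hi_halves(uint32_t a, uint32_t b)
{
    const uint32_t aLo = a & 0xffffu, aHi = a >> 16;
    const uint32_t bLo = b & 0xffffu, bHi = b >> 16;

    const uint32_t lo   = aLo * bLo;  // contributes at weight 2^0
    const uint32_t mid0 = aLo * bHi;  // contributes at weight 2^16
    const uint32_t mid1 = aHi * bLo;  // contributes at weight 2^16
    const uint32_t hi   = aHi * bHi;  // contributes at weight 2^32

    // Sum the two cross terms; the 32-bit add may wrap, so record the carry.
    const uint32_t cross = mid0 + mid1;
    const bool carry0 = cross < mid0;

    // Fold in the upper half of the low product; this add may wrap as well.
    const uint32_t t = cross + (lo >> 16);
    const bool carry1 = t < cross;

    // Without a carry the cross terms contribute t >> 16 to the high word.
    // With a carry they contribute that value plus 1 << 16. Exactly one of
    // the two candidates must be picked, which is the role the predicated
    // defs feeding UNION appear to play in the patch.
    const uint32_t r0 = t >> 16;
    const uint32_t r1 = r0 + (1u << 16);
    const uint32_t r2 = (carry0 || carry1) ? r1 : r0;

    return hi + r2;
}
```

If the carry-free candidate is written unconditionally after the selection, as
the commit message describes for the stray mov, the result is wrong whenever a
carry occurred, e.g. for 0xffffffff * 0xffffffff.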
src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
index 63db1d7..b17d57d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
@@ -75,16 +75,17 @@ expandIntegerMUL(BuildUtil *bld, Instruction *mul)
i[4] = bld->mkOp3(OP_MAD, fTy, t[3], a[0], b[0], t[2]);
if (highResult) {
- Value *r[3];
+ Value *r[4];
Value *imm = bld->loadImm(NULL, 1 << (halfSize * 8));
c[0] = bld->getSSA(1, FILE_FLAGS);
c[1] = bld->getSSA(1, FILE_FLAGS);
- for (int j = 0; j < 3; ++j)
+ for (int j = 0; j < 4; ++j)
r[j] = bld->getSSA(fullSize);
i[8] = bld->mkOp2(OP_SHR, fTy, r[0], t[1], bld->mkImm(halfSize * 8));
i[6] = bld->mkOp2(OP_ADD, fTy, r[1], r[0], imm);
- bld->mkOp2(OP_UNION, TYPE_U32, r[2], r[1], r[0]);
+ bld->mkMov(r[3], r[0])->setPredicate(CC_NC, c[0]);
+ bld->mkOp2(OP_UNION, TYPE_U32, r[2], r[1], r[3]);
i[5] = bld->mkOp3(OP_MAD, fTy, mul->getDef(0), a[1], b[1], r[2]);
// set carry defs / sources
--
1.8.5.5
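
As background for the commit message's remark about unsigned division by a
constant: such divisions are commonly lowered to a high multiply followed by a
shift, which is why a broken UMUL_HI would break them too. The sketch below
shows the general identity in plain C++ for the divisors 3 and 10, using the
usual multiply-and-shift constants. The helper names are hypothetical, and the
patch does not say which constants or which exact sequence the nv50 lowering
emits, so treat this as an illustration of the dependency rather than a
description of Mesa's code.

```cpp
#include <cstdint>

// Reference high multiply using a 64-bit product. This is only a stand-in
// for UMUL_HI; the nv50 lowering builds the result from narrower multiplies.
static inline uint32_t umul_hi(uint32_t a, uint32_t b)
{
    return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// x / 3: multiply by ceil(2^33 / 3) = 0xAAAAAAAB, then shift right by 33
// in total (32 from taking the high word, 1 more here).
uint32_t div_by_3(uint32_t x)
{
    return umul_hi(x, 0xAAAAAAABu) >> 1;
}

// x / 10: multiply by ceil(2^35 / 10) = 0xCCCCCCCD, then shift right by 35
// in total (32 from taking the high word, 3 more here).
uint32_t div_by_10(uint32_t x)
{
    return umul_hi(x, 0xCCCCCCCDu) >> 3;
}
```

Some divisors (7, for example) need an extra add-and-shift fixup on top of
this, so the two functions above are not a general recipe.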