thr3ads.net - search: "getssa"

Displaying 18 results from an estimated 18 matches for "getssa".

[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)

2015 Feb 23

[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)

..."quake" style + * approximation for RSQ: + * + * 0x5fe6eb50c7b537a9 - num >> 1 + * + * For RCP, we will then square it. + */ + Value *abs, *guess, *parts[2], *input[2], *shr[4], *pred; + + bld.setPosition(i, false); + + abs = bld.mkOp1v(OP_ABS, TYPE_F64, bld.getSSA(8), i->getSrc(0)); + + parts[0] = bld.loadImm(NULL, 0xc7b537a9); + parts[1] = bld.loadImm(NULL, 0x5fe6eb50); + guess = bld.mkOp2v(OP_MERGE, TYPE_F64, bld.getSSA(8), parts[0], parts[1]); + + bld.mkSplit(input, 4, abs); + shr[0] = bld.mkOp2v(OP_SHR, TYPE_U32, bld.getSSA(4), input[0], bld...

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

...@@ -93,7 +94,42 @@ NVC0LegalizeSSA::handleRCPRSQ(Instruction *i) > > // 4. Recombine the two dst pieces back into the original destination. > bld.setPosition(i, true); > - bld.mkOp2(OP_MERGE, TYPE_U64, def, dst[0], dst[1]); > + guess = bld.mkOp2v(OP_MERGE, TYPE_U64, bld.getSSA(8), dst[0], dst[1]); > + > + // 5. Perform 2 Newton-Raphson steps > + if (i->op == OP_RCP) { > + // RCP: x_{n+1} = 2 * x_n - input * x_n^2 > + Value *two = bld.getSSA(8); > + > + bld.mkCvt(OP_CVT, TYPE_F64, two, TYPE_F32, bld.loadImm(NULL, 2.0f)); > + &...

[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

...= bld.loadImm(NULL, 0); @@ -93,7 +94,42 @@ NVC0LegalizeSSA::handleRCPRSQ(Instruction *i) // 4. Recombine the two dst pieces back into the original destination. bld.setPosition(i, true); - bld.mkOp2(OP_MERGE, TYPE_U64, def, dst[0], dst[1]); + guess = bld.mkOp2v(OP_MERGE, TYPE_U64, bld.getSSA(8), dst[0], dst[1]); + + // 5. Perform 2 Newton-Raphson steps + if (i->op == OP_RCP) { + // RCP: x_{n+1} = 2 * x_n - input * x_n^2 + Value *two = bld.getSSA(8); + + bld.mkCvt(OP_CVT, TYPE_F64, two, TYPE_F32, bld.loadImm(NULL, 2.0f)); + + guess = bld.mkOp2v(OP_SUB, TYPE_F6...

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

...NVC0LegalizeSSA::handleRCPRSQ(Instruction *i) >> >> // 4. Recombine the two dst pieces back into the original destination. >> bld.setPosition(i, true); >> - bld.mkOp2(OP_MERGE, TYPE_U64, def, dst[0], dst[1]); >> + guess = bld.mkOp2v(OP_MERGE, TYPE_U64, bld.getSSA(8), dst[0], dst[1]); >> + >> + // 5. Perform 2 Newton-Raphson steps >> + if (i->op == OP_RCP) { >> + // RCP: x_{n+1} = 2 * x_n - input * x_n^2 >> + Value *two = bld.getSSA(8); >> + >> + bld.mkCvt(OP_CVT, TYPE_F64, two, TYPE_F32, bld.lo...

[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax

2015 Feb 20

[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++- 1 file changed, 63 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index dfb093c..e38a3b8 100644 ---

[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings

2019 Oct 14

[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings

...e *ptr, int slot, uint32_t off, bool bindless { uint32_t base = slot * NVC0_SU_INFO__STRIDE; + // We don't upload surface info for bindless for GM107+ + assert(!bindless || targ->getChipset() < NVISA_GM107_CHIPSET); + if (ptr) { ptr = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getSSA(), ptr, bld.mkImm(slot)); if (bindless) @@ -2204,7 +2207,7 @@ getDestType(const ImgType type) { } void -NVC0LoweringPass::convertSurfaceFormat(TexInstruction *su) +NVC0LoweringPass::convertSurfaceFormat(TexInstruction *su, Instruction **loaded) { const TexInstruction::ImgFormatDesc...

[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic

2014 May 18

[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic

...TYPE_S32; break; default: return false; } @@ -59,15 +66,25 @@ expandIntegerMUL(BuildUtil *bld, Instruction *mul) bld->setPosition(mul, true); + Value *s[2]; Value *a[2], *b[2]; - Value *c[2]; Value *t[4]; for (int j = 0; j < 4; ++j) t[j] = bld->getSSA(fullSize); + s[0] = mul->getSrc(0); + s[1] = mul->getSrc(1); + + if (isSignedType(mul->sType)) { + s[0] = bld->getSSA(fullSize); + s[1] = bld->getSSA(fullSize); + bld->mkOp1(OP_ABS, mul->sType, s[0], mul->getSrc(0)); + bld->mkOp1(OP_ABS, mul-&g...

[PATCH 1/2] nv50/ir: make sure that texprep/texquerylod's args get coalesced

2014 May 13

[PATCH 1/2] nv50/ir: make sure that texprep/texquerylod's args get coalesced

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.2" <mesa-stable at lists.freedesktop.org> --- Not 100% sure of the significance of this code, but this seems like the correct thing to do... will definitely run it through a full piglit run before pushing out. src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2

2014 Jan 13

[PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2

OK, so there's a bunch of stuff in here. The geometry stuff is based on the work started by Bryan Cain and Christoph Bumiller. Patches 01-12: Add support for geometry shaders and fix related issues Patches 13-14: Make it possible for fb clears to operate on texture attachments with an explicit layer set (as is allowed in gl 3.2). Patches 15-17: Make ARB_texture_multisample work

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

..., but using SM50-friendly primitives. + static const uint8_t qOps[2] = + { QUADOP(MOV2, ADD, MOV2, ADD), QUADOP(MOV2, MOV2, ADD, ADD) }; Value *def[4][4]; - Value *crd[3]; + Value *crd[3], *arr, *shadow; Value *tmp; Instruction *tex, *add; - Value *zero = bld.loadImm(bld.getSSA(), 0); + Value *quad = bld.mkImm(SHFL_BOUND_QUAD); int l, c; const int dim = i->tex.target.getDim() + i->tex.target.isCube(); const int array = i->tex.target.isArray(); @@ -115,35 +112,40 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i) for (c = 0; c < dim; +...

[PATCH] nvc0: do quadops on the right texture coordinates for TXD

2014 Jul 05

[PATCH] nvc0: do quadops on the right texture coordinates for TXD

.../nv50_ir_lowering_nvc0.cpp index 8f26645..0e24db7 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -712,6 +712,7 @@ NVC0LoweringPass::handleManualTXD(TexInstruction *i) Value *zero = bld.loadImm(bld.getSSA(), 0); int l, c; const int dim = i->tex.target.getDim(); + const int array = i->tex.target.isArray(); i->op = OP_TEX; // no need to clone dPdx/dPdy later @@ -722,7 +723,7 @@ NVC0LoweringPass::handleManualTXD(TexInstruction *i) for (l = 0; l < 4; ++l) { // mo...

[PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case

2014 Aug 08

[PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index f010767..4a9e48f 100644 ---

[PATCH] nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space

2016 Jan 14

[PATCH] nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space

...+ void adjustTempIndex(int arrayId, int &idx, int &idx2d) const; Value *applySrcMod(Value *, int s, int c); Symbol *makeSym(uint file, int fileIndex, int idx, int c, uint32_t addr); @@ -1679,11 +1698,23 @@ Converter::shiftAddress(Value *index) return mkOp2v(OP_SHL, TYPE_U32, getSSA(4, FILE_ADDRESS), index, mkImm(4)); } +void +Converter::adjustTempIndex(int arrayId, int &idx, int &idx2d) const +{ + std::map<int, tgsi::Source::TempBase>::const_iterator it = + code->indirectTempBases.find(arrayId); + if (it == code->indirectTempBases.end()) +...

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

...t; + static const uint8_t qOps[2] = > + { QUADOP(MOV2, ADD, MOV2, ADD), QUADOP(MOV2, MOV2, ADD, ADD) }; > Value *def[4][4]; > - Value *crd[3]; > + Value *crd[3], *arr, *shadow; > Value *tmp; > Instruction *tex, *add; > - Value *zero = bld.loadImm(bld.getSSA(), 0); > + Value *quad = bld.mkImm(SHFL_BOUND_QUAD); > int l, c; > const int dim = i->tex.target.getDim() + i->tex.target.isCube(); > const int array = i->tex.target.isArray(); > @@ -115,35 +112,40 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i) > &...

[PATCH] nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection

2016 Jan 14

[PATCH] nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection

....getIndex(1) : 0; + int idx2d = src.is2D() ? src.getIndex(1) : 0; const int idx = src.getIndex(0); const int swz = src.getSwizzle(c); Instruction *ld; @@ -1704,6 +1724,14 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr) ld = mkOp1(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c)); ld->perPatch = info->sv[idx].patch; return ld->getDef(0); + case TGSI_FILE_TEMPORARY: { + int arrayid = src.getArrayId(); + if (!arrayid) + arrayid = code->tempArrayId[idx]; + idx2d = (code->indirectTempArrays.find(arrayid)...

[PATCH 00/12] Tessellation support for nvc0

2015 May 17

[PATCH 00/12] Tessellation support for nvc0

This is enough to enable tessellation support on nvc0. It seems to work a lot better on my GF108 than GK208. I suspect that there's some sort of scheduling shenanigans that need to be adjusted for kepler+. Or perhaps some shader header things. Even with the GF108, I still get occasional blue triangles in Heaven, but I get a *ton* of them on the GK208 -- seemingly the same issue, but it's

[PATCH 00/12] Cherry-pick nv50/nvc0 patches from gallium-nine

2014 May 20

[PATCH 00/12] Cherry-pick nv50/nvc0 patches from gallium-nine

I went through the gallium-nine tree and picked out nouveau patches that are general bug-fixes. The first bunch I'd like to also get into 10.2. I've reviewed all of them and they make sense to me, but sending them out for public review as well in case there are any objections. Unless I hear objections, I'd like to push this by Friday. Christoph Bumiller (11): nv50,nvc0: always pull

[RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 11

[RFC 0/9] Add precise/invariant semantics to TGSI

Running Tomb Raider on Nouveau I found some flicker caused by ignoring precise modifiers on variables inside Nouveau. This series add precise/invariant handling to TGSI, which can be then used by drivers to disable certain unsafe optimisations which may otherwise alter calculations, which depend on having the same result across shaders. This series fixes this bug in Tomb Raider and one CTS test

search for: getssa