search for: getssa

Displaying 18 results from an estimated 18 matches for "getssa".

2015 Feb 23
2
[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)
..."quake" style + * approximation for RSQ: + * + * 0x5fe6eb50c7b537a9 - num >> 1 + * + * For RCP, we will then square it. + */ + Value *abs, *guess, *parts[2], *input[2], *shr[4], *pred; + + bld.setPosition(i, false); + + abs = bld.mkOp1v(OP_ABS, TYPE_F64, bld.getSSA(8), i->getSrc(0)); + + parts[0] = bld.loadImm(NULL, 0xc7b537a9); + parts[1] = bld.loadImm(NULL, 0x5fe6eb50); + guess = bld.mkOp2v(OP_MERGE, TYPE_F64, bld.getSSA(8), parts[0], parts[1]); + + bld.mkSplit(input, 4, abs); + shr[0] = bld.mkOp2v(OP_SHR, TYPE_U32, bld.getSSA(4), input[0], bld...
2015 Feb 23
2
[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
...@@ -93,7 +94,42 @@ NVC0LegalizeSSA::handleRCPRSQ(Instruction *i) > > // 4. Recombine the two dst pieces back into the original destination. > bld.setPosition(i, true); > - bld.mkOp2(OP_MERGE, TYPE_U64, def, dst[0], dst[1]); > + guess = bld.mkOp2v(OP_MERGE, TYPE_U64, bld.getSSA(8), dst[0], dst[1]); > + > + // 5. Perform 2 Newton-Raphson steps > + if (i->op == OP_RCP) { > + // RCP: x_{n+1} = 2 * x_n - input * x_n^2 > + Value *two = bld.getSSA(8); > + > + bld.mkCvt(OP_CVT, TYPE_F64, two, TYPE_F32, bld.loadImm(NULL, 2.0f)); > + &...
2015 Feb 23
0
[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
...= bld.loadImm(NULL, 0); @@ -93,7 +94,42 @@ NVC0LegalizeSSA::handleRCPRSQ(Instruction *i) // 4. Recombine the two dst pieces back into the original destination. bld.setPosition(i, true); - bld.mkOp2(OP_MERGE, TYPE_U64, def, dst[0], dst[1]); + guess = bld.mkOp2v(OP_MERGE, TYPE_U64, bld.getSSA(8), dst[0], dst[1]); + + // 5. Perform 2 Newton-Raphson steps + if (i->op == OP_RCP) { + // RCP: x_{n+1} = 2 * x_n - input * x_n^2 + Value *two = bld.getSSA(8); + + bld.mkCvt(OP_CVT, TYPE_F64, two, TYPE_F32, bld.loadImm(NULL, 2.0f)); + + guess = bld.mkOp2v(OP_SUB, TYPE_F6...
2015 Feb 23
0
[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
...NVC0LegalizeSSA::handleRCPRSQ(Instruction *i) >> >> // 4. Recombine the two dst pieces back into the original destination. >> bld.setPosition(i, true); >> - bld.mkOp2(OP_MERGE, TYPE_U64, def, dst[0], dst[1]); >> + guess = bld.mkOp2v(OP_MERGE, TYPE_U64, bld.getSSA(8), dst[0], dst[1]); >> + >> + // 5. Perform 2 Newton-Raphson steps >> + if (i->op == OP_RCP) { >> + // RCP: x_{n+1} = 2 * x_n - input * x_n^2 >> + Value *two = bld.getSSA(8); >> + >> + bld.mkCvt(OP_CVT, TYPE_F64, two, TYPE_F32, bld.lo...
2015 Feb 20
10
[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++- 1 file changed, 63 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index dfb093c..e38a3b8 100644 ---
2019 Oct 14
1
[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings
...e *ptr, int slot, uint32_t off, bool bindless { uint32_t base = slot * NVC0_SU_INFO__STRIDE; + // We don't upload surface info for bindless for GM107+ + assert(!bindless || targ->getChipset() < NVISA_GM107_CHIPSET); + if (ptr) { ptr = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getSSA(), ptr, bld.mkImm(slot)); if (bindless) @@ -2204,7 +2207,7 @@ getDestType(const ImgType type) { } void -NVC0LoweringPass::convertSurfaceFormat(TexInstruction *su) +NVC0LoweringPass::convertSurfaceFormat(TexInstruction *su, Instruction **loaded) { const TexInstruction::ImgFormatDesc...
2014 May 18
1
[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic
...TYPE_S32; break; default: return false; } @@ -59,15 +66,25 @@ expandIntegerMUL(BuildUtil *bld, Instruction *mul) bld->setPosition(mul, true); + Value *s[2]; Value *a[2], *b[2]; - Value *c[2]; Value *t[4]; for (int j = 0; j < 4; ++j) t[j] = bld->getSSA(fullSize); + s[0] = mul->getSrc(0); + s[1] = mul->getSrc(1); + + if (isSignedType(mul->sType)) { + s[0] = bld->getSSA(fullSize); + s[1] = bld->getSSA(fullSize); + bld->mkOp1(OP_ABS, mul->sType, s[0], mul->getSrc(0)); + bld->mkOp1(OP_ABS, mul-&g...
2014 May 13
1
[PATCH 1/2] nv50/ir: make sure that texprep/texquerylod's args get coalesced
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.2" <mesa-stable at lists.freedesktop.org> --- Not 100% sure of the significance of this code, but this seems like the correct thing to do... will definitely run it through a full piglit run before pushing out. src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git
2014 Jan 13
20
[PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2
OK, so there's a bunch of stuff in here. The geometry stuff is based on the work started by Bryan Cain and Christoph Bumiller. Patches 01-12: Add support for geometry shaders and fix related issues Patches 13-14: Make it possible for fb clears to operate on texture attachments with an explicit layer set (as is allowed in gl 3.2). Patches 15-17: Make ARB_texture_multisample work
2017 Dec 20
2
[PATCH] gm107/ir: use lane 0 for manual textureGrad handling
..., but using SM50-friendly primitives. + static const uint8_t qOps[2] = + { QUADOP(MOV2, ADD, MOV2, ADD), QUADOP(MOV2, MOV2, ADD, ADD) }; Value *def[4][4]; - Value *crd[3]; + Value *crd[3], *arr, *shadow; Value *tmp; Instruction *tex, *add; - Value *zero = bld.loadImm(bld.getSSA(), 0); + Value *quad = bld.mkImm(SHFL_BOUND_QUAD); int l, c; const int dim = i->tex.target.getDim() + i->tex.target.isCube(); const int array = i->tex.target.isArray(); @@ -115,35 +112,40 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i) for (c = 0; c < dim; +...
2014 Jul 05
0
[PATCH] nvc0: do quadops on the right texture coordinates for TXD
.../nv50_ir_lowering_nvc0.cpp index 8f26645..0e24db7 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -712,6 +712,7 @@ NVC0LoweringPass::handleManualTXD(TexInstruction *i) Value *zero = bld.loadImm(bld.getSSA(), 0); int l, c; const int dim = i->tex.target.getDim(); + const int array = i->tex.target.isArray(); i->op = OP_TEX; // no need to clone dPdx/dPdy later @@ -722,7 +723,7 @@ NVC0LoweringPass::handleManualTXD(TexInstruction *i) for (l = 0; l < 4; ++l) { // mo...
2014 Aug 08
2
[PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index f010767..4a9e48f 100644 ---
2016 Jan 14
0
[PATCH] nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space
...+ void adjustTempIndex(int arrayId, int &idx, int &idx2d) const; Value *applySrcMod(Value *, int s, int c); Symbol *makeSym(uint file, int fileIndex, int idx, int c, uint32_t addr); @@ -1679,11 +1698,23 @@ Converter::shiftAddress(Value *index) return mkOp2v(OP_SHL, TYPE_U32, getSSA(4, FILE_ADDRESS), index, mkImm(4)); } +void +Converter::adjustTempIndex(int arrayId, int &idx, int &idx2d) const +{ + std::map<int, tgsi::Source::TempBase>::const_iterator it = + code->indirectTempBases.find(arrayId); + if (it == code->indirectTempBases.end()) +...
2017 Dec 20
0
[PATCH] gm107/ir: use lane 0 for manual textureGrad handling
...t; + static const uint8_t qOps[2] = > + { QUADOP(MOV2, ADD, MOV2, ADD), QUADOP(MOV2, MOV2, ADD, ADD) }; > Value *def[4][4]; > - Value *crd[3]; > + Value *crd[3], *arr, *shadow; > Value *tmp; > Instruction *tex, *add; > - Value *zero = bld.loadImm(bld.getSSA(), 0); > + Value *quad = bld.mkImm(SHFL_BOUND_QUAD); > int l, c; > const int dim = i->tex.target.getDim() + i->tex.target.isCube(); > const int array = i->tex.target.isArray(); > @@ -115,35 +112,40 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i) > &...
2016 Jan 14
0
[PATCH] nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirection
....getIndex(1) : 0; + int idx2d = src.is2D() ? src.getIndex(1) : 0; const int idx = src.getIndex(0); const int swz = src.getSwizzle(c); Instruction *ld; @@ -1704,6 +1724,14 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr) ld = mkOp1(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c)); ld->perPatch = info->sv[idx].patch; return ld->getDef(0); + case TGSI_FILE_TEMPORARY: { + int arrayid = src.getArrayId(); + if (!arrayid) + arrayid = code->tempArrayId[idx]; + idx2d = (code->indirectTempArrays.find(arrayid)...
2015 May 17
14
[PATCH 00/12] Tessellation support for nvc0
This is enough to enable tessellation support on nvc0. It seems to work a lot better on my GF108 than GK208. I suspect that there's some sort of scheduling shenanigans that need to be adjusted for kepler+. Or perhaps some shader header things. Even with the GF108, I still get occasional blue triangles in Heaven, but I get a *ton* of them on the GK208 -- seemingly the same issue, but it's
2014 May 20
14
[PATCH 00/12] Cherry-pick nv50/nvc0 patches from gallium-nine
I went through the gallium-nine tree and picked out nouveau patches that are general bug-fixes. The first bunch I'd like to also get into 10.2. I've reviewed all of them and they make sense to me, but sending them out for public review as well in case there are any objections. Unless I hear objections, I'd like to push this by Friday. Christoph Bumiller (11): nv50,nvc0: always pull
2017 Jun 11
14
[RFC 0/9] Add precise/invariant semantics to TGSI
Running Tomb Raider on Nouveau I found some flicker caused by ignoring precise modifiers on variables inside Nouveau. This series add precise/invariant handling to TGSI, which can be then used by drivers to disable certain unsafe optimisations which may otherwise alter calculations, which depend on having the same result across shaders. This series fixes this bug in Tomb Raider and one CTS test