Tobias Klausmann
2015-Jan-11 20:17 UTC
[Nouveau] [PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions
On 11.01.2015 20:57, Ilia Mirkin wrote:
> On Sun, Jan 11, 2015 at 2:56 PM, Tobias Klausmann
> <tobias.johannes.klausmann at mni.thm.de> wrote:
>>
>> On 11.01.2015 20:19, Ilia Mirkin wrote:
>>> On Sun, Jan 11, 2015 at 12:27 PM, Tobias Klausmann
>>> <tobias.johannes.klausmann at mni.thm.de> wrote:
>>>>
>>>> On 11.01.2015 01:58, Ilia Mirkin wrote:
>>>>> On Fri, Jan 9, 2015 at 8:24 PM, Tobias Klausmann
>>>>> <tobias.johannes.klausmann at mni.thm.de> wrote:
>>>>>> Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32},
>>>>>> {S16/32})->F32
>>>>>>
>>>>>> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
>>>>>> ---
>>>>>> V2: beat me, whip me, split out F64
>>>>>>
>>>>>>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 81
>>>>>> ++++++++++++++++++++++
>>>>>>  1 file changed, 81 insertions(+)
>>>>>>
>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>>> index 9a0bb60..741c74f 100644
>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>>> @@ -997,6 +997,87 @@ ConstantFolding::opnd(Instruction *i,
>>>>>> ImmediateValue
>>>>>> &imm0, int s)
>>>>>>        i->op = OP_MOV;
>>>>>>        break;
>>>>>>     }
>>>>>> +   case OP_CVT: {
>>>>>> +      Storage res;
>>>>>> +      bld.setPosition(i, true); /* make sure bld is init'ed */
>>>>>> +      switch(i->dType) {
>>>>>> +      case TYPE_U16:
>>>>>> +         switch (i->sType) {
>>>>>> +         case TYPE_F32:
>>>>>> +            if (i->saturate)
>>>>>> +               res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0,
>>>>>> +                                                UINT16_MAX));
>>>>> Where did this saturate stuff come from? It doesn't make sense to
>>>>> saturate to a non-float dtype. I'd go ahead and just
>>>>> assert(!i->saturate) in the int dtype cases.
>>>>>
>>>>> One does wonder what the hw does if the float doesn't fit in the
>>>>> destination... whether it saturates or not. I don't hugely care
>>>>> though.
>>>> Actually i can't remember why that was added in the first place, i'll go
>>>> ahead and follow your advice here.
>>> Oh wait... this was to support saturating an array access into a u16...
>>>
>>>     const int sat = (i->op == OP_TXF) ? 1 : 0;
>>>     DataType sTy = (i->op == OP_TXF) ? TYPE_U32 : TYPE_F32;
>>>     bld.mkCvt(OP_CVT, TYPE_U16, layer, sTy, src)->saturate = sat;
>>>
>>> So... basically if the source is a U32 and the dest is a U16, we want
>>> to saturate there? IMO this is such a minor use-case that it doesn't
>>> really matter. However I guess you can keep the saturate bits around
>>> if you like.
>> We can do it with or without the saturate if we rely on the test,
>> assert(!i->saturate)'ing is the only thing that breaks the test you sure
>> meant:
>>
>> glsl-resource-not-bound 1DArray
>> glsl-resource-not-bound 2DArray
>> glsl-resource-not-bound 2DMSArray
> Hm, those are the only times that a texelFetch is done in piglit with
> a constant layer index, I guess.

Ok, i'll keep the saturates for (U/S)16 to for once satisfy the
"dependency" you posted up there and to be future proof if somebody
implements something similar(?) for the S16 one!
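For readers following the saturate discussion, here is a minimal standalone sketch of what the folded F32 -> U16 expression computes with and without the saturate flag. It is not Mesa code: `fold_f32_to_u16` is a hypothetical helper, `std::lround` stands in for `util_iround` (tie rounding may differ), and NaN inputs are ignored. It only shows what the C++ expression yields; as noted above, what the hardware does for out-of-range floats is left open in this thread.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of Mesa): folds a constant F32 -> U16 conversion.
// std::lround is an assumed stand-in for util_iround.
inline uint16_t fold_f32_to_u16(float value, bool saturate)
{
   if (saturate) {
      // Clamp to the representable range of the destination type first.
      value = std::min(std::max(value, 0.0f), 65535.0f);
   }
   // Without saturation, an out-of-range rounded value wraps modulo 2^16
   // when it is narrowed to uint16_t.
   return static_cast<uint16_t>(std::lround(value));
}
```

With saturation an input of 70000.0f folds to UINT16_MAX; without it, the narrowing silently wraps, which is why the flag matters for a constant layer index.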
Tobias Klausmann
2015-Jan-11 21:40 UTC
[Nouveau] [PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
V2: Split out F64 parts
V3: remove handling of saturate for (U/S)32,

 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 73 ++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 21d20ca..aaf0d0d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -997,6 +997,79 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
       i->op = OP_MOV;
       break;
    }
+   case OP_CVT: {
+      Storage res;
+      bld.setPosition(i, true); /* make sure bld is init'ed */
+      switch(i->dType) {
+      case TYPE_U16:
+         switch (i->sType) {
+         case TYPE_F32:
+            if (i->saturate)
+               res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0,
+                                                UINT16_MAX));
+            else
+               res.data.u16 = util_iround(imm0.reg.data.f32);
+            break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.u16));
+         break;
+      case TYPE_U32:
+         assert(!i->saturate);
+         switch (i->sType) {
+         case TYPE_F32:
+            res.data.u32 = util_iround(imm0.reg.data.f32);
+            break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.u32));
+         break;
+      case TYPE_S16:
+         switch (i->sType) {
+         case TYPE_F32:
+            if (i->saturate)
+               res.data.s16 = util_iround(CLAMP(imm0.reg.data.f32, INT16_MIN,
+                                                INT16_MAX));
+            else
+               res.data.s16 = util_iround(imm0.reg.data.f32);
+            break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.s16));
+         break;
+      case TYPE_S32:
+         assert(!i->saturate);
+         switch (i->sType) {
+         case TYPE_F32:
+            res.data.s32 = util_iround(imm0.reg.data.f32);
+            break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.s32));
+         break;
+      case TYPE_F32:
+         switch (i->sType) {
+         case TYPE_U16: res.data.f32 = (float) imm0.reg.data.u16; break;
+         case TYPE_U32: res.data.f32 = (float) imm0.reg.data.u32; break;
+         case TYPE_S16: res.data.f32 = (float) imm0.reg.data.s16; break;
+         case TYPE_S32: res.data.f32 = (float) imm0.reg.data.s32; break;
+         default:
+            return;
+         }
+         i->setSrc(0, bld.mkImm(res.data.f32));
+         break;
+      default:
+         return;
+      }
+      i->setType(i->dType); /* Remove i->sType, which we don't need anymore */
+      i->op = OP_MOV;
+      i->src(0).mod = Modifier(0); /* Clear the already applied modifier */
+      break;
+   }
    default:
       return;
    }
-- 
2.2.1
Ilia Mirkin
2015-Jan-11 21:54 UTC
[Nouveau] [PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
On Sun, Jan 11, 2015 at 4:40 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
> Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
>
> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
> ---
> V2: Split out F64 parts
> V3: remove handling of saturate for (U/S)32,
>
>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 73 ++++++++++++++++++++++
>  1 file changed, 73 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 21d20ca..aaf0d0d 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -997,6 +997,79 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s)
>        i->op = OP_MOV;
>        break;
>     }
> +   case OP_CVT: {
> +      Storage res;
> +      bld.setPosition(i, true); /* make sure bld is init'ed */
> +      switch(i->dType) {
> +      case TYPE_U16:
> +         switch (i->sType) {
> +         case TYPE_F32:
> +            if (i->saturate)
> +               res.data.u16 = util_iround(CLAMP(imm0.reg.data.f32, 0,
> +                                                UINT16_MAX));
> +            else
> +               res.data.u16 = util_iround(imm0.reg.data.f32);
> +            break;
> +         default:
> +            return;
> +         }

This won't get hit for the U32 -> U16 conversion though right? Did you
test that case? Am I misreading/misunderstanding perhaps?

  -ilia
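The question above can be checked by reading off which (dType, sType) pairs the nested switches in the posted patch accept. The following is a small standalone model of that dispatch only, written as a sketch: `Type` and `patch_folds` are hypothetical names, not nv50 IR identifiers, and no actual folding is performed.

```cpp
// Hypothetical stand-ins for the nv50 IR DataType enumerators used in the patch.
enum class Type { U16, U32, S16, S32, F32 };

// Returns true if the patch, as posted, would fold a CVT with this
// (dType, sType) pair; every other pair reaches a "default: return;".
inline bool patch_folds(Type dType, Type sType)
{
   switch (dType) {
   case Type::U16:
   case Type::U32:
   case Type::S16:
   case Type::S32:
      // Integer destinations only have a TYPE_F32 source case.
      return sType == Type::F32;
   case Type::F32:
      // Float destination accepts the four integer source types.
      return sType != Type::F32;
   }
   return false;
}
```

Under this model `patch_folds(Type::U16, Type::U32)` is false, so the saturating U32 -> U16 conversion that the OP_TXF layer code emits would be left unfolded, while F32 -> U16 is handled.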