Displaying 20 results from an estimated 500 matches similar to: "[PATCH] nv50/ir: take postFactor into account when doing peephole optimizations"
2015 Jan 01
0
[PATCH] nv50/ir: fold MAD when one of the multiplicands is const
Fold MAD dst, src0, immed, src2 (or src0/immed swapped) when
- immed = 0 -> MOV dst, src2
- immed = +/- 1 -> ADD dst, src0, src2
These types of MAD patterns were observed in some st/nine shaders.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
Haven't tested this enough to push yet, but thought I'd get it out there.
Passes some simple test cases.
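The two folding rules above can be sketched on a toy instruction type. This is only an illustration: `Op`, `Insn`, `negSrc0` and `foldMadImm` are hypothetical names, not the nv50_ir classes, and the sketch assumes the -1 case is expressed as a negate modifier on src0.

```cpp
#include <optional>

// Hypothetical toy IR for illustration; not the real nv50_ir types.
enum class Op { MAD, MOV, ADD };

struct Insn {
    Op op;
    int src0;      // register of the non-constant multiplicand (-1 if unused)
    bool negSrc0;  // negate modifier on src0
    float imm;     // constant multiplicand (only meaningful for MAD)
    int src2;      // register of the addend
};

// Returns the simplified instruction, or nullopt when no rule applies.
std::optional<Insn> foldMadImm(const Insn &i)
{
    if (i.op != Op::MAD)
        return std::nullopt;
    if (i.imm == 0.0f)   // src0 * 0 + src2 -> MOV src2 (ignores NaN/Inf in src0)
        return Insn{Op::MOV, -1, false, 0.0f, i.src2};
    if (i.imm == 1.0f)   // src0 * 1 + src2 -> ADD src0, src2
        return Insn{Op::ADD, i.src0, i.negSrc0, 0.0f, i.src2};
    if (i.imm == -1.0f)  // src0 * -1 + src2 -> ADD -src0, src2
        return Insn{Op::ADD, i.src0, !i.negSrc0, 0.0f, i.src2};
    return std::nullopt;
}
```

Any other immediate leaves the MAD untouched, so the caller can fall through to its remaining peephole rules.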
2014 Jun 03
0
[PATCH v2 3/4] nvc0/ir: Handle OP_BFIND when folding constant expressions
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index a214ffc..c497335 100644
---
2015 Aug 19
5
[PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 66 +++++++++++++++-------
1 file changed, 46 insertions(+), 20 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
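As a rough illustration of why a shift/and pair can become a single bitfield extract, the sketch below covers only the form `(x >> shift) & mask` where `mask` is a run of low bits. The helper names are hypothetical, and which patterns the patch actually detects is not visible in this excerpt.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of an unsigned bitfield extract:
// `width` bits of x starting at bit `offset`.
uint32_t extractBits(uint32_t x, unsigned offset, unsigned width)
{
    assert(width > 0 && offset + width <= 32);
    if (width == 32)
        return x;  // offset must be 0 here
    return (x >> offset) & ((1u << width) - 1u);
}

// If mask has the form 2^n - 1 (n contiguous low bits), return n; otherwise -1.
int lowMaskWidth(uint32_t mask)
{
    if (mask == 0 || (mask & (mask + 1u)) != 0)
        return -1;
    int n = 0;
    for (; mask; mask >>= 1)
        ++n;
    return n;
}

// The shift/and form a shader might contain (assumes shift < 32).
uint32_t shiftAnd(uint32_t x, unsigned shift, uint32_t mask)
{
    return (x >> shift) & mask;
}
```

When `lowMaskWidth(mask)` returns a width `n` with `shift + n <= 32`, `shiftAnd(x, shift, mask)` and `extractBits(x, shift, n)` agree for every `x`, which is what would make the rewrite safe in this restricted case.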
2014 Jun 03
0
[PATCH v2 4/4] nvc0/ir: Handle OP_POPCNT when folding constant expressions
V2: Add support for a single-argument version of POPCNT for Maxwell (SM5)
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
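A minimal sketch of what folding a constant POPCNT could compute. It assumes the two-operand form counts the set bits of `a & b`, while the single-argument form mentioned in V2 counts the bits of `a` alone. `popcount32` and `foldPopcnt` are hypothetical helpers, not code from the patch.

```cpp
#include <cstdint>
#include <optional>

// Counts set bits by clearing the lowest set bit on each iteration.
uint32_t popcount32(uint32_t v)
{
    uint32_t n = 0;
    for (; v; v &= v - 1)
        ++n;
    return n;
}

// Assumed semantics: with two operands, count bits of (a & b);
// with one operand, count bits of a.
uint32_t foldPopcnt(uint32_t a, std::optional<uint32_t> b = std::nullopt)
{
    return popcount32(b ? (a & *b) : a);
}
```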
2016 Oct 02
0
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
> Previously we'd end up with an unnecessary mov for the third immediate value.
>
> total instructions in shared programs : 851881 -> 851864 (-0.00%)
> total gprs used in shared programs : 110295 -> 110295 (0.00%)
> total local used in shared programs : 1020 ->
2016 Oct 02
2
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
Previously we'd end up with an unnecessary mov for the third immediate value.
total instructions in shared programs : 851881 -> 851864 (-0.00%)
total gprs used in shared programs : 110295 -> 110295 (0.00%)
total local used in shared programs : 1020 -> 1020 (0.00%)
        local  gpr  inst  bytes
helped      0    0    17     17
2016 Oct 02
0
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
That's very odd. LoadPropagation should have picked that up even in
its current form. Should try to figure out why it didn't and that is
likely to "fix" a *lot* more situations.
On Sun, Oct 2, 2016 at 2:24 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
>
>
> On 02.10.2016 20:03, Ilia Mirkin wrote:
>>
>> On Sun, Oct 2, 2016 at
2016 Oct 02
1
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:26, Ilia Mirkin wrote:
> That's very odd. LoadPropagation should have picked that up even in
> its current form. Should try to figure out why it didn't and that is
> likely to "fix" a *lot* more situations.
Actually I was coming from an (admittedly very constrained) addition to the
LoadPropagation pass, where I was told to fix it within OP_MAD :/
> On
2016 Oct 02
2
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:03, Ilia Mirkin wrote:
> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
> <tobias.johannes.klausmann at mni.thm.de> wrote:
>> Previously we'd end up with an unnecessary mov for the third immediate value.
>>
>> total instructions in shared programs : 851881 -> 851864 (-0.00%)
>> total gprs used in shared programs : 110295 -> 110295
2017 Apr 29
3
[PATCH] nv50/ir: optimize shl(a, 0) to a
Helps two Alien: Isolation shaders.
shader-db:
total instructions in shared programs : 4251497 -> 4251494 (-0.00%)
total gprs used in shared programs : 513962 -> 513962 (0.00%)
total local used in shared programs : 29797 -> 29797 (0.00%)
total bytes used in shared programs : 38960264 -> 38960232 (-0.00%)
local gpr inst bytes
helped
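The simplification in the subject line, sketched on a toy instruction type. `Op`, `Insn` and `simplifyShl` are hypothetical names used for illustration only, and source modifiers are not modeled.

```cpp
#include <cstdint>

// Hypothetical toy IR for illustration; not the real nv50_ir types.
enum class Op { SHL, MOV };

struct Insn {
    Op op;
    int src0;        // register holding `a`
    bool src1IsImm;  // true when the shift amount is a compile-time constant
    uint32_t imm;    // the constant shift amount, valid if src1IsImm
};

// Rewrites shl(a, 0) into mov(a) in place; returns true if it changed anything.
bool simplifyShl(Insn &i)
{
    if (i.op != Op::SHL || !i.src1IsImm || i.imm != 0)
        return false;
    i.op = Op::MOV;       // shifting by zero leaves `a` unchanged
    i.src1IsImm = false;  // the MOV has no second source
    return true;
}
```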
2017 Apr 29
0
[PATCH] nv50/ir: optimize shl(a, 0) to a
On Sat, Apr 29, 2017 at 12:46 PM, Karol Herbst <karolherbst at gmail.com> wrote:
> Helps two Alien: Isolation shaders.
>
> shader-db:
> total instructions in shared programs : 4251497 -> 4251494 (-0.00%)
> total gprs used in shared programs : 513962 -> 513962 (0.00%)
> total local used in shared programs : 29797 -> 29797 (0.00%)
> total bytes used in shared
2017 Apr 29
0
[PATCH v2] nv50/ir: optimize shl(a, 0) to a
On Sat, Apr 29, 2017 at 6:09 PM, Karol Herbst <karolherbst at gmail.com> wrote:
> Helps two Alien: Isolation shaders.
>
> shader-db:
> total instructions in shared programs : 4251497 -> 4251494 (-0.00%)
> total gprs used in shared programs : 513962 -> 513962 (0.00%)
> total local used in shared programs : 29797 -> 29797 (0.00%)
> total bytes used in shared
2017 Apr 30
0
[PATCH v2] nv50/ir: optimize shl(a, 0) to a
Maybe in a separate change. I'd want to double check on all gens. I think
the thing I suggested is sufficient.
On Apr 29, 2017 8:09 PM, "Karol Herbst" <karolherbst at gmail.com> wrote:
2017-04-30 0:28 GMT+02:00 Ilia Mirkin <imirkin at alum.mit.edu>:
> On Sat, Apr 29, 2017 at 6:09 PM, Karol Herbst <karolherbst at gmail.com>
wrote:
>> helps two alien
2017 Apr 30
0
[PATCH v2] nv50/ir: optimize shl(a, 0) to a
On Apr 30, 2017 8:14 AM, "Karol Herbst" <karolherbst at gmail.com> wrote:
2017-04-30 2:28 GMT+02:00 Ilia Mirkin <imirkin at alum.mit.edu>:
> Maybe in a separate change. I'd want to double check on all gens. I think
> the thing I suggested is sufficient.
>
Well, if I just fix up the op, I kind of have to fix the mod as well.
And if I use getOp, it could also
2014 Jul 06
0
[PATCH v5] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32/64->(U16/32, S16/32) and (U16/32, S16/32)->F32
No piglit regressions observed on nv50 and nvc0!
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
V2: fix usage of wrong variable
V3: enable F64 support
V4:
- disable F64 support again
- handle saturate flag: clamp to min/max if needed
V5: clamp before rounding to nearest
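The "clamp before rounding" order from V5 can be illustrated with a small saturating F32-to-integer conversion. This is a sketch under assumptions, not the patch itself: `cvtF32ToIntSat` is a hypothetical helper, NaN is mapped to 0 by assumption, and rounding follows the host's current rounding mode (round-to-nearest by default).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Saturating float -> integer conversion: clamp to the destination range
// first, then round to nearest. T is one of int16_t, uint16_t, int32_t, uint32_t.
template <typename T>
T cvtF32ToIntSat(float f)
{
    static_assert(std::is_integral<T>::value, "T must be an integer type");
    if (std::isnan(f))
        return 0;  // assumption: NaN folds to 0
    // Work in double so that the 32-bit limits are exactly representable.
    const double lo = static_cast<double>(std::numeric_limits<T>::min());
    const double hi = static_cast<double>(std::numeric_limits<T>::max());
    const double clamped = std::min(std::max(static_cast<double>(f), lo), hi);
    // After clamping, the rounded value always fits in T.
    return static_cast<T>(std::llrint(clamped));
}
```

Clamping first means the value handed to the rounding step is already inside the destination range, so the final integer cast cannot overflow.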
2014 Jul 05
1
[PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32/64->(U16/32, S16/32) and (U16/32, S16/32)->F32
No piglit regressions observed on nv50 and nvc0!
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
V2: fix usage of wrong variable
V3: enable F64 support
V4:
- disable F64 support again
- handle saturate flag: clamp to min/max if needed
2015 Jan 10
0
[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
On Fri, Jan 9, 2015 at 6:47 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
> Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
>
> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
> ---
> .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++
> 1 file
2014 Jul 03
0
[PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32/64->(U16/32, S16/32) and (U16/32, S16/32)->F32
No piglit regressions observed!
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 74 ++++++++++++++++++++++
1 file changed, 74 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2015 Jan 09
3
[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++
1 file changed, 109 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2015 Jan 11
0
[PATCH v2] nv50/ir: Handle OP_CVT when folding constant expressions
On Fri, Jan 9, 2015 at 8:24 PM, Tobias Klausmann
<tobias.johannes.klausmann at mni.thm.de> wrote:
> Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
>
> Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
> ---
> V2: beat me, whip me, split out F64
>
> .../drivers/nouveau/codegen/nv50_ir_peephole.cpp |