Displaying 20 results from an estimated 500 matches similar to: "nv50/ir: Implement short notation for MAD V2"
2015 Jan 11
6
[PATCH 1/3] nv50/ir: Add support for MAD short+IMM notation
MAD IMM has a very specific SDST == SSRC2 requirement, so don't emit
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
---
.../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 18 ++++++++++++------
.../drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 2 +-
2 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
2015 Jan 23
3
[PATCH 1/2] nv50/ir: Add support for MAD short+IMM notation
Add emission rules for negative and saturate flags for MAD 4-byte opcodes,
and get rid of constraints. Short MAD has a very specific SDST == SSRC2
requirement, and since MAD IMM is short notation + 4-byte immediate, don't
have the compiler create MAD IMM instructions yet.
V2: Document MAD as supported short form
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
---
2015 Feb 06
2
[PATCH 1/3] nv50/ir: Add support for MAD 4-byte opcode
Add emission rules for negative and saturate flags for MAD 4-byte opcodes,
and get rid of some of the constraints. Obviously tested with a wide variety
of shaders.
V2: Document MAD as supported short form
V3: Split up IMM from short-form modifiers
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 10 ++++------
2017 Mar 26
5
[PATCH v5 0/5] nvc0/ir: add support for MAD/FMA PostRALoadPropagation
was "nv50/ir: PostRaConstantFolding improvements" before.
nothing really changed from the last version, just minor things.
Karol Herbst (5):
nv50/ir: restructure and rename postraconstantfolding pass
nv50/ir: implement mad post ra folding for nvc0+
gk110/ir: add LIMM form of mad
gm107/ir: add LIMM form of mad
nv50/ir: also do PostRaLoadPropagation for FMA
2015 Jan 11
1
[PATCH 1/3] nv50/ir: Add support for MAD short+IMM notation
Op 11-01-15 om 01:34 schreef Ilia Mirkin:
> And you're allowing saturate/neg emission on the short form.
Yes
> Is this already in envytools?
Tesla floating point instructions are poorly documented in the RST
documents; fmad is no exception. I'll make sure to check envydis.
> Also, what's the shortForm thing?
Documented in envytools; see
2015 Jan 11
0
[PATCH 3/3] nv50/ir: Fold IMM into MAD
Add a specific optimisation pass for NV50 to check whether SRC0 or SRC1 is
a MOV dst, IMM. If so: fold the IMM in and try to drop the MOV. Must be
done post-RA because it is required that SDST == SSRC2.
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 52 ++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git
2015 Jan 13
0
[PATCH 2/3] nv50/ir: Fold IMM into MAD
Add a specific optimisation pass for NV50 to check whether SRC0 or SRC1 is
a MOV dst, IMM. If so: fold the IMM in and try to drop the MOV. Must be
done post-RA because it requires that SDST == SSRC2.
V2: improve readability and add comments to clarify decisions
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 60
2015 Jan 23
0
[PATCH 2/2] nv50/ir: Fold IMM into MAD
Add a specific optimisation pass for NV50 to check whether SRC0 or SRC1 is
a MOV dst, IMM. If so: fold the IMM in and try to drop the MOV. Must be
done post-RA because it requires that SDST == SSRC2.
V2: improve readability and add comments to clarify decisions
V3: Remove redundant code... compiler already attempts to put the IMM in
SSRC1
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
2015 Feb 06
0
[PATCH 3/3] nv50/ir: Fold IMM into MAD
Add a specific optimisation pass for NV50 to check whether SRC0 or SRC1 is
a MOV dst, IMM. If so: fold the IMM in and try to drop the MOV. Must be
done post-RA because it requires that SDST == SSRC2.
V2: improve readability and add comments to clarify decisions
V3: Remove redundant code... compiler already attempts to put the IMM in
SSRC1
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
2014 Jan 13
20
[PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2
OK, so there's a bunch of stuff in here. The geometry stuff is based on the
work started by Bryan Cain and Christoph Bumiller.
Patches 01-12: Add support for geometry shaders and fix related issues
Patches 13-14: Make it possible for fb clears to operate on texture attachments
with an explicit layer set (as is allowed in gl 3.2).
Patches 15-17: Make ARB_texture_multisample work
2016 Mar 16
13
[PATCH mesa 1/6] tgsi_build: Fix return of uninitialized memory in tgsi_*_instruction_memory
tgsi_default_instruction_memory / tgsi_build_instruction_memory were
returning uninitialized memory for tgsi_instruction_memory.Texture and
tgsi_instruction_memory.Format. Note 0 means not set, and thus is a
correct default initializer for these.
Fixes: 3243b6fc97 ("tgsi: add Texture and Format to tgsi_instruction_memory")
Cc: Nicolai Hähnle <nicolai.haehnle at amd.com>
2015 Jan 11
0
[PATCH 1/3] nv50/ir: Add support for MAD short+IMM notation
And you're allowing saturate/neg emission on the short form. Is this
already in envytools? Also, what's the shortForm thing? This change is
probably fine, but the changelog needs work.
On Sat, Jan 10, 2015 at 7:22 PM, Roy Spliet <rspliet at eclipso.eu> wrote:
> MAD IMM has a very specific SDST == SSRC2 requirement, so don't emit
>
> Signed-off-by: Roy Spliet <rspliet
2015 Feb 20
10
[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
.../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++-
1 file changed, 63 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index dfb093c..e38a3b8 100644
---
2017 Apr 03
5
[PATCH v2 0/3] nv50/ir: Preapre for running Opts inside a loop
Slowly we are getting to the point, that we miss enough optimization
opportunities as the result of our own passes.
For this we need to fix AlgebraicOpt to be able to handle mods on sources
without creating new issues.
The last patch enables looping opts.
v2: update commit author
Karol Herbst (3):
nv50/ir: fix AlgebraicOpt for slcts with mods
nv50/ir: handle logops with NOT in AlgebraicOpt
2017 Apr 03
3
[PATCH 0/3] nv50/ir: Preapre for running Opts inside a loop
Slowly we are getting to the point, that we miss enough optimization
opportunities as the result of our own passes.
For this we need to fix AlgebraicOpt to be able to handle mods on sources
without creating new issues.
The last patch enables looping opts.
Karol Herbst (3):
nv50/ir: fix AlgebraicOpt for slcts with mods
nv50/ir: handle logops with NOT in AlgebraicOpt
nv50/ir: run some
2016 Mar 16
2
[PATCH mesa 5/6] nouveau: codegen: Add support for OpenCL global memory buffers
Could you please get rid of the cosmetic changes (eg. the switch ones)?
Because this doesn't really improve readability and in my opinion these
changes should be eventually done in a separate patch.
Other than that, this patch is :
Reviewed-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
Yes, this probably won't work as is for atomic operations but the
lowering pass is
2015 May 09
2
[PATCH 3/4] nvc0/ir: optimize set & 1.0 to produce boolean-float sets
On 09.05.2015 07:35, Ilia Mirkin wrote:
> This has started to happen more now that the backend is producing
> KILL_IF more often.
>
> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
> ---
> .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 29 ++++++++++++++++++++++
> .../nouveau/codegen/nv50_ir_target_nv50.cpp | 2 ++
> 2 files changed, 31
2015 Jan 11
0
[PATCH 2/3] nv50/ir: For MAD, prefer SDST == SSRC2
If liveness analysis indicates it's good, this should improve the chances
of being able to emit the short MAD form.
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
2016 Mar 16
2
[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/
This approach leads to the emitters needing to know about both global and
buffer, even though at that point, they are identical. I was thinking that
in the lowering logic, buffer would just get rewritten as global (with the
offset added), thus not needing any change to the emitters. What do you
think about such an approach?
On Mar 16, 2016 2:24 AM, "Hans de Goede" <hdegoede at
2015 May 09
5
[PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
Pretty sure there's nothing wrong with it, but it looks odd in the code.
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +++++--
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 ++
3 files changed, 9 insertions(+), 2 deletions(-)