thr3ads.net - similar to: "[PATCH mesa 0/5] nouveau: codegen: Make use of double immediates"

Displaying 20 results from an estimated 300 matches similar to: "[PATCH mesa 0/5] nouveau: codegen: Make use of double immediates"

[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax

2015 Feb 20

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++- 1 file changed, 63 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index dfb093c..e38a3b8 100644 ---

[PATCH mesa 1/6] tgsi_build: Fix return of uninitialized memory in tgsi_*_instruction_memory

2016 Mar 16

tgsi_default_instruction_memory / tgsi_build_instruction_memory were returning uninitialized memory for tgsi_instruction_memory.Texture and tgsi_instruction_memory.Format. Note 0 means not set, and thus is a correct default initializer for these. Fixes: 3243b6fc97 ("tgsi: add Texture and Format to tgsi_instruction_memory") Cc: Nicolai Hähnle <nicolai.haehnle at amd.com>

[PATCH mesa 5/6] nouveau: codegen: Add support for OpenCL global memory buffers

2016 Mar 16

Could you please get rid of the cosmetic changes (eg. the switch ones)? Because this doesn't really improve readability and in my opinion these changes should be eventually done in a separate patch. Other than that, this patch is : Reviewed-by: Samuel Pitoiset <samuel.pitoiset at gmail.com> Yes, this probably won't work as is for atomic operations but the lowering pass is

[PATCH mesa 0/5] nouveau: codegen: Make use of double immediates

2015 Nov 07

Hi Hans, All pushed. I made a few additional fixes and improvement to fp64 immediate handling along the way, but all your commits were fine as-is. (Except that they enabled fp64 immediates on nv50 implicitly which is wrong -- there are no immediate-taking variants on nv50, so I fixed that glitch. But only the G200 can do fp64 in the first place, and nouveau doesn't actually expose it. Corner

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

2016 Mar 16

This approach leads to the emitters needing to know about both global and buffer, even though at that point, they are identical. I was thinking that in the lowering logic, buffer would just get rewritten as global (with the offset added), thus not needing any change to the emitters. What do you think about such an approach? On Mar 16, 2016 2:24 AM, "Hans de Goede" <hdegoede at

[PATCH envytools] envydis: gk110: Add support for dadd with an immediate src

2015 Nov 05

This commit adds support for dadd with an immediate src in gk110 code. The machine-code in question is generated by e.g. nouveau_compiler with the new "Make use of double immediates" patch series when building the piglit glsl-algebraic-double-add.shader_test. This commit changes the output from: 00000010: 001c0001 c38001ff $r0 $r0 $r0 $r0 0x3fe00 0x3fe00 0x3fe0000000000000
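The relationship between the two hex values in the output above (0x3fe00 and 0x3fe0000000000000) can be sketched as follows. This is a minimal illustration, assuming the short immediate carries the top 20 bits of an IEEE 754 double with the low 44 bits implied to be zero; the function names are hypothetical and are not part of envydis or nouveau.

```python
import struct

LOW_BITS = 44  # assumed: a 20-bit immediate covers bits 63..44 of the double


def high20_to_double(imm20: int) -> float:
    """Interpret a 20-bit value as the top 20 bits of an IEEE 754 binary64."""
    if not 0 <= imm20 < (1 << 20):
        raise ValueError("immediate must fit in 20 bits")
    bits = imm20 << LOW_BITS  # low 44 bits are zero by assumption
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def double_fits_high20(value: float) -> bool:
    """True if the double can be rebuilt from its top 20 bits alone."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & ((1 << LOW_BITS) - 1) == 0


print(hex(0x3FE00 << LOW_BITS))     # 0x3fe0000000000000
print(high20_to_double(0x3FE00))    # 0.5
print(double_fits_high20(0.1))      # False: 0.1 needs the low mantissa bits
```

Under this assumption, only doubles whose low 44 mantissa bits are zero (0.5, 1.0, 2.0 and similar) would qualify for the short immediate form.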

[PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction

2015 May 09

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Pretty sure there's nothing wrong with it, but it looks odd in the code. src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++ src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +++++-- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 ++ 3 files changed, 9 insertions(+), 2 deletions(-)

[PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

2015 Aug 19

Some shaders appear to extract bits using shift/and combos. Detect (some) of those and convert to EXTBF instead. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 66 +++++++++++++++------- 1 file changed, 46 insertions(+), 20 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
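The shift/and pattern the patch description refers to can be sketched like this. It is a simplified model for illustration only, not the code in nv50_ir_peephole.cpp: `extbf` and `match_shr_and` are hypothetical helpers, values are treated as unsigned 32-bit, and only the `(x >> shift) & mask` ordering is handled.

```python
def extbf(value: int, offset: int, width: int) -> int:
    """Unsigned bitfield extract: `width` bits starting at bit `offset`."""
    return (value >> offset) & ((1 << width) - 1)


def match_shr_and(shift: int, mask: int):
    """Return (offset, width) if (x >> shift) & mask is a single bitfield
    extract on a 32-bit value, otherwise None."""
    if not 0 <= shift < 32:
        return None
    # The mask must be of the form 2**n - 1 (contiguous ones from bit 0).
    if mask == 0 or mask & (mask + 1) != 0:
        return None
    width = mask.bit_length()
    if shift + width > 32:  # conservative: reject fields past bit 31
        return None
    return shift, width


x = 0x12345678
field = match_shr_and(8, 0xFF)
print(field)                                  # (8, 8)
print(extbf(x, *field) == (x >> 8) & 0xFF)    # True
print(match_shr_and(4, 0xF0))                 # None: mask is not 2**n - 1
```

Rejecting masks that are not of the form 2**n - 1 matches the "(some)" qualifier in the description: only the easy cases are rewritten, and everything else is left as separate shift and and operations.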

[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016 Oct 02

On 02.10.2016 20:03, Ilia Mirkin wrote: > On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann > <tobias.johannes.klausmann at mni.thm.de> wrote: >> Previously we'd end up with an unnecessary mov for the third immediate value. >> >> total instructions in shared programs : 851881 -> 851864 (-0.00%) >> total gprs used in shared programs : 110295 -> 110295

[PATCH] nv50/ir: make sure to reverse cond codes on all the OP_SET variants

2014 May 10

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.2 10.1" <mesa-stable at lists.freedesktop.org> --- Found this while tracking a regression on nvc0 for my patch which fixes ir_unop_any to emit or's instead of dp3's. (That patch is fine, this code was always broken.) src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 3 ++- 1 file changed, 2

[PATCH 0/3] nv50/ir: Preapre for running Opts inside a loop

2017 Apr 03

Slowly we are getting to the point, that we miss enough optimization opportunities as the result of our own passes. For this we need to fix AlgebraicOpt to be able to handle mods on sources without creating new issues. The last patch enables looping opts. Karol Herbst (3): nv50/ir: fix AlgebraicOpt for slcts with mods nv50/ir: handle logops with NOT in AlgebraicOpt nv50/ir: run some

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

Does this give correct results for special floats (0, infs)? We tried to improve (for single floats) x86 rcp in llvmpipe with newton-raphson, but unfortunately not being able to give correct results for these two cases (without even more additional code) meant it got all disabled in the end (you can still see that code in the driver) since the problems are at least as bad as those due to bad
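The concern about special floats can be reproduced with a plain Newton-Raphson step for the reciprocal. This is a minimal sketch in ordinary double arithmetic, assuming the textbook iteration x1 = x0 * (2 - a * x0); it is not the llvmpipe or nvc0 code, and `refine_rcp_guarded` shows only one possible form of the "additional code" mentioned above.

```python
import math


def refine_rcp(a: float, x0: float) -> float:
    """One Newton-Raphson step toward 1/a, starting from the estimate x0."""
    return x0 * (2.0 - a * x0)


def refine_rcp_guarded(a: float, x0: float) -> float:
    """Hypothetical guard: keep the estimate if refinement produced a NaN
    that the estimate itself did not contain."""
    x1 = refine_rcp(a, x0)
    if math.isnan(x1) and not math.isnan(x0):
        return x0
    return x1


# Ordinary input: the step moves a rough estimate closer to 1/3.
x1 = refine_rcp(3.0, 0.33)
print(abs(x1 - 1 / 3) < abs(0.33 - 1 / 3))    # True

# Special inputs: the exact estimates (inf for 0, 0 for inf) turn into NaN,
# because a * x0 evaluates 0 * inf.
print(refine_rcp(0.0, math.inf))              # nan
print(refine_rcp(math.inf, 0.0))              # nan

# With the guard, the correct special-case estimates survive.
print(refine_rcp_guarded(0.0, math.inf))      # inf
print(refine_rcp_guarded(math.inf, 0.0))      # 0.0
```

The guard costs an extra comparison and select per refinement, which is the kind of overhead the message describes as the reason the refinement was disabled.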

[PATCH 0/2] nvc0: support for GK20A (Tegra K1)

2014 May 27

The following 2 patches make it possible to run Mesa programs on GK20A (Tegra K1). GK20A is very similar to GK104, but uses a new (backward-compatible) 3D class as well as the same ISA as GK110 (SM35). Taking these differences into account is sufficient to successfully render simple off-screen buffers. Alexandre Courbot (2): nvc0: add GK20A 3D class nvc0: use SM35 ISA with GK20A

[PATCH v2 0/3] nv50/ir: Preapre for running Opts inside a loop

2017 Apr 03

Slowly we are getting to the point, that we miss enough optimization opportunities as the result of our own passes. For this we need to fix AlgebraicOpt to be able to handle mods on sources without creating new issues. The last patch enables looping opts. v2: update commit author Karol Herbst (3): nv50/ir: fix AlgebraicOpt for slcts with mods nv50/ir: handle logops with NOT in AlgebraicOpt

[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

2015 Jan 09

Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32 Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++ 1 file changed, 109 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp

[PATCH 2/2] nvc0: use SM35 ISA with GK20A

2014 May 27

On Tue, May 27, 2014 at 12:59 AM, Alexandre Courbot <acourbot at nvidia.com> wrote: > GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use > the GK110 path when this chip is detected. > > Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> > --- > src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h | 1 + >

[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD

2016 Oct 02

On 02.10.2016 20:26, Ilia Mirkin wrote: > That's very odd. LoadPropagation should have picked that up even in > its current form. Should try to figure out why it didn't and that is > likely to "fix" a *lot* more situations. Actually i was coming from an, given really constrained, addition to the LoadPropagation pass, where i was told to fix it within OP_MAD :/ > On

'__builtin_nanl' and soft-FP64 support

2017 Sep 25

I am seeing failures in two tests after migrating to v5.0 final, these are: std/language.support/support.limits/limits/numeric.limits.members/quiet_NaN.pass.cpp and: std/language.support/support.limits/limits/numeric.limits.members/signaling_NaN.pass.cpp However, these are new tests and it turns out that the underlying problem is that the builtin '__builtin_nanl("")' is

[PATCH v5 0/5] nvc0/ir: add support for MAD/FMA PostRALoadPropagation

2017 Mar 26

was "nv50/ir: PostRaConstantFolding improvements" before. nothing really changed from the last version, just minor things. Karol Herbst (5): nv50/ir: restructure and rename postraconstantfolding pass nv50/ir: implement mad post ra folding for nvc0+ gk110/ir: add LIMM form of mad gm107/ir: add LIMM form of mad nv50/ir: also do PostRaLoadPropagation for FMA

'__builtin_nanl' and soft-FP64 support

2017 Sep 25

On 9/25/2017 5:35 AM, Martin J. O'Riordan via llvm-dev wrote: > > I am seeing failures in two tests after migrating to v5.0 final, these > are: > > std/language.support/support.limits/limits/numeric.limits.members/quiet_NaN.pass.cpp > > and: > > std/language.support/support.limits/limits/numeric.limits.members/signaling_NaN.pass.cpp > > However, these are new
