similar to: [PATCH mesa 0/5] nouveau: codegen: Make use of double immediates

Displaying 20 results from an estimated 300 matches similar to: "[PATCH mesa 0/5] nouveau: codegen: Make use of double immediates"

2015 Feb 20
10
[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++- 1 file changed, 63 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index dfb093c..e38a3b8 100644 ---
2016 Mar 16
13
[PATCH mesa 1/6] tgsi_build: Fix return of uninitialized memory in tgsi_*_instruction_memory
tgsi_default_instruction_memory / tgsi_build_instruction_memory were returning uninitialized memory for tgsi_instruction_memory.Texture and tgsi_instruction_memory.Format. Note 0 means not set, and thus is a correct default initializer for these. Fixes: 3243b6fc97 ("tgsi: add Texture and Format to tgsi_instruction_memory") Cc: Nicolai Hähnle <nicolai.haehnle at amd.com>
2016 Mar 16
2
[PATCH mesa 5/6] nouveau: codegen: Add support for OpenCL global memory buffers
Could you please get rid of the cosmetic changes (eg. the switch ones)? Because this doesn't really improve readability and in my opinion these changes should be eventually done in a separate patch. Other than that, this patch is : Reviewed-by: Samuel Pitoiset <samuel.pitoiset at gmail.com> Yes, this probably won't work as is for atomic operations but the lowering pass is
2015 Nov 07
0
[PATCH mesa 0/5] nouveau: codegen: Make use of double immediates
Hi Hans, All pushed. I made a few additional fixes and improvement to fp64 immediate handling along the way, but all your commits were fine as-is. (Except that they enabled fp64 immediates on nv50 implicitly which is wrong -- there are no immediate-taking variants on nv50, so I fixed that glitch. But only the G200 can do fp64 in the first place, and nouveau doesn't actually expose it. Corner
2016 Mar 16
2
[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/
This approach leads to the emitters needing to know about both global and buffer, even though at that point, they are identical. I was thinking that in the lowering logic, buffer would just get rewritten as global (with the offset added), thus not needing any change to the emitters. What do you think about such an approach? On Mar 16, 2016 2:24 AM, "Hans de Goede" <hdegoede at
2015 Nov 05
1
[PATCH envytools] envydis: gk110: Add support for dadd with an immediate src
This commit adds support for dadd with an immediate src in gk110 code. The machine-code in question is generated by e.g. nouveau_compiler with the new "Make use of double immediates" patch series when building the piglit glsl-algebraic-double-add.shader_test. This commit changes the output from: 00000010: 001c0001 c38001ff $r0 $r0 $r0 $r0 0x3fe00 0x3fe00 0x3fe0000000000000
2015 May 09
5
[PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Pretty sure there's nothing wrong with it, but it looks odd in the code. src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++ src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +++++-- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 ++ 3 files changed, 9 insertions(+), 2 deletions(-)
2015 Aug 19
5
[PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
Some shaders appear to extract bits using shift/and combos. Detect (some) of those and convert to EXTBF instead. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 66 +++++++++++++++------- 1 file changed, 46 insertions(+), 20 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2016 Oct 02
2
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:03, Ilia Mirkin wrote: > On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann > <tobias.johannes.klausmann at mni.thm.de> wrote: >> Previously we'd end up with an unnecessary mov for the thirs immediate value. >> >> total instructions in shared programs : 851881 -> 851864 (-0.00%) >> total gprs used in shared programs : 110295 -> 110295
2014 May 10
1
[PATCH] nv50/ir: make sure to reverse cond codes on all the OP_SET variants
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.2 10.1" <mesa-stable at lists.freedesktop.org> --- Found this while tracking a regression on nvc0 for my patch which fixes ir_unop_any to emit or's instead of dp3's. (That patch is fine, this code was always broken.) src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 3 ++- 1 file changed, 2
2017 Apr 03
3
[PATCH 0/3] nv50/ir: Preapre for running Opts inside a loop
Slowly we are getting to the point, that we miss enough optimization opportunities as the result of our own passes. For this we need to fix AlgebraicOpt to be able to handle mods on sources without creating new issues. The last patch enables looping opts. Karol Herbst (3): nv50/ir: fix AlgebraicOpt for slcts with mods nv50/ir: handle logops with NOT in AlgebraicOpt nv50/ir: run some
2015 Feb 23
2
[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
Does this give correct results for special floats (0, infs)? We tried to improve (for single floats) x86 rcp in llvmpipe with newton-raphson, but unfortunately not being able to give correct results for these two cases (without even more additional code) meant it got all disabled in the end (you can still see that code in the driver) since the problems are at least as bad as those due to bad
2014 May 27
8
[PATCH 0/2] nvc0: support for GK20A (Tegra K1)
The following 2 patches make it possible to run Mesa programs on GK20A (Tegra K1). GK20A is very similar to GK104, but uses a new (backward-compatible) 3D class as well as the same ISA as GK110 (SM35). Taking these differences into account is sufficient to successfully render simple off-screen buffers. Alexandre Courbot (2): nvc0: add GK20A 3D class nvc0: use SM35 ISA with GK20A
2017 Apr 03
5
[PATCH v2 0/3] nv50/ir: Preapre for running Opts inside a loop
Slowly we are getting to the point, that we miss enough optimization opportunities as the result of our own passes. For this we need to fix AlgebraicOpt to be able to handle mods on sources without creating new issues. The last patch enables looping opts. v2: update commit author Karol Herbst (3): nv50/ir: fix AlgebraicOpt for slcts with mods nv50/ir: handle logops with NOT in AlgebraicOpt
2015 Jan 09
3
[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32 Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++ 1 file changed, 109 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2014 May 27
1
[PATCH 2/2] nvc0: use SM35 ISA with GK20A
On Tue, May 27, 2014 at 12:59 AM, Alexandre Courbot <acourbot at nvidia.com> wrote: > GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use > the GK110 path when this chip is detected. > > Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> > --- > src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h | 1 + >
2016 Oct 02
1
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:26, Ilia Mirkin wrote: > That's very odd. LoadPropagation should have picked that up even in > its current form. Should try to figure out why it didn't and that is > likely to "fix" a *lot* more situations. Actually i was coming from an, given really constrained, addition to the LoadPropagation pass, where i was told to fix it within OP_MAD :/ > On
2017 Sep 25
2
'__builtin_nanl' and soft-FP64 support
I am seeing failures in two tests after migrating to v5.0 final, these are: std/language.support/support.limits/limits/numeric.limits.members/quiet_NaN. pass.cpp and: std/language.support/support.limits/limits/numeric.limits.members/signaling_ NaN.pass.cpp However, these are new tests and it turns out that the underlying problem is that the builtin '__builtin_nanl("")' is
2017 Mar 26
5
[PATCH v5 0/5] nvc0/ir: add support for MAD/FMA PostRALoadPropagation
was "nv50/ir: PostRaConstantFolding improvements" before. nothing really changed from the last version, just minor things. Karol Herbst (5): nv50/ir: restructure and rename postraconstantfolding pass nv50/ir: implement mad post ra folding for nvc0+ gk110/ir: add LIMM form of mad gm107/ir: add LIMM form of mad nv50/ir: also do PostRaLoadPropagation for FMA
2017 Sep 25
0
'__builtin_nanl' and soft-FP64 support
On 9/25/2017 5:35 AM, Martin J. O'Riordan via llvm-dev wrote: > > I am seeing failures in two tests after migrating to v5.0 final, these > are: > > std/language.support/support.limits/limits/numeric.limits.members/quiet_NaN.pass.cpp > > and: > > std/language.support/support.limits/limits/numeric.limits.members/signaling_NaN.pass.cpp > > However, these are new