Displaying 20 results from an estimated 300 matches similar to: "[PATCH mesa 0/5] nouveau: codegen: Make use of double immediates"
2015 Feb 20
10
[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
.../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++-
1 file changed, 63 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index dfb093c..e38a3b8 100644
---
2016 Mar 16
13
[PATCH mesa 1/6] tgsi_build: Fix return of uninitialized memory in tgsi_*_instruction_memory
tgsi_default_instruction_memory / tgsi_build_instruction_memory were
returning uninitialized memory for tgsi_instruction_memory.Texture and
tgsi_instruction_memory.Format. Note 0 means not set, and thus is a
correct default initializer for these.
Fixes: 3243b6fc97 ("tgsi: add Texture and Format to tgsi_instruction_memory")
Cc: Nicolai Hähnle <nicolai.haehnle at amd.com>
2016 Mar 16
2
[PATCH mesa 5/6] nouveau: codegen: Add support for OpenCL global memory buffers
Could you please get rid of the cosmetic changes (eg. the switch ones)?
Because this doesn't really improve readability and in my opinion these
changes should be eventually done in a separate patch.
Other than that, this patch is :
Reviewed-by: Samuel Pitoiset <samuel.pitoiset at gmail.com>
Yes, this probably won't work as is for atomic operations but the
lowering pass is
2015 Nov 07
0
[PATCH mesa 0/5] nouveau: codegen: Make use of double immediates
Hi Hans,
All pushed. I made a few additional fixes and improvement to fp64
immediate handling along the way, but all your commits were fine
as-is. (Except that they enabled fp64 immediates on nv50 implicitly
which is wrong -- there are no immediate-taking variants on nv50, so I
fixed that glitch. But only the G200 can do fp64 in the first place,
and nouveau doesn't actually expose it. Corner
2016 Mar 16
2
[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/
This approach leads to the emitters needing to know about both global and
buffer, even though at that point, they are identical. I was thinking that
in the lowering logic, buffer would just get rewritten as global (with the
offset added), thus not needing any change to the emitters. What do you
think about such an approach?
On Mar 16, 2016 2:24 AM, "Hans de Goede" <hdegoede at
2015 Nov 05
1
[PATCH envytools] envydis: gk110: Add support for dadd with an immediate src
This commit adds support for dadd with an immediate src in gk110 code.
The machine-code in question is generated by e.g. nouveau_compiler with
the new "Make use of double immediates" patch series when building the
piglit glsl-algebraic-double-add.shader_test.
This commit changes the output from:
00000010: 001c0001 c38001ff $r0 $r0 $r0 $r0 0x3fe00 0x3fe00
0x3fe0000000000000
2015 May 09
5
[PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
Pretty sure there's nothing wrong with it, but it looks odd in the code.
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +++++--
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 ++
3 files changed, 9 insertions(+), 2 deletions(-)
2015 Aug 19
5
[PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 66 +++++++++++++++-------
1 file changed, 46 insertions(+), 20 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2016 Oct 02
2
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:03, Ilia Mirkin wrote:
> On Sun, Oct 2, 2016 at 1:58 PM, Tobias Klausmann
> <tobias.johannes.klausmann at mni.thm.de> wrote:
>> Previously we'd end up with an unnecessary mov for the thirs immediate value.
>>
>> total instructions in shared programs : 851881 -> 851864 (-0.00%)
>> total gprs used in shared programs : 110295 -> 110295
2014 May 10
1
[PATCH] nv50/ir: make sure to reverse cond codes on all the OP_SET variants
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: "10.2 10.1" <mesa-stable at lists.freedesktop.org>
---
Found this while tracking a regression on nvc0 for my patch which fixes
ir_unop_any to emit or's instead of dp3's. (That patch is fine, this code was
always broken.)
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 3 ++-
1 file changed, 2
2017 Apr 03
3
[PATCH 0/3] nv50/ir: Preapre for running Opts inside a loop
Slowly we are getting to the point, that we miss enough optimization
opportunities as the result of our own passes.
For this we need to fix AlgebraicOpt to be able to handle mods on sources
without creating new issues.
The last patch enables looping opts.
Karol Herbst (3):
nv50/ir: fix AlgebraicOpt for slcts with mods
nv50/ir: handle logops with NOT in AlgebraicOpt
nv50/ir: run some
2015 Feb 23
2
[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
Does this give correct results for special floats (0, infs)?
We tried to improve (for single floats) x86 rcp in llvmpipe with
newton-raphson, but unfortunately not being able to give correct results
for these two cases (without even more additional code) meant it got all
disabled in the end (you can still see that code in the driver) since
the problems are at least as bad as those due to bad
2014 May 27
8
[PATCH 0/2] nvc0: support for GK20A (Tegra K1)
The following 2 patches make it possible to run Mesa programs on GK20A
(Tegra K1).
GK20A is very similar to GK104, but uses a new (backward-compatible) 3D class
as well as the same ISA as GK110 (SM35). Taking these differences into account
is sufficient to successfully render simple off-screen buffers.
Alexandre Courbot (2):
nvc0: add GK20A 3D class
nvc0: use SM35 ISA with GK20A
2017 Apr 03
5
[PATCH v2 0/3] nv50/ir: Preapre for running Opts inside a loop
Slowly we are getting to the point, that we miss enough optimization
opportunities as the result of our own passes.
For this we need to fix AlgebraicOpt to be able to handle mods on sources
without creating new issues.
The last patch enables looping opts.
v2: update commit author
Karol Herbst (3):
nv50/ir: fix AlgebraicOpt for slcts with mods
nv50/ir: handle logops with NOT in AlgebraicOpt
2015 Jan 09
3
[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++
1 file changed, 109 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2014 May 27
1
[PATCH 2/2] nvc0: use SM35 ISA with GK20A
On Tue, May 27, 2014 at 12:59 AM, Alexandre Courbot <acourbot at nvidia.com> wrote:
> GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use
> the GK110 path when this chip is detected.
>
> Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
> ---
> src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h | 1 +
>
2016 Oct 02
1
[PATCH] nv50/ir: Propagate third immediate src when folding OP_MAD
On 02.10.2016 20:26, Ilia Mirkin wrote:
> That's very odd. LoadPropagation should have picked that up even in
> its current form. Should try to figure out why it didn't and that is
> likely to "fix" a *lot* more situations.
Actually i was coming from an, given really constrained, addition to the
LoadPropagation pass, where i was told to fix it within OP_MAD :/
> On
2017 Sep 25
2
'__builtin_nanl' and soft-FP64 support
I am seeing failures in two tests after migrating to v5.0 final, these are:
std/language.support/support.limits/limits/numeric.limits.members/quiet_NaN.
pass.cpp
and:
std/language.support/support.limits/limits/numeric.limits.members/signaling_
NaN.pass.cpp
However, these are new tests and it turns out that the underlying problem is
that the builtin '__builtin_nanl("")' is
2017 Mar 26
5
[PATCH v5 0/5] nvc0/ir: add support for MAD/FMA PostRALoadPropagation
was "nv50/ir: PostRaConstantFolding improvements" before.
nothing really changed from the last version, just minor things.
Karol Herbst (5):
nv50/ir: restructure and rename postraconstantfolding pass
nv50/ir: implement mad post ra folding for nvc0+
gk110/ir: add LIMM form of mad
gm107/ir: add LIMM form of mad
nv50/ir: also do PostRaLoadPropagation for FMA
2017 Sep 25
0
'__builtin_nanl' and soft-FP64 support
On 9/25/2017 5:35 AM, Martin J. O'Riordan via llvm-dev wrote:
>
> I am seeing failures in two tests after migrating to v5.0 final, these
> are:
>
> std/language.support/support.limits/limits/numeric.limits.members/quiet_NaN.pass.cpp
>
> and:
>
> std/language.support/support.limits/limits/numeric.limits.members/signaling_NaN.pass.cpp
>
> However, these are new