similar to: [PATCH] nv50/ir: we can't replace 0x0 with the zero reg for SHLADD

Displaying 20 results from an estimated 1200 matches similar to: "[PATCH] nv50/ir: we can't replace 0x0 with the zero reg for SHLADD"

2017 Apr 29
2
[PATCH] nv50/ir: we can't replace 0x0 with zero reg for SHLADD
fixes a crash in Alien Isolation Signed-off-by: Karol Herbst <karolherbst at gmail.com> --- src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index 732e1a93b4..4815d6df07 100644 ---
2017 Apr 29
0
[PATCH] nv50/ir: we can't replace 0x0 with zero reg for SHLADD
On Sat, Apr 29, 2017 at 10:41 AM, Karol Herbst <karolherbst at gmail.com> wrote: > fixes a crash in Alien Isolation What crash? How did the zero get there? Does this only happen if you do your optimization loop thing? In either case, we still want the replaceZero() logic. However that logic should be aware that the middle argument of a SHLADD is not to be touched. Otherwise we could end
2014 Aug 30
2
[PATCH 1/2] nvc0/ir: avoid infinite recursion when finding first uses of tex
In certain circumstances, findFirstUses could end up doubling back on instructions it had already processed, resulting in an infinite recursion. Avoid this by keeping track of already-visited instructions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83079 Tested-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> Signed-off-by: Ilia Mirkin <imirkin at
2014 Dec 02
0
[PATCH RESEND] nv50/ir: use unordered_set instead of list to keep track of var defs
The set of variable defs does not need to be ordered in any way, and removing/adding elements is a fairly common operation in various optimization passes. This shortens runtime of piglit test fp-long-alu to ~11s from ~22s No piglit regressions observed on nvc0! Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
2017 Aug 13
1
[PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs
On using builtin functions we have to move the input to registers $0 and $1, if one of the input value is an immediate, we fail to propagate the immediate: ... mov u32 $r477 0x00000003 (0) ... mov u32 $r0 %r473 (0) mov u32 $r1 $r477 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd
2017 Aug 12
0
[PATCH] nvc0/ir: propagate immediates to CALL input MOVs
On Sat, Aug 12, 2017 at 3:33 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote: > On using builtin functions we have to move the input to registers $0 and $1, if > one of the input value is an immediate, we fail to propagate the immediate: > > ... > mov u32 $r477 0x00000003 (0) > ... > mov u32 $r0 %r473 (0) > mov u32 $r1 $r477 (0) > call abs
2017 Aug 12
3
[PATCH] nvc0/ir: propagate immediates to CALL input MOVs
On using builtin functions we have to move the input to registers $0 and $1, if one of the input value is an immediate, we fail to propagate the immediate: ... mov u32 $r477 0x00000003 (0) ... mov u32 $r0 %r473 (0) mov u32 $r1 $r477 (0) call abs BUILTIN:0 (0) mov u32 %r495 $r1 (0) ... With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd
2014 Aug 08
2
[PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index f010767..4a9e48f 100644 ---
2014 Jul 05
1
[PATCH 1/2] nvc0/ir: use manual TXD when offsets are involved
Something about how we're implementing offsets for TXD is wrong, just flip to the generic quadop-based implementation in that case. This is the minimal fix appropriate for backporting. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: <mesa-stable at lists.freedesktop.org> --- src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 3 ++- 1 file changed, 2
2014 Jul 05
0
[PATCH] nvc0: do quadops on the right texture coordinates for TXD
handleTEX moves the layer as the first argument. This makes sure that the quadops deal with the texture coordinates. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: <mesa-stable at lists.freedesktop.org> --- src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git
2015 Jan 04
0
[PATCH] nv50/ir: fix texture offsets in release builds
assert's get compiled out in release builds, so they can't be relied upon to perform logic. Reported-by: Pierre Moreau <pierre.morrow at free.fr> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.2 10.3 10.4" <mesa-stable at lists.freedesktop.org> --- src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp | 3 ++-
2015 Jan 05
0
[PATCH] nv50/ir: change the way float face is returned
The old way made it impossible for the optimizer to reason about what was going on. The new way is the same number of instructions (the neg gets folded into the cvt) but enables the optimizer to be cleverer if comparing to a constant (most common case). [The optimizer is presently not sufficiently clever to work this out, but it could relatively easily be made to be. The old way would have
2015 Feb 23
0
[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Not sure how many steps are needed for the necessary accuracy. Just doing 2 because that seems like a reasonable number. .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 42 ++++++++++++++++++++-- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
2014 Sep 25
0
[PATCH] gm107/ir: fix texture argument order
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.3" <mesa-stable at lists.freedesktop.org> --- With this, all the tex-miplevel-selection tests pass on maxwell. There is a minor bit of this change which affects textureGrad on kepler that I have yet to test, but I'm moderately sure it's correct and was only working by luck before. (Changing the insbf to use
2014 Mar 20
0
[PATCH] nvc0/ir: move sample id to second source arg to fix sampler2DMS
The nvc0 texfetch instruction expects the sample id to be in the second source (usually used for the offset) rather than as part of the texture coordinate. This fixes all the sampler2DMS/Array tests on nvc0. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.1" <mesa-stable at lists.freedesktop.org> --- Tested on nvc1 with a full piglit run, no regressions,
2015 Feb 23
0
[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
Oh right. I think the NVIDIA blob executes those steps conditionally based on the upper bits not being 0x7ff (== infinity/nan). I should do the same thing here. [FWIW I was able to test the nv50 code last night and that one's a total fail for rcp/rsq... will need to port that over to my nvc0 and debug there.] On Mon, Feb 23, 2015 at 8:24 AM, Roland Scheidegger <sroland at vmware.com>
2016 Mar 14
0
[RFC mesa] nouveau: Add support for OpenCL global memory buffers
There's a less hacky and more hacky way forward. The more hacky solution is to set file index to -1 or something and then not do the lowering when you see that. The less hacky solution is the one you proposed as #1 - introduce a new file for "buffer" memory and lower it to the global file by adding a base offset. Right now the meaning of global is overloaded - before lowering it
2016 Mar 14
0
[RFC mesa] nouveau: Add support for OpenCL global memory buffers
On 03/14/2016 04:28 PM, Hans de Goede wrote: > Hi, > > On 14-03-16 16:05, Ilia Mirkin wrote: >> There's a less hacky and more hacky way forward. The more hacky >> solution is >> to set file index to -1 or something and then not do the lowering when >> you >> see that. >> >> The less hacky solution is the one you proposed as #1 - introduce a
2014 Apr 04
2
[PATCH 1/2] nvc0: add support for texture gather
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Tested on NVE6. Very strange that it seems to use 8 bits for offsets, vs 4 bits used by texelFetch. But this passes the piglit tests. Will test on a NVCX before checking in, in case it's different there. (Although that'd be surprising, given the similarities between the 2 ISAs.)
2016 Mar 14
0
[RFC mesa] nouveau: Add support for OpenCL global memory buffers
On 03/14/2016 08:50 PM, Hans de Goede wrote: > Hi, > > On 14-03-16 16:41, Samuel Pitoiset wrote: >> >> >> On 03/14/2016 04:28 PM, Hans de Goede wrote: >>> Hi, >>> >>> On 14-03-16 16:05, Ilia Mirkin wrote: >>>> There's a less hacky and more hacky way forward. The more hacky >>>> solution is >>>> to set