thr3ads.net - similar to: "[PATCH] nv50/ir: we can't replace 0x0 with the zero reg for SHLADD"

Displaying 20 results from an estimated 1200 matches similar to: "[PATCH] nv50/ir: we can't replace 0x0 with the zero reg for SHLADD"

[PATCH] nv50/ir: we can't replace 0x0 with zero reg for SHLADD

2017 Apr 29

fixes a crash in Alien Isolation

Signed-off-by: Karol Herbst <karolherbst at gmail.com>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index 732e1a93b4..4815d6df07 100644
---

[PATCH] nv50/ir: we can't replace 0x0 with zero reg for SHLADD

2017 Apr 29

On Sat, Apr 29, 2017 at 10:41 AM, Karol Herbst <karolherbst at gmail.com> wrote:
> fixes a crash in Alien Isolation

What crash? How did the zero get there? Does this only happen if you do your optimization loop thing?

In either case, we still want the replaceZero() logic. However that logic should be aware that the middle argument of a SHLADD is not to be touched. Otherwise we could end
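
The suggestion in this reply (keep the zero replacement, but skip the middle argument of a SHLADD) could be sketched roughly as follows. The types `Op`, `SrcKind`, `Src`, `Insn` and the function `replaceZeroSketch` are hypothetical stand-ins invented for illustration; they are not the nv50 IR classes, and the real replaceZero() may look quite different.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical, heavily simplified IR types (not Mesa's actual classes).
enum class Op { Add, ShlAdd };
enum class SrcKind { Register, Immediate, ZeroReg };

struct Src {
    SrcKind kind;
    std::uint32_t value; // immediate value or register index
};

struct Insn {
    Op op;
    std::array<Src, 3> src;
};

// Replace immediate-zero sources with the zero register, except for the
// middle source of a ShlAdd, which is left untouched.
// Returns the number of sources that were replaced.
int replaceZeroSketch(Insn &insn)
{
    int replaced = 0;
    for (std::size_t s = 0; s < insn.src.size(); ++s) {
        if (insn.op == Op::ShlAdd && s == 1)
            continue; // middle argument of ShlAdd: do not touch
        if (insn.src[s].kind == SrcKind::Immediate && insn.src[s].value == 0) {
            insn.src[s].kind = SrcKind::ZeroReg;
            ++replaced;
        }
    }
    return replaced;
}
```

For a ShlAdd whose three sources are all immediate zero, this sketch rewrites the first and third source and leaves the middle one as an immediate.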

[PATCH 1/2] nvc0/ir: avoid infinite recursion when finding first uses of tex

2014 Aug 30

In certain circumstances, findFirstUses could end up doubling back on instructions it had already processed, resulting in an infinite recursion. Avoid this by keeping track of already-visited instructions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83079
Tested-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
Signed-off-by: Ilia Mirkin <imirkin at
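
The general technique named in this commit message, tracking already-visited nodes so a recursive walk cannot loop, might look like the sketch below. `Node` and `countReachable` are made-up names for illustration only; this is not the findFirstUses code from the patch.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical graph node; "uses" may form cycles.
struct Node {
    std::vector<Node *> uses;
};

// Recursively counts nodes reachable from n. The visited set guarantees
// each node is processed at most once, so cycles cannot cause infinite
// recursion.
int countReachable(const Node *n, std::unordered_set<const Node *> &visited)
{
    if (!visited.insert(n).second)
        return 0; // already processed: stop instead of doubling back
    int total = 1;
    for (const Node *u : n->uses)
        total += countReachable(u, visited);
    return total;
}
```

Without the `visited` check, two nodes that use each other would recurse until the stack overflows.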

[PATCH RESEND] nv50/ir: use unordered_set instead of list to keep track of var defs

2014 Dec 02

The set of variable defs does not need to be ordered in any way, and removing/adding elements is a fairly common operation in various optimization passes.

This shortens runtime of piglit test fp-long-alu to ~11s from ~22s

No piglit regressions observed on nvc0!

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
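
As a rough illustration of why the container choice matters here: removing an element by value from a `std::list` requires a linear scan, while `std::unordered_set` removes by key in constant time on average. The `Value`, `DefsAsList` and `DefsAsSet` names below are hypothetical and are not taken from nv50_ir.cpp.

```cpp
#include <cstddef>
#include <list>
#include <unordered_set>

struct Value { int id; }; // hypothetical stand-in for an IR value

// Defs kept in a list: del() scans the whole list (O(n)).
struct DefsAsList {
    std::list<Value *> defs;
    void add(Value *v) { defs.push_back(v); }
    void del(Value *v) { defs.remove(v); }
    std::size_t size() const { return defs.size(); }
};

// Defs kept in an unordered_set: add() and del() are O(1) on average,
// at the cost of losing any ordering guarantee.
struct DefsAsSet {
    std::unordered_set<Value *> defs;
    void add(Value *v) { defs.insert(v); }
    void del(Value *v) { defs.erase(v); }
    std::size_t size() const { return defs.size(); }
};
```

One behavioral difference to keep in mind: the set silently ignores a second `add()` of the same pointer, whereas the list would store it twice.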

[PATCH v2] nvc0/ir: propagate immediates to CALL input MOVs

2017 Aug 13

On using builtin functions we have to move the input to registers $0 and $1, if one of the input value is an immediate, we fail to propagate the immediate:

...
mov u32 $r477 0x00000003 (0)
...
mov u32 $r0 %r473 (0)
mov u32 $r1 $r477 (0)
call abs BUILTIN:0 (0)
mov u32 %r495 $r1 (0)
...

With this patch the immediate is propagated, potentially causing the first MOV to be superfluous, which we'd

[PATCH] nvc0/ir: propagate immediates to CALL input MOVs

2017 Aug 12

On Sat, Aug 12, 2017 at 3:33 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote:
> On using builtin functions we have to move the input to registers $0 and $1, if
> one of the input value is an immediate, we fail to propagate the immediate:
>
> ...
> mov u32 $r477 0x00000003 (0)
> ...
> mov u32 $r0 %r473 (0)
> mov u32 $r1 $r477 (0)
> call abs

[PATCH] nvc0/ir: propagate immediates to CALL input MOVs

2017 Aug 12

[PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case

2014 Aug 08

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
 .../drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index f010767..4a9e48f 100644
---

[PATCH 1/2] nvc0/ir: use manual TXD when offsets are involved

2014 Jul 05

Something about how we're implementing offsets for TXD is wrong, just flip to the generic quadop-based implementation in that case. This is the minimal fix appropriate for backporting.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: <mesa-stable at lists.freedesktop.org>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 3 ++-
 1 file changed, 2

[PATCH] nvc0: do quadops on the right texture coordinates for TXD

2014 Jul 05

handleTEX moves the layer as the first argument. This makes sure that the quadops deal with the texture coordinates.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: <mesa-stable at lists.freedesktop.org>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git

[PATCH] nv50/ir: fix texture offsets in release builds

2015 Jan 04

assert's get compiled out in release builds, so they can't be relied upon to perform logic.

Reported-by: Pierre Moreau <pierre.morrow at free.fr>
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: "10.2 10.3 10.4" <mesa-stable at lists.freedesktop.org>
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp | 3 ++-
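
The pitfall described here is a general C and C++ one: when `NDEBUG` is defined, `assert(expr)` expands to nothing, so `expr` is never evaluated. A minimal sketch follows; `parseOffset`, `fragile` and `robust` are hypothetical names, not the code touched by the patch.

```cpp
#include <cassert>

// Hypothetical helper with a side effect: writes the result to out.
bool parseOffset(int raw, int &out)
{
    if (raw < 0)
        return false;
    out = raw;
    return true;
}

// Fragile: in a release build (NDEBUG defined) the whole call disappears,
// so out is never written.
void fragile(int raw, int &out)
{
    assert(parseOffset(raw, out));
}

// Robust: the call always runs; only the check is compiled out.
void robust(int raw, int &out)
{
    bool ok = parseOffset(raw, out);
    assert(ok);
    (void)ok; // avoid an unused-variable warning when NDEBUG is defined
}
```

With `-DNDEBUG`, `fragile(5, v)` leaves `v` unchanged while `robust(5, v)` still sets it to 5.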

[PATCH] nv50/ir: change the way float face is returned

2015 Jan 05

The old way made it impossible for the optimizer to reason about what was going on. The new way is the same number of instructions (the neg gets folded into the cvt) but enables the optimizer to be cleverer if comparing to a constant (most common case). [The optimizer is presently not sufficiently clever to work this out, but it could relatively easily be made to be. The old way would have

[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

Not sure how many steps are needed for the necessary accuracy. Just doing 2 because that seems like a reasonable number.

 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 42 ++++++++++++++++++++--
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp

[PATCH] gm107/ir: fix texture argument order

2014 Sep 25

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: "10.3" <mesa-stable at lists.freedesktop.org>
---

With this, all the tex-miplevel-selection tests pass on maxwell. There is a minor bit of this change which affects textureGrad on kepler that I have yet to test, but I'm moderately sure it's correct and was only working by luck before. (Changing the insbf to use

[PATCH] nvc0/ir: move sample id to second source arg to fix sampler2DMS

2014 Mar 20

The nvc0 texfetch instruction expects the sample id to be in the second source (usually used for the offset) rather than as part of the texture coordinate. This fixes all the sampler2DMS/Array tests on nvc0.

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: "10.1" <mesa-stable at lists.freedesktop.org>
---

Tested on nvc1 with a full piglit run, no regressions,

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

Oh right. I think the NVIDIA blob executes those steps conditionally based on the upper bits not being 0x7ff (== infinity/nan). I should do the same thing here.

[FWIW I was able to test the nv50 code last night and that one's a total fail for rcp/rsq... will need to port that over to my nvc0 and debug there.]

On Mon, Feb 23, 2015 at 8:24 AM, Roland Scheidegger <sroland at vmware.com>

[RFC mesa] nouveau: Add support for OpenCL global memory buffers

2016 Mar 14

There's a less hacky and more hacky way forward. The more hacky solution is to set file index to -1 or something and then not do the lowering when you see that. The less hacky solution is the one you proposed as #1 - introduce a new file for "buffer" memory and lower it to the global file by adding a base offset. Right now the meaning of global is overloaded - before lowering it

[RFC mesa] nouveau: Add support for OpenCL global memory buffers

2016 Mar 14

On 03/14/2016 04:28 PM, Hans de Goede wrote:
> Hi,
>
> On 14-03-16 16:05, Ilia Mirkin wrote:
>> There's a less hacky and more hacky way forward. The more hacky solution is
>> to set file index to -1 or something and then not do the lowering when you
>> see that.
>>
>> The less hacky solution is the one you proposed as #1 - introduce a

[PATCH 1/2] nvc0: add support for texture gather

2014 Apr 04

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---

Tested on NVE6. Very strange that it seems to use 8 bits for offsets, vs 4 bits used by texelFetch. But this passes the piglit tests. Will test on a NVCX before checking in, in case it's different there. (Although that'd be surprising, given the similarities between the 2 ISAs.)

[RFC mesa] nouveau: Add support for OpenCL global memory buffers

2016 Mar 14

On 03/14/2016 08:50 PM, Hans de Goede wrote:
> Hi,
>
> On 14-03-16 16:41, Samuel Pitoiset wrote:
>>
>>
>> On 03/14/2016 04:28 PM, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 14-03-16 16:05, Ilia Mirkin wrote:
>>>> There's a less hacky and more hacky way forward. The more hacky solution is
>>>> to set
