thr3ads.net - similar to: "Handling UMAD with a negative modifier, or why glsl-fs-atan-3 was failing"

Displaying 20 results from an estimated 600 matches similar to: "Handling UMAD with a negative modifier, or why glsl-fs-atan-3 was failing"

[PATCH] nv50/ir: change the way float face is returned

2015 Jan 05

[PATCH] nv50/ir: change the way float face is returned

The old way made it impossible for the optimizer to reason about what was going on. The new way is the same number of instructions (the neg gets folded into the cvt) but enables the optimizer to be cleverer if comparing to a constant (most common case). [The optimizer is presently not sufficiently clever to work this out, but it could relatively easily be made to be. The old way would have

[PATCH v3 1/2] nv50/ir: Add support for the double Type to BuildUtil

2014 Jul 03

[PATCH v3 1/2] nv50/ir: Add support for the double Type to BuildUtil

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../drivers/nouveau/codegen/nv50_ir_build_util.cpp | 17 +++++++++++++++++ .../drivers/nouveau/codegen/nv50_ir_build_util.h | 2 ++ 2 files changed, 19 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp

[PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

2015 Aug 19

[PATCH 1/2] nvc0/ir: detect AND/SHR pairs and convert into EXTBF

Some shaders appear to extract bits using shift/and combos. Detect (some) of those and convert to EXTBF instead. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 66 +++++++++++++++------- 1 file changed, 46 insertions(+), 20 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp

[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings

2019 Oct 14

[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings

Unfortuantely we don't know if a particular load is a real 2d image (as would be a cube face or 2d array element), or a layer of a 3d image. Since we pass in the TIC reference, the instruction's type has to match what's in the TIC (experimentally). In order to properly support bindless images, this also can't be done by looking at the current bindings and generating appropriate

branch cuts of atan()

2005 May 16

branch cuts of atan()

Hi the following gave me a shock: > atan(2) [1] 1.107149 > atan(2+0i) [1] -0.4636476+0i > or, perhaps more of a gotcha: > atan(1.0001+0i) [1] -0.7853482+0i > atan(0.9999+0i) [1] 0.7853482+0i > evidently atan()'s branch cuts aren't where I thought they were. Where do I look for documentation on this? -- Robin Hankin Uncertainty Analyst National

[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax

2015 Feb 20

[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++- 1 file changed, 63 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index dfb093c..e38a3b8 100644 ---

derivative of atan(x) and similar functions

2004 Jan 21

derivative of atan(x) and similar functions

Dear R experts. 'D()' function recognizes some of the analitical functions, such as sin, cos, etc. But I'd like to take analytical derivatives from asin, atan etc. functions. Are there any R packages providing that features? Thanks. -- Timur.

[PATCH] nv50/ir: use unordered_set instead of list to keep track of var defs

2014 Sep 01

[PATCH] nv50/ir: use unordered_set instead of list to keep track of var defs

The set of variable defs does not need to be ordered in any way, and removing/adding elements is a fairly common operation in various optimization passes. This shortens runtime of piglit test fp-long-alu to ~11s from ~22s No piglit regressions observed on nvc0! Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir.cpp

[PATCH RESEND] nv50/ir: use unordered_set instead of list to keep track of var defs

2014 Dec 02

[PATCH RESEND] nv50/ir: use unordered_set instead of list to keep track of var defs

[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)

2015 Feb 23

[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Untested beyond compiling a few shaders to see if they look like they might work. nvdisasm agrees with envydis's decoding of these things. Will definitely get ahold of a G200 to run tests on before pushing this. .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 94 ++++++++++++++++++---

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

On Tue, Dec 19, 2017 at 11:41 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote: > This is parallel to the pre-SM50 change which does this. Adjusts the > shuffles / quadops to make the values correct relative to lane 0, and > then splat the results to all lanes for the final move into the target > register. > > Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> >

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 13

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

Am 13.06.2017 um 02:05 schrieb Ilia Mirkin: > On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote: >> FWIW surely on nv50 you could keep a single mad instruction for umad >> (sad maybe too?). (I'm actually wondering if the hw really can't do >> unfused float multiply+add as a single instruction but I know next to >> nothing

[PATCH] nv50/ra: Only increment DefValue counter if we are going to spill

2017 Aug 19

[PATCH] nv50/ra: Only increment DefValue counter if we are going to spill

This is in preparation of an upcoming patch changing how we keep track of the defs. Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

This is parallel to the pre-SM50 change which does this. Adjusts the shuffles / quadops to make the values correct relative to lane 0, and then splat the results to all lanes for the final move into the target register. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Entirely untested beyond compilation. Should check bin/tex-miplevel-selection textureGrad Cube

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 12

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

This looks like the right idea to me too. It may sound a bit weird to do that per instruction, but d3d11 does that as well. (Some d3d versions just have a global flag basically forbidding or allowing any such fast math optimizations in the assembly, but I'm not actually sure everybody honors that without tesselation...) For 1/9: Reviewed-by: Roland Scheidegger <sroland at vmware.com>

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

2016 Mar 16

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

FILE_MEMORY_GLOBAL is currently only used for buffer handling, as we do not yet have (opencl) global memory support. Global memory support actually requires some different handling during lowering, so rename FILE_MEMORY_GLOBAL to FILE_MEMORY_BUFFER to reflect that the current code is for buffer handling, this will allow the later (re-)addition of FILE_MEMORY_GLOBAL for regular global memory.

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 13

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote: > FWIW surely on nv50 you could keep a single mad instruction for umad > (sad maybe too?). (I'm actually wondering if the hw really can't do > unfused float multiply+add as a single instruction but I know next to > nothing about nvidia hw...) The compiler should reassociate a mul + add

[PATCH] nv50/ir: Fold sat into mad

2015 Jan 02

[PATCH] nv50/ir: Fold sat into mad

The mad instruction emitter already supported the saturate modifier, but the ModifierFolding pass never tried folding cvt sat operations in for NV50. Signed-off-by: Roy Spliet <rspliet at eclipso.eu> --- src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp

[PATCH] nv50/ir: avoid deleting pseudo instructions too early

2014 Sep 25

[PATCH] nv50/ir: avoid deleting pseudo instructions too early

What happens is that a SPLIT operation is part of the spill node, and as a pseudo op, the instruction gets erased after processing its first def. However the later defs still need to refer to it, so instead delay spilling until after that whole RA node is done processing. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79462 Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc:

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

2016 Mar 16

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

Hi, On 16-03-16 15:55, Ilia Mirkin wrote: > This approach leads to the emitters needing to know about both global and > buffer, even though at that point, they are identical. I was thinking that > in the lowering logic, buffer would just get rewritten as global (with the > offset added), thus not needing any change to the emitters. What do you > think about such an approach? I was

similar to: Handling UMAD with a negative modifier, or why glsl-fs-atan-3 was failing