thr3ads.net - similar to: "[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)"

Displaying 20 results from an estimated 500 matches similar to: "[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)"

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

Does this give correct results for special floats (0, infs)? We tried to improve (for single floats) x86 rcp in llvmpipe with newton-raphson, but unfortunately not being able to give correct results for these two cases (without even more additional code) meant it got all disabled in the end (you can still see that code in the driver) since the problems are at least as bad as those due to bad

[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Not sure how many steps are needed for the necessary accuracy. Just doing 2 because that seems like a reasonable number. .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 42 ++++++++++++++++++++-- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

Oh right. I think the NVIDIA blob executes those steps conditionally based on the upper bits not being 0x7ff (== infinity/nan). I should do the same thing here. [FWIW I was able to test the nv50 code last night and that one's a total fail for rcp/rsq... will need to port that over to my nvc0 and debug there.] On Mon, Feb 23, 2015 at 8:24 AM, Roland Scheidegger <sroland at vmware.com>

[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax

2015 Feb 20

[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++- 1 file changed, 63 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index dfb093c..e38a3b8 100644 ---

[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic

2014 May 18

[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic

Retrieving the high 32 bits of a signed multiply is rather annoying. It appears that the simplest way to do this is to compute the absolute value of the arguments, and perform a u32 x u32 -> u64 operation. If the arguments' signs differ, then negate the result. Since there is no u64 support in the cvt instruction, we have the perform the 2's complement negation "by hand".

[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings

2019 Oct 14

[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings

Unfortuantely we don't know if a particular load is a real 2d image (as would be a cube face or 2d array element), or a layer of a 3d image. Since we pass in the TIC reference, the instruction's type has to match what's in the TIC (experimentally). In order to properly support bindless images, this also can't be done by looking at the current bindings and generating appropriate

[PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2

2014 Jan 13

[PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2

OK, so there's a bunch of stuff in here. The geometry stuff is based on the work started by Bryan Cain and Christoph Bumiller. Patches 01-12: Add support for geometry shaders and fix related issues Patches 13-14: Make it possible for fb clears to operate on texture attachments with an explicit layer set (as is allowed in gl 3.2). Patches 15-17: Make ARB_texture_multisample work

[PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case

2014 Aug 08

[PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- .../drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index f010767..4a9e48f 100644 ---

[PATCH v3 1/2] nv50/ir: Add support for the double Type to BuildUtil

2014 Jul 03

[PATCH v3 1/2] nv50/ir: Add support for the double Type to BuildUtil

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../drivers/nouveau/codegen/nv50_ir_build_util.cpp | 17 +++++++++++++++++ .../drivers/nouveau/codegen/nv50_ir_build_util.h | 2 ++ 2 files changed, 19 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp

[PATCH] nv50: enable texture query lod

2014 Feb 28

[PATCH] nv50: enable texture query lod

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Note: this applies on top of airlied's r600g-texture-gather branch. Appears to pass all 4 piglit tests. The conversion from what the instruction outputs is the same as what the blob does. src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 + .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 4 ++++

[PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions

2014 Jul 05

[PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions

Folding for conversions: F32/64->(U16/32, S16/32) and (U16/32, S16/32)->F32 No piglit regressions observed on nv50 and nvc0! Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- V2: fix usage of wrong variable V3: enable F64 support V4: - disable F64 support again - handle saturate flag: clamp to min/max if needed

[PATCH] nv50/ir: Add sat modifier for mul

2015 Jan 04

[PATCH] nv50/ir: Add sat modifier for mul

Signed-off-by: Roy Spliet <rspliet at eclipso.eu> --- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 6 ++++++ src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp index

[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

2015 Jan 09

[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions

Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32 Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++ 1 file changed, 109 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp

[PATCH] nv50/ir: take postFactor into account when doing peephole optimizations

2015 Mar 25

[PATCH] nv50/ir: take postFactor into account when doing peephole optimizations

Multiply operations can have a post-factor on them, which other ops don't support. Only perform the peephole optimizations when there is no post-factor involved. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758 Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 12 ++++++++---- 1 file changed, 8 insertions(+),

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

2016 Mar 16

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

This approach leads to the emitters needing to know about both global and buffer, even though at that point, they are identical. I was thinking that in the lowering logic, buffer would just get rewritten as global (with the offset added), thus not needing any change to the emitters. What do you think about such an approach? On Mar 16, 2016 2:24 AM, "Hans de Goede" <hdegoede at

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

2016 Mar 16

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

FILE_MEMORY_GLOBAL is currently only used for buffer handling, as we do not yet have (opencl) global memory support. Global memory support actually requires some different handling during lowering, so rename FILE_MEMORY_GLOBAL to FILE_MEMORY_BUFFER to reflect that the current code is for buffer handling, this will allow the later (re-)addition of FILE_MEMORY_GLOBAL for regular global memory.

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

2016 Mar 16

[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/

Hi, On 16-03-16 15:55, Ilia Mirkin wrote: > This approach leads to the emitters needing to know about both global and > buffer, even though at that point, they are identical. I was thinking that > in the lowering logic, buffer would just get rewritten as global (with the > offset added), thus not needing any change to the emitters. What do you > think about such an approach? I was

[PATCH 1/2] nv50/ir: make sure that texprep/texquerylod's args get coalesced

2014 May 13

[PATCH 1/2] nv50/ir: make sure that texprep/texquerylod's args get coalesced

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.2" <mesa-stable at lists.freedesktop.org> --- Not 100% sure of the significance of this code, but this seems like the correct thing to do... will definitely run it through a full piglit run before pushing out. src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

This is parallel to the pre-SM50 change which does this. Adjusts the shuffles / quadops to make the values correct relative to lane 0, and then splat the results to all lanes for the final move into the target register. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Entirely untested beyond compilation. Should check bin/tex-miplevel-selection textureGrad Cube

[PATCH] nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space

2016 Jan 14

[PATCH] nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space

Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP shaders: total local used in shared programs : 54116 -> 30372 (-43.88%) Probably modest advantage to execution, but it's an imporant prerequisite to dropping some of the TGSI optimizations done by the state tracker. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Seems like there ought to be a simpler

similar to: [PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)