Displaying 20 results from an estimated 500 matches similar to: "[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)"
2015 Feb 23
2
[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
Does this give correct results for special floats (0, infs)?
We tried to improve (for single floats) x86 rcp in llvmpipe with
newton-raphson, but unfortunately not being able to give correct results
for these two cases (without even more additional code) meant it got all
disabled in the end (you can still see that code in the driver) since
the problems are at least as bad as those due to bad
2015 Feb 23
0
[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
Not sure how many steps are needed for the necessary accuracy. Just
doing 2 because that seems like a reasonable number.
.../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 42 ++++++++++++++++++++--
1 file changed, 39 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
2015 Feb 23
0
[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results
Oh right. I think the NVIDIA blob executes those steps conditionally
based on the upper bits not being 0x7ff (== infinity/nan). I should do
the same thing here. [FWIW I was able to test the nv50 code last night
and that one's a total fail for rcp/rsq... will need to port that over
to my nvc0 and debug there.]
On Mon, Feb 23, 2015 at 8:24 AM, Roland Scheidegger <sroland at vmware.com>
2015 Feb 20
10
[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
.../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 66 +++++++++++++++++++++-
1 file changed, 63 insertions(+), 3 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index dfb093c..e38a3b8 100644
---
2014 May 18
1
[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic
Retrieving the high 32 bits of a signed multiply is rather annoying. It
appears that the simplest way to do this is to compute the absolute
value of the arguments, and perform a u32 x u32 -> u64 operation. If the
arguments' signs differ, then negate the result. Since there is no u64
support in the cvt instruction, we have the perform the 2's complement
negation "by hand".
2019 Oct 14
1
[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings
Unfortuantely we don't know if a particular load is a real 2d image (as
would be a cube face or 2d array element), or a layer of a 3d image.
Since we pass in the TIC reference, the instruction's type has to match
what's in the TIC (experimentally). In order to properly support
bindless images, this also can't be done by looking at the current
bindings and generating appropriate
2014 Jan 13
20
[PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2
OK, so there's a bunch of stuff in here. The geometry stuff is based on the
work started by Bryan Cain and Christoph Bumiller.
Patches 01-12: Add support for geometry shaders and fix related issues
Patches 13-14: Make it possible for fb clears to operate on texture attachments
with an explicit layer set (as is allowed in gl 3.2).
Patches 15-17: Make ARB_texture_multisample work
2014 Aug 08
2
[PATCH 1/3] nvc0/ir: add base tex offset for fermi indirect tex case
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
.../drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
index f010767..4a9e48f 100644
---
2014 Jul 03
1
[PATCH v3 1/2] nv50/ir: Add support for the double Type to BuildUtil
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
.../drivers/nouveau/codegen/nv50_ir_build_util.cpp | 17 +++++++++++++++++
.../drivers/nouveau/codegen/nv50_ir_build_util.h | 2 ++
2 files changed, 19 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_build_util.cpp
2014 Feb 28
0
[PATCH] nv50: enable texture query lod
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
Note: this applies on top of airlied's r600g-texture-gather branch.
Appears to pass all 4 piglit tests. The conversion from what the instruction
outputs is the same as what the blob does.
src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
.../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 4 ++++
2014 Jul 05
1
[PATCH v4] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32/64->(U16/32, S16/32) and (U16/32, S16/32)->F32
No piglit regressions observed on nv50 and nvc0!
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
V2: fix usage of wrong variable
V3: enable F64 support
V4:
- disable F64 support again
- handle saturate flag: clamp to min/max if needed
2015 Jan 04
0
[PATCH] nv50/ir: Add sat modifier for mul
Signed-off-by: Roy Spliet <rspliet at eclipso.eu>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 6 ++++++
src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp | 2 +-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp
index
2015 Jan 09
3
[RESEND/PATCH] nv50/ir: Handle OP_CVT when folding constant expressions
Folding for conversions: F32->(U{16/32}, S{16/32}) and (U{16/32}, {S16/32})->F32
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de>
---
.../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 109 +++++++++++++++++++++
1 file changed, 109 insertions(+)
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
2015 Mar 25
0
[PATCH] nv50/ir: take postFactor into account when doing peephole optimizations
Multiply operations can have a post-factor on them, which other ops
don't support. Only perform the peephole optimizations when there is no
post-factor involved.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89758
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 12 ++++++++----
1 file changed, 8 insertions(+),
2016 Mar 16
2
[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/
This approach leads to the emitters needing to know about both global and
buffer, even though at that point, they are identical. I was thinking that
in the lowering logic, buffer would just get rewritten as global (with the
offset added), thus not needing any change to the emitters. What do you
think about such an approach?
On Mar 16, 2016 2:24 AM, "Hans de Goede" <hdegoede at
2016 Mar 16
0
[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/
FILE_MEMORY_GLOBAL is currently only used for buffer handling, as we
do not yet have (opencl) global memory support. Global memory support
actually requires some different handling during lowering, so rename
FILE_MEMORY_GLOBAL to FILE_MEMORY_BUFFER to reflect that the current
code is for buffer handling, this will allow the later (re-)addition
of FILE_MEMORY_GLOBAL for regular global memory.
2016 Mar 16
0
[PATCH mesa 4/6] nouveau: codegen: s/FILE_MEMORY_GLOBAL/FILE_MEMORY_BUFFER/
Hi,
On 16-03-16 15:55, Ilia Mirkin wrote:
> This approach leads to the emitters needing to know about both global and
> buffer, even though at that point, they are identical. I was thinking that
> in the lowering logic, buffer would just get rewritten as global (with the
> offset added), thus not needing any change to the emitters. What do you
> think about such an approach?
I was
2014 May 13
1
[PATCH 1/2] nv50/ir: make sure that texprep/texquerylod's args get coalesced
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
Cc: "10.2" <mesa-stable at lists.freedesktop.org>
---
Not 100% sure of the significance of this code, but this seems like the
correct thing to do... will definitely run it through a full piglit run before
pushing out.
src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2 ++
1 file changed, 2 insertions(+)
diff --git
2017 Dec 20
2
[PATCH] gm107/ir: use lane 0 for manual textureGrad handling
This is parallel to the pre-SM50 change which does this. Adjusts the
shuffles / quadops to make the values correct relative to lane 0, and
then splat the results to all lanes for the final move into the target
register.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
Entirely untested beyond compilation. Should check
bin/tex-miplevel-selection textureGrad Cube
2016 Jan 14
0
[PATCH] nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem space
Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP
shaders:
total local used in shared programs : 54116 -> 30372 (-43.88%)
Probably modest advantage to execution, but it's an imporant
prerequisite to dropping some of the TGSI optimizations done by the
state tracker.
Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
---
Seems like there ought to be a simpler