thr3ads.net - similar to: "[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax"

Displaying 20 results from an estimated 400 matches similar to: "[PATCH 01/11] nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax"

[PATCH 1/2] nv50/ir: add fp64 support on G200 (NVA0)

2015 Feb 23

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Untested beyond compiling a few shaders to see if they look like they might work. nvdisasm agrees with envydis's decoding of these things. Will definitely get ahold of a G200 to run tests on before pushing this. .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp | 94 ++++++++++++++++++---

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

Does this give correct results for special floats (0, infs)? We tried to improve (for single floats) x86 rcp in llvmpipe with newton-raphson, but unfortunately not being able to give correct results for these two cases (without even more additional code) meant it got all disabled in the end (you can still see that code in the driver) since the problems are at least as bad as those due to bad

[PATCH 1/2] nv50/ir: fix s32 x s32 -> high s32 multiply logic

2014 May 18

Retrieving the high 32 bits of a signed multiply is rather annoying. It appears that the simplest way to do this is to compute the absolute value of the arguments, and perform a u32 x u32 -> u64 operation. If the arguments' signs differ, then negate the result. Since there is no u64 support in the cvt instruction, we have to perform the 2's complement negation "by hand".
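The approach described above can be sketched on the CPU as follows. This is only an illustration of the arithmetic, not the code from the patch: the function name mul_high_s32 is hypothetical, and the negation is split into 32-bit halves on the assumption that this is what doing it "by hand" amounts to.

```cpp
#include <cstdint>

// Hypothetical helper: high 32 bits of a signed 32x32 multiply, built from
// an unsigned 32x32 -> 64 multiply plus a manual 64-bit negation.
int32_t mul_high_s32(int32_t a, int32_t b)
{
    // Absolute values, computed in unsigned arithmetic so that INT32_MIN
    // does not overflow.
    uint32_t ua = a < 0 ? 0u - static_cast<uint32_t>(a) : static_cast<uint32_t>(a);
    uint32_t ub = b < 0 ? 0u - static_cast<uint32_t>(b) : static_cast<uint32_t>(b);

    // u32 x u32 -> u64, then split into two 32-bit halves.
    uint64_t prod = static_cast<uint64_t>(ua) * ub;
    uint32_t lo = static_cast<uint32_t>(prod);
    uint32_t hi = static_cast<uint32_t>(prod >> 32);

    if ((a < 0) != (b < 0)) {
        // 2's complement negation of the 64-bit value on 32-bit halves:
        // invert both halves, add 1 to the low half, and carry into the
        // high half only when the low half wrapped around to zero.
        lo = ~lo + 1u;
        hi = ~hi + (lo == 0 ? 1u : 0u);
    }
    return static_cast<int32_t>(hi);
}
```

For any pair of inputs the result should match static_cast<int32_t>((int64_t(a) * b) >> 32) on compilers where right-shifting a negative value is an arithmetic shift.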

[PATCH mesa 0/5] nouveau: codegen: Make use of double immediates

2015 Nov 05

Hi All,

This series implements using double immediates in the nouveau codegen code. This turns the following (nvc0) code:

  1: mov u32 $r2 0x00000000 (8)
  2: mov u32 $r3 0x3fe00000 (8)
  3: add f64 $r0d $r0d $r2d (8)

Into:

  1: add f64 $r0d $r0d 0.500000 (8)

This has been tested with the 2 double shader tests which I just sent to the piglit list. On a gk208 (gk110 / SM35)
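To see why the two 32-bit moves in the listing above are equivalent to the immediate 0.500000, the words can be reassembled into an IEEE 754 double. This is a minimal sketch; the helper name double_from_halves is hypothetical, and it assumes the lower-numbered register ($r2) holds the low word of the pair $r2d, which is consistent with the value shown in the listing.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: rebuild a double from its low and high 32-bit words.
double double_from_halves(uint32_t lo, uint32_t hi)
{
    uint64_t bits = (static_cast<uint64_t>(hi) << 32) | lo;
    double d;
    // memcpy is the portable way to reinterpret the bit pattern.
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

Calling double_from_halves(0x00000000, 0x3fe00000) yields 0.5, because 0x3fe0000000000000 encodes a biased exponent of 1022 with a zero mantissa.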

[PATCH] gm107/ir: fix loading z offset for layered 3d image bindings

2019 Oct 14

Unfortunately we don't know if a particular load is a real 2d image (as would be a cube face or 2d array element), or a layer of a 3d image. Since we pass in the TIC reference, the instruction's type has to match what's in the TIC (experimentally). In order to properly support bindless images, this also can't be done by looking at the current bindings and generating appropriate

[PATCH 00/19] nv50: add sampler2DMS/GP support to get OpenGL 3.2

2014 Jan 13

OK, so there's a bunch of stuff in here. The geometry stuff is based on the work started by Bryan Cain and Christoph Bumiller. Patches 01-12: Add support for geometry shaders and fix related issues Patches 13-14: Make it possible for fb clears to operate on texture attachments with an explicit layer set (as is allowed in gl 3.2). Patches 15-17: Make ARB_texture_multisample work

[PATCH] nv50/ir: saturate FRC result to avoid completely bogus values

2014 Nov 18

For values above integer accuracy in floats, val - floor(val) might actually produce a value greater than 1. For such large floats, it's reasonable to be imprecise, but it's unreasonable for FRC to return a value that is not between 0 and 1. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 3 ++- 1 file changed, 2
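The idea of saturating the FRC result can be sketched in a few lines. This is an illustration under assumptions, not the patch itself: the name frc_saturated is hypothetical, and on a CPU with an IEEE-conformant floor the clamp is normally a no-op, so it only stands in for hardware whose floor may be imprecise for large inputs.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: fractional part of val, clamped to [0, 1] so that an
// imprecise floor can never push the result outside that range.
float frc_saturated(float val)
{
    float f = val - std::floor(val);
    return std::min(std::max(f, 0.0f), 1.0f);
}
```

For ordinary inputs such as 1.25f the result is the usual fractional part (0.25f). For very large magnitudes it may be imprecise but stays between 0 and 1.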

[Mesa-dev] [PATCH] nv50/ir: saturate FRC result to avoid completely bogus values

2014 Nov 18

On Tue, Nov 18, 2014 at 8:54 AM, Roland Scheidegger <sroland at vmware.com> wrote: > Am 18.11.2014 um 05:03 schrieb Ilia Mirkin: >> For values above integer accuracy in floats, val - floor(val) might >> actually produce a value greater than 1. For such large floats, it's >> reasonable to be imprecise, but it's unreasonable for FRC to return a >> value that

[PATCH 00/12] Cherry-pick nv50/nvc0 patches from gallium-nine

2014 May 20

I went through the gallium-nine tree and picked out nouveau patches that are general bug-fixes. The first bunch I'd like to also get into 10.2. I've reviewed all of them and they make sense to me, but sending them out for public review as well in case there are any objections. Unless I hear objections, I'd like to push this by Friday. Christoph Bumiller (11): nv50,nvc0: always pull

[PATCH 1/2] nv50/ir: make sure that texprep/texquerylod's args get coalesced

2014 May 13

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> Cc: "10.2" <mesa-stable at lists.freedesktop.org> --- Not 100% sure of the significance of this code, but this seems like the correct thing to do... will definitely run it through a full piglit run before pushing out. src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 2 ++ 1 file changed, 2 insertions(+) diff --git

[Mesa-dev] [PATCH] nv50/ir: saturate FRC result to avoid completely bogus values

2014 Nov 18

On 18/11/14 14:34, Roland Scheidegger wrote: > Am 18.11.2014 um 15:05 schrieb Ilia Mirkin: >> On Tue, Nov 18, 2014 at 8:54 AM, Roland Scheidegger <sroland at vmware.com> wrote: >>> Am 18.11.2014 um 05:03 schrieb Ilia Mirkin: >>>> For values above integer accuracy in floats, val - floor(val) might >>>> actually produce a value greater than 1. For such

[PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Not sure how many steps are needed for the necessary accuracy. Just doing 2 because that seems like a reasonable number. .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 42 ++++++++++++++++++++-- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp

[Mesa-dev] [PATCH 2/2] nvc0/ir: improve precision of double RCP/RSQ results

2015 Feb 23

Oh right. I think the NVIDIA blob executes those steps conditionally based on the upper bits not being 0x7ff (== infinity/nan). I should do the same thing here. [FWIW I was able to test the nv50 code last night and that one's a total fail for rcp/rsq... will need to port that over to my nvc0 and debug there.] On Mon, Feb 23, 2015 at 8:24 AM, Roland Scheidegger <sroland at vmware.com>

[PATCH] nv50/ir: make ARB_viewport_array behave like it does with other drivers

2014 Jun 23

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> --- .../drivers/nouveau/codegen/nv50_ir_driver.h | 1 + .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 27 ++++++++++++++++++++-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h b/src/gallium/drivers/nouveau/codegen/nv50_ir_driver.h

[PATCH v3] nv50/ir: make ARB_viewport_array behave like it does with other drivers

2014 Jun 23

Previously, if we had something like: gl_ViewportIndex = idx; for(int i = 0; i < gl_in.length(); i++) { gl_Position = gl_in[i].gl_Position; EmitVertex(); } EndPrimitive(); we failed to set the right ViewportIndex. To resolve this, save the ViewportIndex and store it to the right register on each emit. This fixes the remaining piglit tests in ARB_viewport_array for nvc0. Note: Not

[PATCH v2] nv50/ir: make ARB_viewport_array behave like it does with other drivers

2014 Jun 23

[PATCH] gm107/ir: use lane 0 for manual textureGrad handling

2017 Dec 20

This is parallel to the pre-SM50 change which does this. Adjusts the shuffles / quadops to make the values correct relative to lane 0, and then splat the results to all lanes for the final move into the target register. Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Entirely untested beyond compilation. Should check bin/tex-miplevel-selection textureGrad Cube

[Mesa-dev] [PATCH 04/12] nv50/ir/tgsi: TGSI_OPCODE_POW replicates its result

2014 May 21

On 21/05/14 00:39, Ilia Mirkin wrote: > From: Christoph Bumiller <christoph.bumiller at speed.at> > > Reviewed-by: Ilia Mirkin <imirkin at alum.mit.edu> > Cc: "10.2" <mesa-stable at lists.freedesktop.org> > --- > src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff

[Mesa-dev] [PATCH 04/12] nv50/ir/tgsi: TGSI_OPCODE_POW replicates its result

2014 May 21

On 21/05/14 19:53, Ilia Mirkin wrote: > On Wed, May 21, 2014 at 2:51 PM, Emil Velikov <emil.l.velikov at gmail.com> wrote: >> On 21/05/14 00:39, Ilia Mirkin wrote: >>> From: Christoph Bumiller <christoph.bumiller at speed.at> >>> >>> Reviewed-by: Ilia Mirkin <imirkin at alum.mit.edu> >>> Cc: "10.2" <mesa-stable at

[PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction

2015 May 09

Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu> --- Pretty sure there's nothing wrong with it, but it looks odd in the code. src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++ src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +++++-- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 ++ 3 files changed, 9 insertions(+), 2 deletions(-)
