thr3ads.net - Nouveau - [Nouveau] Tessellation shaders get MEM_OUT_OF_BOUNDS errors / missing triangles [Jul 2015]

Ilia Mirkin

2015-May-26 23:34 UTC

[Nouveau] Tessellation shaders get MEM_OUT_OF_BOUNDS errors / missing triangles

One additional observation that I just made is that on GK208, the blob
apparently doesn't use the result of S2R Rx, SR_INVOCATION_ID
wholesale in TCS. It either passes it through a I2I.S32.S32 Rx, |Rx|
(i.e. absolute value), or even more paradoxically, shl 2; shr 2; which
removes the top *2* bits, rather than just the top 1. However I see no
such behaviour on GF108.
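
To make the difference between the two sequences concrete, here is a
minimal sketch that models both on a 32-bit register value. It assumes
the shr is a logical (unsigned) shift; the helper names abs_s32 and
shl2_shr2 are made up for illustration and are not blob or nouveau code.

```python
MASK32 = 0xFFFFFFFF

def abs_s32(x):
    """Model of I2I.S32.S32 Rx, |Rx|: absolute value of a signed 32-bit value."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return abs(signed) & MASK32

def shl2_shr2(x):
    """Model of shl 2; shr 2 (assumed logical): clears the top two bits."""
    return ((x << 2) & MASK32) >> 2

# Small invocation IDs pass through both unchanged; only values with the
# high bits set are affected, and the two sequences affect them differently.
for value in (5, 0x40000005, 0xC0000005):
    print(hex(value), hex(abs_s32(value)), hex(shl2_shr2(value)))
```

For any plausible invocation ID both are no-ops, which is what makes
their presence in the blob output surprising.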

I'm going to test out tomorrow whether this is the cause of my GK208 woes.

On Fri, May 22, 2015 at 5:10 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Mon, May 18, 2015 at 4:48 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> Hello,
>>
>> I've been debugging a few different tessellation shader issues with
>> nouveau, but let's start small. I see this issue on my GK208 with high
>> frequency, and I *think* I've seen it once or twice on my GF108, but
>> it's exceedingly rare, if it does happen. I don't have a GK10x to test
>> on, unfortunately, but I assume it'll have the same issue as the
>> GK208.
>>
>> The issue is this -- a bunch of triangles that should come out of the
>> tessellator end up black. I also see a GPC0/TPC1/MP trap:
>> MEM_OUT_OF_BOUNDS error produced by nouveau -- this is output in
>> response to an interrupt and MP trap generated by the hardware, read
>> out with nv_rd32(priv, TPC_UNIT(gpc, tpc, 0x648)); (see
>> gf100_gr_trap_mp). I assume some of the tessellation evaluation
>> invocations get killed, but I have no proof of this.
>>
>> I also see this: TRAP ch 5 [0x003facf000 shader_runner[19044]]
>>
>> I would imagine that's some floating point number ending up in the
>> register instead of an address, but the fp32 value of it
>> (1.35107421875) does not seem familiar.
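
The fp32 reading quoted above can be reproduced by reinterpreting the
32-bit pattern as an IEEE 754 single-precision float. A minimal sketch;
bits_to_f32 is a hypothetical helper name, and it takes the low 32 bits
(0x3facf000) of the value printed in the TRAP line.

```python
import struct

def bits_to_f32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(bits_to_f32(0x3facf000))  # prints 1.35107421875
```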
>
> Ben pointed out that the 0x3facf000 is a channel address, not a value
> from the shader. Oops. So that theory completely doesn't hold water.
> Perhaps some buffer isn't big enough? This ends up using 9 output
> vertices per patch, with 2 vec4's each. I've tried playing with the
> per-warp stack size to no avail, but I didn't *entirely* know what I
> was doing either though.
>
>>
>> Even when all the triangles show up, I still see the error on the
>> GK208, so I'm not sure if they're the same issue or not.
>>
>> Now, here's the fun part -- this is completely non-deterministic.
>> Sometimes everything shows up on the GK208, other times I see holes,
>> in varying locations. I'm fairly sure that the actual shader code is
>> correct... so I'm doing something funny wrong. (And yeah, tons of
>> missed optimization opportunities in this code, but let's not dwell on
>> that.)
>>
>> This is the piglit test:
>>
>> http://cgit.freedesktop.org/piglit/tree/tests/spec/arb_tessellation_shader/execution/quads.shader_test
>>
>> It should be noted that other piglit tests don't exhibit this error,
>> however they also tend to be simpler. One key difference is that they
>> don't change the patch size in TCS. I'm including a link to a text
>> file with the tessellation control and evaluation shaders (decoded
>> with nvdisasm which you're hopefully more familiar with), along with
>> the shader headers that we generate.
>>
>> FTR, this is how I feed the raw shader opcode bytes into nvdisasm:
>>
>> perl -ane 'foreach (@F) { print pack "I", hex($_) }' > tt; nvdisasm -b SM35 tt
>>
>> (for some reason it doesn't want to read from a pipe or even a fd).
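
For readers who would rather not use perl, the same conversion can be
sketched in Python: split the text into whitespace-separated hex words
and pack each one as a native-endian unsigned 32-bit integer. This
assumes perl's "I" is 4 bytes wide on the machine in question (it
usually is); hex_words_to_bytes is a hypothetical name, and the sample
words are placeholders rather than a real shader dump.

```python
import struct

def hex_words_to_bytes(text):
    """Pack whitespace-separated hex words as native-endian 32-bit integers."""
    out = bytearray()
    for token in text.split():
        # int(..., 16) accepts an optional 0x prefix, like perl's hex().
        out += struct.pack("=I", int(token, 16))
    return bytes(out)

# Placeholder input: two 32-bit words.
blob = hex_words_to_bytes("381ffc02 0x7ecc0000")
print(len(blob))  # prints 8
```

Writing the returned bytes to a file such as tt gives something that can
then be handed to nvdisasm -b SM35 tt, as in the command above.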
>>
>> http://people.freedesktop.org/~imirkin/tess_shaders_quads.txt
>>
>> My suspicion is that we're doing something wrong with the sched codes.
>> We have an elaborate calculator, but... perhaps not elaborate enough?
>> You can see it here:
>>
>> http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp#n2574
>>
>> The reason I think it's an error in sched codes is due to the TRAP
>> memory location that I see -- could well be some "stale" value in the
>> register and the value from S2R or VILD doesn't make it in there in
>> time before the ALD reads it.
>>
>> If you should like to try this yourself, you can use
>> https://github.com/imirkin/mesa/commits/gl4-integration-2 . This
>> branch is good enough to run Unigine Heaven, but still has a lot of
>> known shortcomings. (Both at the core and the nouveau levels.)
>>
>> Any advice or suggestions for debugging this would be greatly
>> appreciated. And let me know if you'd like me to generate additional
>> info on this. For example I can supply a full command trace that can
>> be piped to demmt, if that's helpful.
>>
>> Thanks in advance,
>>
>>   -ilia

Ilia Mirkin

2015-Jul-23 06:36 UTC

I think I figured out what was going on. Will re-check on the GK208,
but on a GF108 the random blue splotches in Unigine Heaven are gone
now. Turns out that with an instruction like

        /*00d0*/                   ALD.128 R0, a[0x70], R0;   /* 0x7ecc0000381ffc02 */

The hardware will internally split it up into roughly

ALD R0, a[0x70], R0
ALD R1, a[0x74], R0
ALD R2, a[0x78], R0
ALD R3, a[0x7c], R0

Of course the first one of those overwrites R0, which makes the
subsequent loads be full of fail. Adding a hazard in our RA for the
indirect argument resolves the issue.
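
The failure can be reproduced with a toy model of the split described
above. This is only a sketch under stated assumptions: registers are a
Python list, attribute memory is a dict keyed by byte offset, a missing
key stands in for an out-of-bounds access, and ald128 and has_hazard are
hypothetical names rather than anything in nouveau's RA.

```python
def ald128(regs, attr_mem, dst, base, idx):
    """Four scalar loads, each re-reading the index register regs[idx]."""
    for i in range(4):
        addr = base + 4 * i + regs[idx]
        # None marks a load from an address that was never populated.
        regs[dst + i] = attr_mem.get(addr)

def has_hazard(dst, idx, width=4):
    """Conservative check: index register lies inside the destination range."""
    return dst <= idx < dst + width

# Attribute data at a[0x70 + 0x10] .. a[0x7c + 0x10].
attr_mem = {0x80 + 4 * i: 100 + i for i in range(4)}

# Index register R0 is also the first destination: later loads go astray.
bad = [0x10, 0, 0, 0, 0]
ald128(bad, attr_mem, dst=0, base=0x70, idx=0)
print(bad[:4])   # prints [100, None, None, None]

# Index kept in R4, outside R0..R3: all four loads see the right offset.
good = [0, 0, 0, 0, 0x10]
ald128(good, attr_mem, dst=0, base=0x70, idx=4)
print(good[:4])  # prints [100, 101, 102, 103]
```

In this model only the first load is correct when the registers overlap,
which matches the "random" missing data: what the later loads fetch
depends on whatever value the first one happened to write into R0.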

  -ilia



Ilia Mirkin

2015-Jul-24 16:34 UTC

Indeed, this fixed the original issue on the GK208. Additionally, it
seems that starting with GK104 the mechanism for indirect offsets for
ALD/AST changed, and an AL2P instruction must now be used to determine
the "indirect" or "physical" offset. Once nouveau was adjusted to do
this, all MEM_OUT_OF_BOUNDS errors with tessellation shaders are gone.

