Ilia Mirkin
2015-May-18 20:48 UTC
[Nouveau] Tessellation shaders get MEM_OUT_OF_BOUNDS errors / missing triangles
Hello,

I've been debugging a few different tessellation shader issues with nouveau, but let's start small. I see this issue on my GK208 with high frequency, and I *think* I've seen it once or twice on my GF108, but it's exceedingly rare, if it does happen. I don't have a GK10x to test on, unfortunately, but I assume it'll have the same issue as the GK208.

The issue is this -- a bunch of triangles that should come out of the tessellator end up black. I also see a GPC0/TPC1/MP trap: MEM_OUT_OF_BOUNDS error produced by nouveau -- this is output in response to an interrupt and MP trap generated by the hardware, read out with nv_rd32(priv, TPC_UNIT(gpc, tpc, 0x648)); (see gf100_gr_trap_mp). I assume some of the tessellation evaluation invocations get killed, but I have no proof of this.

I also see this: TRAP ch 5 [0x003facf000 shader_runner[19044]]

I would imagine that's some floating point number ending up in the register instead of an address, but the fp32 value of it (1.35107421875) does not seem familiar.

Even when all the triangles show up, I still see the error on the GK208, so I'm not sure if they're the same issue or not.

Now, here's the fun part -- this is completely non-deterministic. Sometimes everything shows up on the GK208, other times I see holes, in varying locations. I'm fairly sure that the actual shader code is correct... so I'm doing something funny wrong. (And yeah, tons of missed optimization opportunities in this code, but let's not dwell on that.)

This is the piglit test:

http://cgit.freedesktop.org/piglit/tree/tests/spec/arb_tessellation_shader/execution/quads.shader_test

It should be noted that other piglit tests don't exhibit this error; however, they also tend to be simpler. One key difference is that they don't change the patch size in TCS. I'm including a link to a text file with the tessellation control and evaluation shaders (decoded with nvdisasm, which you're hopefully more familiar with), along with the shader headers that we generate.
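The fp32 reading of the trap value quoted above can be reproduced with a minimal Python sketch. It only checks the arithmetic of reinterpreting the low 32 bits of 0x003facf000 as an IEEE 754 single; the helper name u32_as_fp32 is made up for illustration, and the check says nothing about what the value actually represents.

```python
import struct

def u32_as_fp32(word: int) -> float:
    """Reinterpret the low 32 bits of an integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", word & 0xFFFFFFFF))[0]

# Low 32 bits of the value printed in the TRAP line.
print(u32_as_fp32(0x3FACF000))  # 1.35107421875
```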
FTR, this is how I feed the raw shader opcode bytes into nvdisasm:

  perl -ane 'foreach (@F) { print pack "I", hex($_) }' > tt; nvdisasm -b SM35 tt

(for some reason it doesn't want to read from a pipe or even a fd).

http://people.freedesktop.org/~imirkin/tess_shaders_quads.txt

My suspicion is that we're doing something wrong with the sched codes. We have an elaborate calculator, but... perhaps not elaborate enough? You can see it here:

http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp#n2574

The reason I think it's an error in sched codes is due to the TRAP memory location that I see -- could well be some "stale" value in the register and the value from S2R or VILD doesn't make it in there in time before the ALD reads it.

If you should like to try this yourself, you can use https://github.com/imirkin/mesa/commits/gl4-integration-2 . This branch is good enough to run Unigine Heaven, but still has a lot of known shortcomings. (Both at the core and the nouveau levels.)

Any advice or suggestions for debugging this would be greatly appreciated. And let me know if you'd like me to generate additional info on this. For example I can supply a full command trace that can be piped to demmt, if that's helpful.

Thanks in advance,

-ilia
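For readers without perl at hand, the packing step of the one-liner in the message above can be approximated in Python. This is a rough sketch, not part of the original workflow: the function name pack_hex_words is hypothetical, and it assumes perl's native "I" is a 32-bit unsigned int on the machine in question (which is what struct's "=I" produces, in native byte order).

```python
import struct

def pack_hex_words(text: str) -> bytes:
    """Pack whitespace-separated hex words as native-endian 32-bit unsigned ints.

    Assumption: each token fits in 32 bits; struct.error is raised otherwise.
    A leading "0x" on a token is accepted.
    """
    return b"".join(struct.pack("=I", int(tok, 16)) for tok in text.split())
```

The resulting bytes would then be written to a regular file (the message uses the name tt) and passed to nvdisasm -b SM35 as before, since the message notes that nvdisasm does not read from a pipe.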
Ilia Mirkin
2015-May-22 21:10 UTC
[Nouveau] Tessellation shaders get MEM_OUT_OF_BOUNDS errors / missing triangles
On Mon, May 18, 2015 at 4:48 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> I also see this: TRAP ch 5 [0x003facf000 shader_runner[19044]]
>
> I would imagine that's some floating point number ending up in the
> register instead of an address, but the fp32 value of it
> (1.35107421875) does not seem familiar.

Ben pointed out that the 0x3facf000 is a channel address, not a value from the shader. Oops. So that theory doesn't hold water at all.

Perhaps some buffer isn't big enough? This ends up using 9 output vertices per patch, with 2 vec4's each. I've tried playing with the per-warp stack size to no avail, though I didn't *entirely* know what I was doing either.
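To put a number on the "9 output vertices per patch, with 2 vec4's each" figure from the reply, here is a back-of-the-envelope sketch. It assumes a vec4 is four 4-byte fp32 components and counts only per-vertex outputs; it ignores per-patch outputs such as tessellation levels and says nothing about how the hardware actually sizes or aligns its buffers. The names are made up for illustration.

```python
VEC4_BYTES = 4 * 4  # assumption: four fp32 components of 4 bytes each

def patch_output_bytes(vertices: int, vec4_per_vertex: int) -> int:
    """Raw size of the per-vertex TCS outputs for one patch, in bytes."""
    return vertices * vec4_per_vertex * VEC4_BYTES

print(patch_output_bytes(9, 2))  # 288
```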
Ilia Mirkin
2015-May-26 23:34 UTC
[Nouveau] Tessellation shaders get MEM_OUT_OF_BOUNDS errors / missing triangles
One additional observation that I just made is that on GK208, the blob apparently doesn't use the result of S2R Rx, SR_INVOCATION_ID directly in TCS. It either passes it through an I2I.S32.S32 Rx, |Rx| (i.e. absolute value), or, even more paradoxically, shl 2; shr 2; which removes the top *2* bits, rather than just the top 1. However, I see no such behaviour on GF108.

I'm going to test out tomorrow whether this is the cause of my GK208 woes.
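The effect of the two sequences described in the last message can be modelled on plain 32-bit integers. This is only an illustration of the bit arithmetic: it assumes the shr is a logical (unsigned) shift, which matches the description "removes the top 2 bits", and it wraps the absolute value to 32 bits, so the result for 0x80000000 here is not a claim about what the hardware does. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def shl2_shr2(x: int) -> int:
    """Shift left by 2, then logically right by 2, in a 32-bit register."""
    return ((x << 2) & MASK32) >> 2

def abs_s32(x: int) -> int:
    """Absolute value of a 32-bit register read as two's-complement signed."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return abs(signed) & MASK32

# Same low bits, with different garbage in the top two bits.
for value in (0x00000005, 0x40000005, 0x80000005, 0xC0000005):
    print(f"{value:#010x} -> shl/shr {shl2_shr2(value):#010x}, "
          f"abs {abs_s32(value):#010x}")
```

For small non-negative invocation IDs both operations are the identity, so they only matter if the upper bits of the S2R result can be non-zero. The shift pair discards bits 30 and 31 outright, whereas the absolute value negates any value with bit 31 set and leaves bit 30 alone.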