Ilia Mirkin
2015-Jun-25 14:41 UTC
[Nouveau] What are the restrictions around loading indirect constbuf values
Hello,

We recently tracked down a bug on Tesla GPUs (i.e. G80-GT218) whereby
it appears that instructions like

00000028: b5000409 08000780 add rn f32 $r2 $r2 neg c0[$a1]
00000040: b500060d 08004780 add rn f32 $r3 $r3 neg c0[$a1+0x4]

or with nvdisasm:

.headerflags @"EF_CUDA_SM12 EF_CUDA_PTX_SM(EF_CUDA_SM12)"
/*0000*/ FADD R2, R2, -c[0x0][A1+0x0]; /* 0x08000780b5000409 */
/*0008*/ FADD R3, R3, -c[0x0][A1+0x1]; /* 0x08004780b500060d */

don't appear to execute properly. However, just MOV'ing the values into
registers works fine. This was observed on a G92 chip. See bug
https://bugs.freedesktop.org/show_bug.cgi?id=91056.

I was hoping you could save me some time and let me know what
instructions can load things like c0[$a1+4] (or maybe it's only in
combination with the modifier?), and which Tesla-family GPUs have
those restrictions.

Thanks,

-ilia
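The two listings above show the same instruction words in two layouts. A minimal sketch of how they line up follows: the 64-bit value nvdisasm prints is the pair of 32-bit words from the first listing, with the first word as the low half. The helper names are hypothetical, and the byte-to-word offset conversion is only an assumption inferred from c0[$a1+0x4] appearing as c[0x0][A1+0x1] in the second listing.

```python
def join_words(lo: int, hi: int) -> int:
    """Combine two 32-bit words (low word first) into one 64-bit value."""
    return (hi << 32) | lo


def byte_to_word_offset(byte_offset: int) -> int:
    """Assumption: the second listing counts offsets in 32-bit words, not bytes."""
    return byte_offset // 4


# Word pairs copied from the first listing in the message.
pairs = [(0xb5000409, 0x08000780), (0xb500060d, 0x08004780)]
for lo, hi in pairs:
    print(f"0x{join_words(lo, hi):016x}")
```

The printed values can be compared against the hex comments in the nvdisasm output to confirm both tools are looking at the same encoding.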
Ilia Mirkin
2015-Jun-30 05:53 UTC
[Nouveau] What are the restrictions around loading indirect constbuf values
On Thu, Jun 25, 2015 at 10:41 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> Hello,
>
> We recently tracked down a bug on Tesla GPUs (i.e. G80-GT218) whereby
> it appears that instructions like
>
> 00000028: b5000409 08000780 add rn f32 $r2 $r2 neg c0[$a1]
> 00000040: b500060d 08004780 add rn f32 $r3 $r3 neg c0[$a1+0x4]
>
> or with nvdisasm:
>
> .headerflags @"EF_CUDA_SM12 EF_CUDA_PTX_SM(EF_CUDA_SM12)"
> /*0000*/ FADD R2, R2, -c[0x0][A1+0x0]; /* 0x08000780b5000409 */
> /*0008*/ FADD R3, R3, -c[0x0][A1+0x1]; /* 0x08004780b500060d */
>
> don't appear to execute properly. However, just MOV'ing the values into
> registers works fine. This was observed on a G92 chip. See bug
> https://bugs.freedesktop.org/show_bug.cgi?id=91056.
>
> I was hoping you could save me some time and let me know what
> instructions can load things like c0[$a1+4] (or maybe it's only in
> combination with the modifier?), and which Tesla-family GPUs have
> those restrictions.

Hm, there's something more subtle going on here. Please disregard. A
simple shader on my GT215 for both vertex and fragment demonstrates
that those instructions work at least some of the time. (I didn't have
an nv50-era card plugged in when I was asking the question, so I
couldn't check for myself.) Perhaps it's something like non-uniformity
across execution units...

-ilia
Ilia Mirkin
2015-Jul-02 18:54 UTC
[Nouveau] What are the restrictions around loading indirect constbuf values
On Tue, Jun 30, 2015 at 1:53 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, Jun 25, 2015 at 10:41 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> Hello,
>>
>> We recently tracked down a bug on Tesla GPUs (i.e. G80-GT218) whereby
>> it appears that instructions like
>>
>> 00000028: b5000409 08000780 add rn f32 $r2 $r2 neg c0[$a1]
>> 00000040: b500060d 08004780 add rn f32 $r3 $r3 neg c0[$a1+0x4]
>>
>> or with nvdisasm:
>>
>> .headerflags @"EF_CUDA_SM12 EF_CUDA_PTX_SM(EF_CUDA_SM12)"
>> /*0000*/ FADD R2, R2, -c[0x0][A1+0x0]; /* 0x08000780b5000409 */
>> /*0008*/ FADD R3, R3, -c[0x0][A1+0x1]; /* 0x08004780b500060d */
>>
>> don't appear to execute properly. However, just MOV'ing the values into
>> registers works fine. This was observed on a G92 chip. See bug
>> https://bugs.freedesktop.org/show_bug.cgi?id=91056.
>>
>> I was hoping you could save me some time and let me know what
>> instructions can load things like c0[$a1+4] (or maybe it's only in
>> combination with the modifier?), and which Tesla-family GPUs have
>> those restrictions.
>
> Hm, there's something more subtle going on here. Please disregard. A
> simple shader on my GT215 for both vertex and fragment demonstrates
> that those instructions work at least some of the time. (I didn't have
> an nv50-era card plugged in when I was asking the question, so I
> couldn't check for myself.) Perhaps it's something like non-uniformity
> across execution units...

Just to close the loop on this one, it turns out our code emission was
just plain wrong. The IR was asking for c0[$a1], but we were really
emitting c0[0] when it was in the third position of a mad instruction.
Oops!

-ilia
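For readers who want to picture the class of bug described above, here is a toy model. It is not nouveau's emitter: ConstRef, emit_src and check_emission are hypothetical names invented for illustration, and the only behavior taken from the message is that an indirect c0[$a1] in the third source position came out as a direct c0[0]. The sketch also shows one way such a mismatch could be caught, by comparing each IR operand against what was encoded.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class ConstRef:
    """Hypothetical IR operand: indirect c<buf>[$a<N>+off], or direct if addr_reg is None."""
    buf: int
    offset: int
    addr_reg: Optional[int] = None

    def __str__(self) -> str:
        if self.addr_reg is None:
            return f"c{self.buf}[{self.offset:#x}]"
        if self.offset == 0:
            return f"c{self.buf}[$a{self.addr_reg}]"
        return f"c{self.buf}[$a{self.addr_reg}+{self.offset:#x}]"


Operand = Union[str, ConstRef]


def emit_src(src: Operand, position: int, buggy: bool) -> Operand:
    """Return the operand the toy emitter would actually encode."""
    if buggy and position == 2 and isinstance(src, ConstRef):
        # Models the reported bug: the address register is silently dropped
        # for the third source, so an indirect load becomes a direct one.
        return ConstRef(src.buf, src.offset, None)
    return src


def check_emission(srcs: List[Operand], buggy: bool) -> List[Tuple[int, str, str]]:
    """List (position, wanted, emitted) for every operand that differs from the IR."""
    mismatches = []
    for i, src in enumerate(srcs):
        emitted = emit_src(src, i, buggy)
        if emitted != src:
            mismatches.append((i, str(src), str(emitted)))
    return mismatches


# A mad-like instruction with an indirect constbuf load as its third source.
mad_srcs: List[Operand] = ["$r0", "$r1", ConstRef(buf=0, offset=0, addr_reg=1)]
for mismatch in check_emission(mad_srcs, buggy=True):
    print(mismatch)
```

The same operand placed in the first or second position passes the check in this toy, which is consistent with the add instructions from the original question turning out to work.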