thr3ads.net - Nouveau - [Nouveau] What are the restrictions around loading indirect constbuf values [Jun 2015]

Ilia Mirkin

2015-Jun-25 14:41 UTC

[Nouveau] What are the restrictions around loading indirect constbuf values

Hello,

We recently tracked down a bug on Tesla GPUs (i.e. G80-GT218) whereby
it appears that instructions like

00000028: b5000409 08000780     add rn f32 $r2 $r2 neg c0[$a1]
00000040: b500060d 08004780     add rn f32 $r3 $r3 neg c0[$a1+0x4]

or with nvdisasm:

        .headerflags    @"EF_CUDA_SM12 EF_CUDA_PTX_SM(EF_CUDA_SM12)"
        /*0000*/         FADD R2, R2, -c[0x0][A1+0x0];  /* 0x08000780b5000409 */
        /*0008*/         FADD R3, R3, -c[0x0][A1+0x1];  /* 0x08004780b500060d */

don't appear to execute properly. However just MOV'ing the values into
registers works fine. This was observed on a G92 chip. See bug
https://bugs.freedesktop.org/show_bug.cgi?id=91056.
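
As a small aid for comparing the two listings above, here is a minimal Python sketch (not part of the original report). It joins the two 32-bit words printed in the first listing into the 64-bit value nvdisasm shows in its trailing comment, and lists which bit positions differ between the two instructions. The helper names `combine_words` and `differing_bits` are hypothetical, and the sketch makes no claim about what any individual bit field means.

```python
def combine_words(lo: int, hi: int) -> int:
    """Join two 32-bit words (first-printed word is the low half) into one 64-bit value."""
    return (hi << 32) | lo


def differing_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where two 64-bit values differ."""
    x = a ^ b
    return [i for i in range(64) if (x >> i) & 1]


# Words copied from the first listing in this message.
insn0 = combine_words(0xb5000409, 0x08000780)  # add ... neg c0[$a1]
insn1 = combine_words(0xb500060d, 0x08004780)  # add ... neg c0[$a1+0x4]

print(hex(insn0))                    # 0x8000780b5000409, matches the nvdisasm comment
print(differing_bits(insn0, insn1))  # [2, 9, 46]
```

Three bits differ, which is at least consistent with the three operand changes between the two instructions (destination register, first source register, constbuf offset), though that mapping is an inference and not something this sketch decodes.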

I was hoping you could save me some time and let me know which
instructions can load things like c0[$a1+4] (or maybe it's only a problem
in combination with the neg modifier?), and which Tesla-family GPUs have
those restrictions.

Thanks,

  -ilia

Ilia Mirkin

2015-Jun-30 05:53 UTC

On Thu, Jun 25, 2015 at 10:41 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> Hello,
>
> We recently tracked down a bug on Tesla GPUs (i.e. G80-GT218) whereby
> it appears that instructions like
>
> 00000028: b5000409 08000780     add rn f32 $r2 $r2 neg c0[$a1]
> 00000040: b500060d 08004780     add rn f32 $r3 $r3 neg c0[$a1+0x4]
>
> or with nvdisasm:
>
>         .headerflags    @"EF_CUDA_SM12 EF_CUDA_PTX_SM(EF_CUDA_SM12)"
>         /*0000*/         FADD R2, R2, -c[0x0][A1+0x0];  /* 0x08000780b5000409 */
>         /*0008*/         FADD R3, R3, -c[0x0][A1+0x1];  /* 0x08004780b500060d */
>
> don't appear to execute properly. However just MOV'ing the values into
> registers works fine. This was observed on a G92 chip. See bug
> https://bugs.freedesktop.org/show_bug.cgi?id=91056.
>
> I was hoping you could save me some time and let me know what
> instructions can load things like c0[$a1+4] (or maybe it's only in
> combination with the modifier?), and which Tesla-family GPU's have
> those restrictions.

Hm, there's something more subtle going on here. Please disregard. A
simple shader on my GT215, tried as both a vertex and a fragment shader,
demonstrates that those instructions work at least some of the time. (I
didn't have an nv50-era card plugged in when I was asking the question,
so I couldn't check for myself.) Perhaps the cause is something like
non-uniformity across execution units...

  -ilia

Ilia Mirkin

2015-Jul-02 18:54 UTC

On Tue, Jun 30, 2015 at 1:53 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, Jun 25, 2015 at 10:41 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> Hello,
>>
>> We recently tracked down a bug on Tesla GPUs (i.e. G80-GT218) whereby
>> it appears that instructions like
>>
>> 00000028: b5000409 08000780     add rn f32 $r2 $r2 neg c0[$a1]
>> 00000040: b500060d 08004780     add rn f32 $r3 $r3 neg c0[$a1+0x4]
>>
>> or with nvdisasm:
>>
>>         .headerflags    @"EF_CUDA_SM12 EF_CUDA_PTX_SM(EF_CUDA_SM12)"
>>         /*0000*/         FADD R2, R2, -c[0x0][A1+0x0];  /* 0x08000780b5000409 */
>>         /*0008*/         FADD R3, R3, -c[0x0][A1+0x1];  /* 0x08004780b500060d */
>>
>> don't appear to execute properly. However just MOV'ing the values into
>> registers works fine. This was observed on a G92 chip. See bug
>> https://bugs.freedesktop.org/show_bug.cgi?id=91056.
>>
>> I was hoping you could save me some time and let me know what
>> instructions can load things like c0[$a1+4] (or maybe it's only in
>> combination with the modifier?), and which Tesla-family GPU's have
>> those restrictions.
>
> Hm, there's something more subtle going on here. Please disregard. A
> simple shader on my GT215 for both vertex and fragment demonstrates
> that those instructions work at least some of the time. (I didn't have
> a nv50-era card plugged in when I was asking the question, so I
> couldn't check for myself.) Perhaps there's something more subtle
> going on here, like non-uniformity across execution units...

Just to close the loop on this one, it turns out our code emission was
just plain wrong. The IR was asking for c0[$a1], but we were really
emitting c0[0] when it was in the third position of a mad instruction.
Oops!
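
To illustrate the class of bug described above, here is a toy Python model. It is not nouveau's emitter and does not reflect the real instruction encoding; `ConstRef`, `emit`, and `mismatched_slots` are hypothetical names. It assumes an emitter that silently drops the address register for the operand in the third source slot, and shows how comparing what the IR requested against what was emitted would expose that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConstRef:
    """Toy constbuf operand: c<buf>[offset] or c<buf>[$a<addr_reg>+offset]."""
    buf: int
    offset: int
    addr_reg: Optional[int] = None  # None means a direct (non-indirect) access


def emit(src_slot: int, ref: ConstRef, buggy: bool) -> ConstRef:
    """Return the operand as the hardware would see it after emission (toy model).

    With buggy=True, the third source slot (index 2) loses its address
    register, so c0[$a1] silently becomes c0[0].
    """
    if buggy and src_slot == 2:
        return ConstRef(ref.buf, ref.offset, None)
    return ref


def mismatched_slots(buggy: bool) -> list[int]:
    """List the source slots where the emitted operand differs from the request."""
    requested = ConstRef(buf=0, offset=0, addr_reg=1)  # c0[$a1]
    return [slot for slot in range(3) if emit(slot, requested, buggy) != requested]


print(mismatched_slots(buggy=True))   # [2]
print(mismatched_slots(buggy=False))  # []
```

In this toy model only the third slot misbehaves, which matches why simple tests that exercise the first two source positions could appear to work.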

  -ilia
