thr3ads.net - Nouveau - [Nouveau] Tesla shader ISA question [Feb 2014]

Ilia Mirkin

2014-Feb-28 07:37 UTC

[Nouveau] Tesla shader ISA question

Hello,

I've recently run into an unknown bit in Tesla shaders, and was hoping
you could shed some light on it. I believe it's related to clamping
of some sort. Here are 2 examples (from different shaders):

a0000401 cc054780     cvt rpi f32 $r0 f32 $r2 [unknown: 00000000 00010000]
a000060d 8c014780     cvt rni s32 $r3 f32 $r3 [unknown: 00000000 00010000]

[This is Intel-style syntax, cvt = convert/move, rni/rpi = rounding
modes, hope that clears up the syntax...]

The destination register tends to go to a texture-related instruction
input, in some cases the layer (which is why I suspect clamping). Both
of these were seen on shaders compiled for GT215+ chips. What effect
does turning it on have exactly? Also, is this bit available on
earlier chips (if so, how early)?

Thanks,

  -ilia

Andy Ritger

2014-Apr-09 19:30 UTC

Hi Ilia, sorry for the slow response.

This isn't my area of expertise, but as I understand it:

* You've correctly decoded both instructions:
   * The first is a float32-to-float32 conversion, applying ceil()
   * The second is a float32-to-signed-int32 conversion, rounding to 
     the nearest even integer

* For the {float,int}-to-{float,int} operations, bit 48 indicates
 whether the input is signed (1==signed), but that bit is ignored when 
 the source format is float (always signed). That should apply across 
 the entire Tesla architecture.
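
To make the bit numbering concrete, here is a small Python sketch that checks bit 48 in the two encodings quoted earlier in the thread. It assumes each instruction is printed as two 32-bit words with the low word first, so that the mask `00000000 00010000` lands on bit 48 of the 64-bit instruction. The helper name `bit48_set` is hypothetical and is not taken from any existing tool or driver.

```python
# Assumption: the disassembly prints the low 32-bit word first, then the high word.
UNKNOWN_MASK = (0x00010000 << 32) | 0x00000000  # the "[unknown: ...]" field as a 64-bit mask


def bit48_set(lo_word: int, hi_word: int) -> bool:
    """Return True if bit 48 of the combined 64-bit instruction is set."""
    insn = (hi_word << 32) | lo_word
    return bool((insn >> 48) & 1)


# The two encodings from the original message.
examples = [
    (0xA0000401, 0xCC054780),  # cvt rpi f32 $r0 f32 $r2
    (0xA000060D, 0x8C014780),  # cvt rni s32 $r3 f32 $r3
]

for lo, hi in examples:
    print(f"{lo:08x} {hi:08x}: bit 48 set = {bit48_set(lo, hi)}")
```

Under that word-order assumption, the unknown mask is exactly the single bit 48, which is consistent with the reported bit being the "input is signed" flag.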

Where did the Tesla shader come from that you were decoding? I assume 
it was produced by the NVIDIA proprietary driver?

If I had to guess, I'd speculate that the shader compiler in the NVIDIA 
proprietary driver used an uninitialized variable, and then overwrote the 
bits that mattered for the opcode it was producing, leaving uninitialized
data in the unused bits.

I hope that helps,
- Andy Ritger

Ilia Mirkin

2014-Apr-09 19:41 UTC

On Wed, Apr 9, 2014 at 3:30 PM, Andy Ritger <aritger at nvidia.com> wrote:
> Hi Ilia, sorry for the slow response.
>
> This isn't my area of expertise, but as I understand it:
>
> * You've correctly decoded both instructions:
>    * The first is a float32-to-float32 conversion, applying ceil()
>    * The second is a float32-to-signed-int32 conversion, rounding to
>      the nearest even integer
>
> * For the {float,int}-to-{float,int} operations, bit 48 indicates
>  whether the input is signed (1==signed), but that bit is ignored when
>  the source format is float (always signed). That should apply across
>  the entire Tesla architecture.
>
> Where did the Tesla shader come from that you were decoding? I assume
> it was produced by the NVIDIA proprietary driver?
Yes, it came from looking at how the ARB_texture_cube_map_array stuff
was implemented for GT21x (it has that texprep instruction that munges
the tex args). When comparing to the nouveau-generated code, I noticed
that we limit the array index to 511 "manually", whereas the NVIDIA
proprietary driver did not. But it was setting that unknown bit. I was
hoping that the unknown bit was some "limit to 512" magic. Perhaps we
can just get rid of the clamping, I'm sure that the spec would frown
on trying to access out-of-bounds layers...
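
The manual limit described above can be sketched as follows. This is only an illustration in Python, not nouveau's actual code: `clamp_layer` and `MAX_LAYER` are hypothetical names, and only the upper bound of 511 comes from the message.

```python
MAX_LAYER = 511  # upper bound mentioned in the thread


def clamp_layer(index: int, max_layer: int = MAX_LAYER) -> int:
    """Limit an integer array index to at most max_layer."""
    return min(index, max_layer)


print(clamp_layer(3))
print(clamp_layer(600))
```

Dropping the clamp would amount to passing the converted index through unchanged, which is only safe if out-of-bounds layers are never requested or are handled elsewhere.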
>
> If I had to guess, I'd speculate that the shader compiler in the NVIDIA
> proprietary driver used an uninitialized variable, and then overwrote the
> bits that mattered for the opcode it was producing, leaving uninitialized
> data in the unused bits.
Makes sense, thanks so much for looking into it!
>
> I hope that helps,
> - Andy Ritger
