And an additional question: I have a trace here where a reserved bit
in CommonWord0 is set (the 0x2000000 in the first header word below).
Is that just a stale value that the driver doesn't clear, or does it
have some significance? Here is the full shader:
HEADER:
0x06040461 0 = { SPH = VTG | VERSION = 3 | KIND = VP_B | SASS_VERSION = 2 | LDST_ENABLE | SO_MASK = 0 | 0x2000000 }
0x00000000 1 = { LMEM_POS_ALLOC = 0 | PATCH_ATTRIBUTES = 0 }
0x00000000 2 = { LMEM_NEG_ALLOC = 0 | THREADS_PER_PRIM = 0 }
0x00000000 3 = { WARP_CSTACK_SIZE = 0 | OUTPUT_PRIM = 0 }
0x00000000 4 = { MAX_OUTPUT_VERTS = 0 | MIN_OUT_READ_SLOT = 0 | MAX_OUT_READ_SLOT = 0 }
0x00000000 ATTR_EN_0 = 0
0x00000000 ATTR_EN_1 = 0
0x00000000 ATTR_EN_2 = 0
0x00000000 ATTR_EN_3 = 0
0x00000000 ATTR_EN_4 = 0
0x00000000 ATTR_EN_5 = { 0 }
0x00000000 11 = 0
0x00000000 12 = 0
0x0001f000 EXPORT_EN_0 = { HPOS = 0xf | 0x10000 }
0x00000000 EXPORT_EN_1 = 0
0x00000000 EXPORT_EN_2 = 0
0x00000000 EXPORT_EN_3 = 0
0x00000000 EXPORT_EN_4 = 0
0x00000000 EXPORT_EN_5 = { CLIP_DISTANCE = 0 | UNK12 = 0 }
0x00000000 19 = 0
CODE:
00000000: a01088b0 08bcb810 sched 0x2c 0x22 0x4 0x28 0x4 0x2e 0x2f
00000008: 0b1ffc1e 5b601c07 set $p0 0x1 ge u32 0x0 c0[0x3858]
00000010: 1000003c 12000000 $p0 bra 0x38
00000018: 0a1c0002 64c03c07 mov b32 $r0 c0[0x3850]
00000020: 0a9c0006 64c03c07 mov b32 $r1 c0[0x3854]
00000028: 001c0000 cc800000 ld b32 $r0 cg g[$r0d]
00000030: 041c003c 12000000 bra 0x40
00000038: 7f9c0002 e4c03c00 C mov b32 $r0 0x0
00000040: 9c108010 090c8c10 C sched 0x4 0x20 0x4 0x27 0x4 0x23 0x43
00000048: 001c2802 e5c00000 cvt rn f32 $r0 u32 $r0
00000050: 341c0006 64c03c00 mov b32 $r1 c0[0x1a0]
00000058: 349c000a 64c03c00 mov b32 $r2 c0[0x1a4]
00000060: 351c000e 64c03c00 mov b32 $r3 c0[0x1a8]
00000068: 359c0012 64c03c00 mov b32 $r4 c0[0x1ac]
00000070: 381ffc06 7f03fc00 st b32 a[0x70] $r1 0x0 0x0
00000078: 3a1ffc0a 7f03fc00 st b32 a[0x74] $r2 0x0 0x0
00000080: 3c110d0c 08000001 sched 0x43 0x43 0x4 0x4f 0x0 0x0 0x0
00000088: 3c1ffc0e 7f03fc00 st b32 a[0x78] $r3 0x0 0x0
00000090: 3e1ffc12 7f03fc00 st b32 a[0x7c] $r4 0x0 0x0
00000098: 401ffc02 7f03fc00 st b32 a[0x80] $r0 0x0 0x0
000000a0: 001c003c 18000000 exit
000000a8: fc1c003c 12007fff C bra 0xa8
000000b0: 001c3c02 85800000 nop
000000b8: 001c3c02 85800000 nop
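
To make the question concrete, here is a minimal sketch that splits the first header word into the fields shown in the dump above and reports whatever is left over. The bit positions and widths in FIELDS are an assumption on my part, chosen so that they reproduce the decode printed above; the field names are simply the ones the dump uses, and decode_common_word0 is a hypothetical helper, not part of any tool.

```python
# Sketch: split CommonWord0 into the fields named in the dump above.
# ASSUMPTION: the (shift, width) pairs below are inferred so that they
# reproduce the dump's decode of 0x06040461; they are not quoted from
# the specification.
FIELDS = [
    ("SPH", 0, 5),
    ("VERSION", 5, 5),
    ("KIND", 10, 4),
    ("SASS_VERSION", 17, 4),
    ("LDST_ENABLE", 26, 1),
    ("SO_MASK", 28, 4),
]


def decode_common_word0(word):
    """Return (fields, leftover) for a 32-bit header word.

    leftover holds every set bit that the FIELDS table does not cover.
    """
    fields = {}
    known = 0
    for name, shift, width in FIELDS:
        mask = ((1 << width) - 1) << shift
        fields[name] = (word & mask) >> shift
        known |= mask
    leftover = word & ~known & 0xFFFFFFFF
    return fields, leftover


fields, leftover = decode_common_word0(0x06040461)
print(fields)         # SPH=1 and KIND=1 are what the dump prints as VTG and VP_B
print(hex(leftover))  # the bit the dump could not name
```

With the header word from this trace, the only bit left over is 0x2000000, which is the one I am asking about.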
On Sat, May 23, 2015 at 5:35 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, May 21, 2015 at 11:32 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> On Thu, May 21, 2015 at 10:05 AM, Robert Morell <rmorell at nvidia.com> wrote:
>>> Hi Ilia,
>>>
>>> On Sat, May 02, 2015 at 12:34:21PM -0400, Ilia Mirkin wrote:
>>>> Hi,
>>>>
>>>> As I'm looking to add some support to nouveau for features like atomic
>>>> counters and images, I'm running into some confusion about what the
>>>> first word of the shader header means. Here is the definition as we
>>>> have it today:
>>>
>>> [...]
>>>
>>>> However I know that these are somewhat wrong. I've seen shaders that
>>>> use gmem accesses (i.e. mov r0, [r0]) that just have the LMEM enable
>>>> bit set (and they use no lmem). And I've seen additional bits set, esp
>>>> relating to images, but I haven't spent enough time looking at all the
>>>> variations to make sense of it yet. For example, I think that Fermi
>>>> and Kepler+ have different meanings for some of the bits.
>>>
>>> Those look pretty close :)
>>>
>>>> I was hoping you could just release the docs for the shader headers,
>>>> or at least the first word of the shader header.
>>>
>>> We've posted the specification for the full Shader Program Header to our
>>> GPU documentation site here:
>>>
>>> ftp://download.nvidia.com/open-gpu-doc/Shader-Program-Header/1/Shader-Program-Header.html
>>>
>>> I hope it helps clear things up.
>>
>> Yep, just a few follow-up questions:
>>
>> - SPH Type 1 and type 2 appear to be flipped wrt the tables -- "When
>> PS is used, field SphType in CommonWord0 must be set to 1; similarly,
>> when VTG is used, SphType in CommonWord0 must be set to 2." But the
>> "Table 1. SPH Type 1 Definition" is clearly meant for VTG and table 2
>> is clearly meant for PS...
>> - You skip over SassVersion -- what is that?
>> - You have a funny note in there -- "Triangles generated by the
>> geometry shader always have all their edge flags set to TRUE" -- that
>> is the *only* reference to edge flags in the whole document. Right now
>> we do some crazy thing to get edge flags right on fermi+ (and I think
>> we just get them wrong on tesla). Is there a way to emit edge flags
>> from vertex shader?
>> - To be clear: DoesLoadOrStore -- *any* load/store? Even LDC? ALD?
>
> Oh, and one more little correction:
>
> """
> The SPH field OutputTopology sets the primitive topology of the
> vertices that are output from the pipe stage. This field is only used
> with geometry shaders, where the value must be greater than zero and
> has a maximum of 1024. The allowed values are: ... [the correct values
> for OutputTopology]
> """
>
> The 1024 thing seems like it probably applies to MaxOutputVertexCount
> in CommonWord4.
>
> -ilia