thr3ads.net - Nouveau - [Nouveau] [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots [Jan 2010]

Luca Barbieri

2010-Jan-18 16:27 UTC

[Nouveau] [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

So, basically, you allocate the rasterizer units according to the
vertex shader, and when the fragment shader comes up, you say "write
rasterizer output 4 to fragment input 1000000"?

The current nouveau drivers can't do this.
There are "routing" registers in hardware, but I think the nVidia
proprietary driver (at least without GLSL) leaves them unaltered after
initialization, and I don't think we really know how they work.
They are also very likely limited to at most 256 values (maybe even
fewer, such as 16), even if they can actually be made to work.

The way the current pre-nv50 driver works is that there are 8 slots,
each of which has an interpolator and a fixed associated vertex shader
output and fixed fragment input. This seems a rather obvious way to
design hardware, and so shouldn't be uncommon.

Thus, the inputs/outputs can't be packed, because that will break if
the fragment shader doesn't use a vertex output.
And there is no way to correct that when the fragment program comes
up, other than recompiling the vertex shader, which we would very
much like to avoid.
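
To illustrate the packing problem, here is a minimal sketch in C. It
is not driver code: packed_slot, fixed_slot and slots_agree are
hypothetical names, and varyings are modelled as bits in a mask. It
assumes the fixed-slot model described above, where slot i always
connects vertex output i to fragment input i.

```c
#include <stdint.h>

#define NUM_SLOTS 8 /* number of fixed interpolator slots assumed here */

/* Packed allocation (hypothetical): the n-th varying a shader uses
 * gets slot n. Returns -1 if the varying is unused or does not fit. */
static int packed_slot(uint32_t used_mask, int varying)
{
    int slot = 0;

    if (!(used_mask & (1u << varying)))
        return -1;
    for (int i = 0; i < varying; i++)
        if (used_mask & (1u << i))
            slot++;
    return slot < NUM_SLOTS ? slot : -1;
}

/* Fixed allocation (hypothetical): varying n always gets slot n. */
static int fixed_slot(uint32_t used_mask, int varying)
{
    if (!(used_mask & (1u << varying)) || varying >= NUM_SLOTS)
        return -1;
    return varying;
}

/* Returns 1 if every varying the fragment shader reads arrives in the
 * same slot the vertex shader wrote it to, when each shader is
 * compiled on its own with the given allocator. */
static int slots_agree(int (*alloc)(uint32_t, int),
                       uint32_t vs_writes, uint32_t fs_reads)
{
    for (int v = 0; v < 32; v++) {
        if (!(fs_reads & (1u << v)))
            continue;
        if (alloc(vs_writes, v) < 0 ||
            alloc(vs_writes, v) != alloc(fs_reads, v))
            return 0;
    }
    return 1;
}
```

With a vertex shader writing varyings 0, 1 and 2 (mask 0x7) and a
fragment shader reading only 0 and 2 (mask 0x5), packing puts varying
2 in slot 2 on the vertex side but slot 1 on the fragment side, so
slots_agree(packed_slot, 0x7, 0x5) returns 0, while
slots_agree(fixed_slot, 0x7, 0x5) returns 1.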

Non-GLSL programs can only use the 8 texcoords, so there is no problem
there since hardware supports 8 slots.

Thus, I think my proposed solution is the simplest and most efficient approach.
Any other solution would require much more, and slower, code in the
Gallium drivers for nv30, nv40, and maybe Intel too.

Corbin Simpson

2010-Jan-18 19:41 UTC

[Nouveau] [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

Actually, we don't even bother worrying about the rasterizer's routing
table until we've bound a pair of shaders and start drawing. Right
before the draw call, we re-generate, among other things, routing
tables for the vert shader and the rasterizer.

This is *incredibly* powerful, because it means we only have to
compile the shaders once, and load the rasterizer tables based on
those shaders. I even baked up a CSO to cache the tables, but it
turned out to be an overall slowdown.
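
As a rough sketch of what regenerating a routing table at draw time
could look like (this is not the actual r300 code; struct shader_io,
build_routing and the integer "semantic" ids are all hypothetical),
each fragment input is matched to the vertex output that carries the
same varying:

```c
#define MAX_VARYINGS 16

/* Hypothetical description of one side of the shader interface. */
struct shader_io {
    int count;                  /* number of varyings used */
    int semantic[MAX_VARYINGS]; /* arbitrary id naming each varying */
};

/* For each fragment input i, store in route[i] the index of the
 * vertex output with the same semantic id, or -1 if there is none.
 * Returns the number of fragment inputs left unmatched. */
static int build_routing(const struct shader_io *vs_out,
                         const struct shader_io *fs_in,
                         int route[MAX_VARYINGS])
{
    int unmatched = 0;

    for (int i = 0; i < MAX_VARYINGS; i++)
        route[i] = -1;

    for (int i = 0; i < fs_in->count; i++) {
        for (int j = 0; j < vs_out->count; j++) {
            if (vs_out->semantic[j] == fs_in->semantic[i]) {
                route[i] = j;
                break;
            }
        }
        if (route[i] < 0)
            unmatched++;
    }
    return unmatched;
}
```

Because the table is derived only from the two bound shaders, neither
shader has to be recompiled when it is paired with a different
partner; only the table changes.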

If you get this patch in, then you'll still have to fight with every
other state tracker that doesn't prettify their TGSI. It would be a
much better approach to attempt to RE the routing tables.

Also FYI the r300-r500 rasterizer can only handle, off the top of my
head, 16 sets of vectors total (8 colors, 8 texcoords) so you're not
the only ones with this kind of limitation. The situation gets better
for r600 and nv50.

~ C.



-- 
Only fools are easily impressed by what is only
barely beyond their reach. ~ Unknown

Corbin Simpson
<MostAwesomeDude at gmail.com>

Luca Barbieri

2010-Jan-18 20:06 UTC

[Nouveau] [Mesa3d-dev] [PATCH] glsl: put varyings in texcoord slots

> If you get this patch in, then you'll still have to fight with every
> other state tracker that doesn't prettify their TGSI. It would be a
> much better approach to attempt to RE the routing tables.
I don't think there are any users of the Gallium interface that need more
than 8 vertex outputs/fragment inputs and don't use sequential values
starting at 0, except the GLSL linker without this patch.

ARB_fragment_program and ARB_vertex_program are limited to texcoord
slots, and Mesa should advertise only 8 of them.
Also users of this interface will likely only use as many as they
need, sequentially.

Vega, xorg seem to only use up to 2 slots.
g3dvl up to 8 (starting from 0, of course).

Cards with fewer than 8 slots may sometimes still have problems, but
such cards will probably be DX8 cards that don't work anyway.

Furthermore, even if you can route things, using vertex outputs and
fragment inputs with lower indices may be more efficient anyway.

As for REing the tables, it may not be possible.
This is the code that apparently sets them up right now:
	/* vtxprog output routing */
	so_method(so, screen->curie, 0x1fc4, 1);
	so_data  (so, 0x06144321);
	so_method(so, screen->curie, 0x1fc8, 2);
	so_data  (so, 0xedcba987);
	so_data  (so, 0x00000021);
	so_method(so, screen->curie, 0x1fd0, 1);
	so_data  (so, 0x00171615);
	so_method(so, screen->curie, 0x1fd4, 1);
	so_data  (so, 0x001b1a19);

This makes me think that only 4 bits might be used for the values
(look at the arithmetic progressions of 4-bit values), so that there
is a limit of 16 vertex output/fragment inputs.
If GLSL starts at index 10, we are still in trouble because fewer than
8 varyings will be available.
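
The 4-bit reading can be checked with a small helper. This is only a
sketch: nibble and nibbles_ascend are hypothetical names, and treating
the registers as packed 4-bit fields is the guess made above, not
something known about the hardware.

```c
#include <stdint.h>

/* Extract the 4-bit field at position `index` (0 = least significant)
 * from a 32-bit register value. */
static unsigned nibble(uint32_t value, unsigned index)
{
    return (value >> (4 * index)) & 0xfu;
}

/* Returns 1 if the `count` nibbles starting at position `first`
 * increase by exactly 1 from each one to the next. */
static int nibbles_ascend(uint32_t value, unsigned first, unsigned count)
{
    if (first + count > 8)
        return 0;
    for (unsigned i = first + 1; i < first + count; i++)
        if (nibble(value, i) != nibble(value, i - 1) + 1)
            return 0;
    return 1;
}
```

Read from the low end, 0xedcba987 is the run 7, 8, 9, a, b, c, d, e,
and the low four nibbles of 0x06144321 are 1, 2, 3, 4. So
nibbles_ascend(0xedcba987, 0, 8) and nibbles_ascend(0x06144321, 0, 4)
both return 1. A 4-bit field can only name 16 distinct values, which
is where the limit of 16 comes from.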

Also leaving vertex outputs/fragment inputs unused by starting at high
values may be bad for performance even if supported, as it may lead to
a bigger register file and thus fewer simultaneous GPU threads running.

In other words, having GLSL start at index 10 is easily avoided, and
causes problems nothing else causes, so why not just stop doing that?
