On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote:> > On 25.05.2015 07:17, Dave Airlie wrote: >> >> On 25 May 2015 at 08:11, Marek Olšák <maraeo at gmail.com> wrote: >>> >>> It's the same on Radeon. There are 2x ClipOrCullDistance output >>> vectors and a mask saying it should clip or cull or do nothing. >>> >>> Marek >>> >> My thinking was gallium should have a single semantic and a mask in >> the shader definition maybe. >> >> though it doesn't solve the does nvidia do the right thing with >> cull[0] and clip[0], and what is the right thing. >> >> Dave. > > > I'm still convinced that both clip[0] and cull[0] should be possible. Plus i > have written a shader_test for this a while ago which you pushed to piglit > (fs-cull-and-clip-distance-different.shader_test). If i remember right > nvidia passed that test just fine.My take (and note that I last read the extension many months ago) is that you're supposed to figure out the max gl_ClipDistance[] written, and then write all your cull distances above that. So if you, e.g., have something like gl_ClipDistance[5] = 1; gl_CullDistance[0] = 1; Then it would decide that there are 6 clip distances (or if there's an explicit out float gl_ClipDistance[n], then use that), and 1 cull distance. In the TGSI, I'm thinking this might look approximately like PROPERTY CULL_MASK (1<<6) DCL OUT[0], CLIPDIST[0] DCL OUT[1], CLIPDIST[1] MOV OUT[1].y, 1 (clip distance[5]) MOV OUT[1].z, 1 (cull distance[0]) Then basically you'd have (rast->clip_enable & shader->actual_clip_writes_mask) | cull_mask the enabled distances cull_mask = cull mask This would work *very* well for nouveau, not sure how suitable it is for other hardware. Cheers, -ilia
Another thing to consider is linking shaders that occur before the rasterizer (e.g. any two shaders from VS->TCS->TES->GS). The maximum number of written distances is still 8, but what happens if VS writes 1x clip and 7x cull and GS reads 8x clip and no cull? In this case it's basically a generic varying. How is linking separate shaders supposed to work with one combined clip-or-cull array? It doesn't seem to be possible. Marek On Mon, May 25, 2015 at 5:07 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:> On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann > <tobias.johannes.klausmann at mni.thm.de> wrote: >> >> On 25.05.2015 07:17, Dave Airlie wrote: >>> >>> On 25 May 2015 at 08:11, Marek Olšák <maraeo at gmail.com> wrote: >>>> >>>> It's the same on Radeon. There are 2x ClipOrCullDistance output >>>> vectors and a mask saying it should clip or cull or do nothing. >>>> >>>> Marek >>>> >>> My thinking was gallium should have a single semantic and a mask in >>> the shader definition maybe. >>> >>> though it doesn't solve the does nvidia do the right thing with >>> cull[0] and clip[0], and what is the right thing. >>> >>> Dave. >> >> >> I'm still convinced that both clip[0] and cull[0] should be possible. Plus i >> have written a shader_test for this a while ago which you pushed to piglit >> (fs-cull-and-clip-distance-different.shader_test). If i remember right >> nvidia passed that test just fine. > > My take (and note that I last read the extension many months ago) is > that you're supposed to figure out the max gl_ClipDistance[] written, > and then write all your cull distances above that. So if you, e.g., > have something like > > gl_ClipDistance[5] = 1; > gl_CullDistance[0] = 1; > > Then it would decide that there are 6 clip distances (or if there's an > explicit out float gl_ClipDistance[n], then use that), and 1 cull > distance. In the TGSI, I'm thinking this might look approximately like > > PROPERTY CULL_MASK (1<<6) > DCL OUT[0], CLIPDIST[0] > DCL OUT[1], CLIPDIST[1] > MOV OUT[1].y, 1 (clip distance[5]) > MOV OUT[1].z, 1 (cull distance[0]) > > Then basically you'd have > > (rast->clip_enable & shader->actual_clip_writes_mask) | cull_mask > the enabled distances > cull_mask = cull mask > > This would work *very* well for nouveau, not sure how suitable it is > for other hardware. > > Cheers, > > -ilia
Tobias Klausmann
2015-May-27 19:05 UTC
[Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance
On 27.05.2015 18:28, Marek Olšák wrote:> Another thing to consider is linking shaders that occur before the > rasterizer (e.g. any two shaders from VS->TCS->TES->GS). The maximum > number of written distances is still 8, but what happens if VS writes > 1x clip and 7x cull and GS reads 8x clip and no cull?i think this should be rejected anyway (in the glsl?!), constraining vs output to be the same as gs input where the last definition is the valid one....> In this case > it's basically a generic varying. How is linking separate shaders > supposed to work with one combined clip-or-cull array? It doesn't seem > to be possible. > > Marek > > On Mon, May 25, 2015 at 5:07 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote: >> On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann >> <tobias.johannes.klausmann at mni.thm.de> wrote: >>> On 25.05.2015 07:17, Dave Airlie wrote: >>>> On 25 May 2015 at 08:11, Marek Olšák <maraeo at gmail.com> wrote: >>>>> It's the same on Radeon. There are 2x ClipOrCullDistance output >>>>> vectors and a mask saying it should clip or cull or do nothing. >>>>> >>>>> Marek >>>>> >>>> My thinking was gallium should have a single semantic and a mask in >>>> the shader definition maybe. >>>> >>>> though it doesn't solve the does nvidia do the right thing with >>>> cull[0] and clip[0], and what is the right thing. >>>> >>>> Dave. >>> >>> I'm still convinced that both clip[0] and cull[0] should be possible. Plus i >>> have written a shader_test for this a while ago which you pushed to piglit >>> (fs-cull-and-clip-distance-different.shader_test). If i remember right >>> nvidia passed that test just fine.Ah btw, if we follow Brian Paul, overlapping indexes are fine! (and it is way more intuitive to use for a shader developer)>> My take (and note that I last read the extension many months ago) is >> that you're supposed to figure out the max gl_ClipDistance[] written, >> and then write all your cull distances above that. So if you, e.g., >> have something like >> >> gl_ClipDistance[5] = 1; >> gl_CullDistance[0] = 1; >> >> Then it would decide that there are 6 clip distances (or if there's an >> explicit out float gl_ClipDistance[n], then use that), and 1 cull >> distance. In the TGSI, I'm thinking this might look approximately like >> >> PROPERTY CULL_MASK (1<<6) >> DCL OUT[0], CLIPDIST[0] >> DCL OUT[1], CLIPDIST[1] >> MOV OUT[1].y, 1 (clip distance[5]) >> MOV OUT[1].z, 1 (cull distance[0]) >> >> Then basically you'd have >> >> (rast->clip_enable & shader->actual_clip_writes_mask) | cull_mask >> the enabled distances >> cull_mask = cull mask >> >> This would work *very* well for nouveau, not sure how suitable it is >> for other hardware. >> >> Cheers, >> >> -ilia
Tobias Klausmann
2015-Jul-08 20:04 UTC
[Nouveau] [RFC PATCH 00/11] Implement ARB_cull_distance
On 25.05.2015 17:07, Ilia Mirkin wrote:> On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann > <tobias.johannes.klausmann at mni.thm.de> wrote: >> On 25.05.2015 07:17, Dave Airlie wrote: >>> On 25 May 2015 at 08:11, Marek Olšák <maraeo at gmail.com> wrote: >>>> It's the same on Radeon. There are 2x ClipOrCullDistance output >>>> vectors and a mask saying it should clip or cull or do nothing. >>>> >>>> Marek >>>> >>> My thinking was gallium should have a single semantic and a mask in >>> the shader definition maybe. >>> >>> though it doesn't solve the does nvidia do the right thing with >>> cull[0] and clip[0], and what is the right thing. >>> >>> Dave. >> >> I'm still convinced that both clip[0] and cull[0] should be possible. Plus i >> have written a shader_test for this a while ago which you pushed to piglit >> (fs-cull-and-clip-distance-different.shader_test). If i remember right >> nvidia passed that test just fine. > My take (and note that I last read the extension many months ago) is > that you're supposed to figure out the max gl_ClipDistance[] written, > and then write all your cull distances above that. So if you, e.g., > have something like > > gl_ClipDistance[5] = 1; > gl_CullDistance[0] = 1; > > Then it would decide that there are 6 clip distances (or if there's an > explicit out float gl_ClipDistance[n], then use that), and 1 cull > distance. In the TGSI, I'm thinking this might look approximately like > > PROPERTY CULL_MASK (1<<6) > DCL OUT[0], CLIPDIST[0] > DCL OUT[1], CLIPDIST[1] > MOV OUT[1].y, 1 (clip distance[5]) > MOV OUT[1].z, 1 (cull distance[0]) > > Then basically you'd have > > (rast->clip_enable & shader->actual_clip_writes_mask) | cull_mask > the enabled distances > cull_mask = cull mask > > This would work *very* well for nouveau, not sure how suitable it is > for other hardware. > > Cheers, > > -iliaI wonder where this step should be implemented after all. It was brought up that llvmpipe already supports cull_distance (it does!), so maybe we should implement this in the drivers to evade llvmpipe breakage. Any suggestions appreciated :) Tobias
On Wed, Jul 8, 2015 at 4:04 PM, Tobias Klausmann <tobias.johannes.klausmann at mni.thm.de> wrote:> > > On 25.05.2015 17:07, Ilia Mirkin wrote: >> >> On Mon, May 25, 2015 at 9:40 AM, Tobias Klausmann >> <tobias.johannes.klausmann at mni.thm.de> wrote: >>> >>> On 25.05.2015 07:17, Dave Airlie wrote: >>>> >>>> On 25 May 2015 at 08:11, Marek Olšák <maraeo at gmail.com> wrote: >>>>> >>>>> It's the same on Radeon. There are 2x ClipOrCullDistance output >>>>> vectors and a mask saying it should clip or cull or do nothing. >>>>> >>>>> Marek >>>>> >>>> My thinking was gallium should have a single semantic and a mask in >>>> the shader definition maybe. >>>> >>>> though it doesn't solve the does nvidia do the right thing with >>>> cull[0] and clip[0], and what is the right thing. >>>> >>>> Dave. >>> >>> >>> I'm still convinced that both clip[0] and cull[0] should be possible. >>> Plus i >>> have written a shader_test for this a while ago which you pushed to >>> piglit >>> (fs-cull-and-clip-distance-different.shader_test). If i remember right >>> nvidia passed that test just fine. >> >> My take (and note that I last read the extension many months ago) is >> that you're supposed to figure out the max gl_ClipDistance[] written, >> and then write all your cull distances above that. So if you, e.g., >> have something like >> >> gl_ClipDistance[5] = 1; >> gl_CullDistance[0] = 1; >> >> Then it would decide that there are 6 clip distances (or if there's an >> explicit out float gl_ClipDistance[n], then use that), and 1 cull >> distance. In the TGSI, I'm thinking this might look approximately like >> >> PROPERTY CULL_MASK (1<<6) >> DCL OUT[0], CLIPDIST[0] >> DCL OUT[1], CLIPDIST[1] >> MOV OUT[1].y, 1 (clip distance[5]) >> MOV OUT[1].z, 1 (cull distance[0]) >> >> Then basically you'd have >> >> (rast->clip_enable & shader->actual_clip_writes_mask) | cull_mask >> the enabled distances >> cull_mask = cull mask >> >> This would work *very* well for nouveau, not sure how suitable it is >> for other hardware. >> >> Cheers, >> >> -ilia > > I wonder where this step should be implemented after all. It was brought up > that llvmpipe already supports cull_distance (it does!), so maybe we should > implement this in the drivers to evade llvmpipe breakage. Any suggestions > appreciated :) > > TobiasI believe that the later feedback from Brian was that my approach was a bad one and we should use CULLDIST instead, which also reflects how GL has it. However it's important to specify *somewhere* how many clip distances are used since it all gets lowered into the 2x vec4. It might be annoying to derive it from writes to CLIPDIST[0/1].xyzw dest masks. Although nouveau might already do that anyways... -ilia