bugzilla-daemon at freedesktop.org
2016-Jun-03 07:53 UTC
[Nouveau] [Bug 96355] New: Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

            Bug ID: 96355
           Summary: Performance: extra&costly SSBO validation even when SSBO aren't used
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/DRI/nouveau
          Assignee: nouveau at lists.freedesktop.org
          Reporter: gregory.hainaut at gmail.com
        QA Contact: nouveau at lists.freedesktop.org

Hello,

I'm currently trying to profile my application (PCSX2) with Mesa. I don't know whether my GPU (GTX760) is properly reclocked, but my app is often CPU limited. It could just be the IO operations that are very slow.

Anyway, Perf-Event shows that nvc0_validate_buffers is called (too) often:

+ 8.78% 7.98% pcsx2_GSReplayL nouveau_dri.so nvc0_validate_buffers

My understanding of the code is that every time we switch shader programs, a full SSBO bind/validation is done: nvc0_set_shader_buffers will dirty the buffer state (NVC0_NEW_3D_BUFFERS). The catch is that my application doesn't use SSBOs (only UBOs).

Is it expected to call the SSBO validation code when the shader program doesn't use them? If not, a validation shortcut would be nice.

In case it helps, here is the backtrace from nvc0_set_shader_buffers:

#0 nvc0_set_shader_buffers (pipe=0x87c51e0, shader=0, start=16, nr=16, buffers=0x0) at nvc0/nvc0_state.c:1331
#1 0xf464acc4 in st_bind_ssbos (shader=0x8b106bc, shader_type=0, st=0x877ca38, st=0x877ca38) at state_tracker/st_atom_storagebuf.c:86
#2 0xf464ad0d in bind_vs_ssbos (st=0x877ca38) at state_tracker/st_atom_storagebuf.c:101
#3 0xf4647411 in st_validate_state (st=0x877ca38, pipeline=ST_PIPELINE_RENDER) at state_tracker/st_atom.c:289
#4 0xf46638ef in st_draw_vbo (ctx=0x8801f60, prims=0xffffa990, nr_prims=1, ib=0xffffa980, index_bounds_valid=0 '\000', min_index=4294967295, max_index=4294967295, tfb_vertcount=0x0, stream=0, indirect=0x0) at state_tracker/st_draw.c:176
#5 0xf46270f9 in vbo_validated_drawrangeelements (ctx=ctx@entry=0x8801f60, mode=mode@entry=4, index_bounds_valid=0 '\000', start=4294967295, end=4294967295, count=6, type=5125, indices=0x25258, basevertex=19047, numInstances=1, baseInstance=0) at vbo/vbo_exec_array.c:849
#6 0xf46274bc in vbo_exec_DrawElementsBaseVertex (mode=4, count=6, type=5125, indices=0x25258, basevertex=19047) at vbo/vbo_exec_array.c:1007
#7 0xf6ddf422 in shared_dispatch_stub_702 (mode=4, count=6, type=5125, indices=0x25258, basevertex=19047) at shared-glapi/glapi_mapi_tmp.h:21235
#8 0xf6362e0a in Draw (this=<optimized out>, this=<optimized out>, basevertex=<optimized out>, mode=<optimized out>)

Feel free to ask for trace/debug info.

Best regards

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/nouveau/attachments/20160603/e0e3a73a/attachment.html>
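The behaviour described in the original report can be modelled in a few lines. This is only a sketch under stated assumptions: all names (`model_context`, `model_set_shader_buffers`, `MODEL_NEW_BUFFERS`, ...) are hypothetical and none of it is taken from the nvc0 sources. It assumes, as the report suggests, that the bind hook sets the dirty flag unconditionally, even when called with a NULL buffer list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: this models the reported behaviour,
 * it is not the nvc0 driver code. */
#define MODEL_NEW_BUFFERS (1u << 0)

struct model_context {
    uint32_t dirty;           /* state groups waiting for validation */
    unsigned validate_calls;  /* how often the costly path ran */
};

/* Models a set_shader_buffers-style hook that marks the buffer state
 * dirty unconditionally, even when buffers == NULL (nothing bound). */
void model_set_shader_buffers(struct model_context *ctx,
                              const void *buffers, unsigned nr)
{
    assert(ctx != NULL);
    (void)buffers;
    (void)nr;
    ctx->dirty |= MODEL_NEW_BUFFERS;
}

/* Models draw-time validation: the expensive path runs whenever the
 * flag is set, then the flag is cleared. */
void model_draw(struct model_context *ctx)
{
    assert(ctx != NULL);
    if (ctx->dirty & MODEL_NEW_BUFFERS) {
        ctx->validate_calls++;
        ctx->dirty &= ~MODEL_NEW_BUFFERS;
    }
}

/* Simulates `switches` program switches, each followed by one draw, for
 * an application that never binds an SSBO. Returns the validation count. */
unsigned model_run(unsigned switches)
{
    struct model_context ctx = { 0, 0 };

    for (unsigned i = 0; i < switches; i++) {
        model_set_shader_buffers(&ctx, NULL, 16);
        model_draw(&ctx);
    }
    return ctx.validate_calls;
}
```

In this model `model_run(n)` returns `n`: one validation per program switch, although no buffer was ever bound.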
bugzilla-daemon at freedesktop.org
2016-Jun-03 08:01 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #1 from gregory.hainaut at gmail.com ---
As a side note, I potentially have a similar behavior with shader images (st_bind_*_images). I need to double check my engine as I used them sometimes.
bugzilla-daemon at freedesktop.org
2016-Jun-03 08:17 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #2 from Samuel Pitoiset <samuel.pitoiset at gmail.com> ---
Hi Gregory,

Thanks for profiling Nouveau with perf, that's very nice. :-)

Well, if your application doesn't use SSBO's, nvc0_validate_buffers() should not be called yeah. But this might happen when we switch between different contexts. Anyway, improving the validation path is on our todolist. :)

Well, according to your backtrace, nvc0_set_shader_buffers() is called and will dirty NVC0_NEW_3D_BUFFERS, which will then call nvc0_validate_buffers() at draw time. I wonder why it's called if you are sure that your application doesn't use any SSBO's...

Can you extract some shaders from your application to make sure no SSBO's are used? You can use NV50_PROG_DEBUG=1 for example (this will dump the TGSI code).
bugzilla-daemon at freedesktop.org
2016-Jun-03 15:15 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #3 from gregory.hainaut at gmail.com ---
Hi Samuel,

> Thanks for profiling Nouveau with perf, that's very nice. :-)

Well it is nice that I can do profiling :)

> Well, if your application doesn't use SSBO's, nvc0_validate_buffers()
> should not be called yeah. But this might happen when we switch between
> different contexts. Anyway, improving the validation path is on our todolist. :)

Yes, I'm sure. I don't know how to use SSBO.

> I wonder why it's called if you are sure that your application doesn't use
> any SSBO's...

src/mesa/state_tracker/st_atom_storagebuf.c
The st_bind_*_ssbos structs contain the ST_NEW_*_PROGRAM flags. So every time you call glUseProgram (or the 4.1 pipeline equivalent), the flags will be asserted and a validation will be triggered. It is the same for images, in the st_bind_*_images structs in st_atom_image.c. It is nice for the performance.

> Can you extract some shaders from your application to make sure no SSBO's
> are used? You can use NV50_PROG_DEBUG=1 for example (this will dump the TGSI code).

All my shaders can be found in GLSL format (a bit of a mess of ifdefs, but no SSBO ;))
https://github.com/PCSX2/pcsx2/tree/master/plugins/GSdx/res/glsl

Here is an example (I'm not sure if it is the TGSI format).

FRAG
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL IN[1], GENERIC[3], PERSPECTIVE
DCL OUT[0], COLOR
DCL OUT[1], COLOR[1]
DCL SAMP[0]
DCL SAMP[1]
DCL SVIEW[0], 2D, FLOAT
DCL SVIEW[1], 2D, FLOAT
DCL CONST[1][0]
DCL CONST[2][0..1]
DCL CONST[3][0..1]
DCL CONST[4][0]
DCL CONST[5][0..1]
DCL CONST[6][0..7]
DCL CONST[7][0]
DCL TEMP[0..1], LOCAL
IMM[0] FLT32 { 0.0000, 255.0000, 0.0500, 0.0078}
IMM[1] FLT32 { 0.0039, 0.0000, 0.0000, 0.0000}
  0: MOV TEMP[0].xy, IN[1].xyyy
  1: TEX TEMP[0].w, TEMP[0], SAMP[0], 2D
  2: MOV TEMP[1].y, IMM[0].xxxx
  3: MOV TEMP[1].x, TEMP[0].wwww
  4: TRUNC TEMP[0], IN[0]
  5: MOV TEMP[1].xy, TEMP[1].xyyy
  6: TEX TEMP[1], TEMP[1], SAMP[1], 2D
  7: MAD TEMP[1], TEMP[1], IMM[0].yyyy, IMM[0].zzzz
  8: TRUNC TEMP[1], TEMP[1]
  9: MUL TEMP[0], TEMP[0], TEMP[1]
 10: MUL TEMP[0], TEMP[0], IMM[0].wwww
 11: TRUNC TEMP[0], TEMP[0]
 12: MIN TEMP[0], TEMP[0], IMM[0].yyyy
 13: MUL TEMP[1], TEMP[0], IMM[1].xxxx
 14: MUL TEMP[0].x, TEMP[0].wwww, IMM[0].wwww
 15: MOV OUT[0], TEMP[1]
 16: MOV OUT[1], TEMP[0].xxxx
 17: END
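The mechanism described in comment #3 (an update "atom" whose dirty mask includes a program-change flag) might be pictured as follows. This is a hedged sketch: the flag names, `sketch_atom`, and `sketch_count_ssbo_updates` are invented for illustration and are not copied from st_atom.c. It assumes validation simply runs every atom whose mask intersects the pending dirty flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names, loosely modelled on the description in
 * comment #3; not the real ST_NEW_* values. */
#define SKETCH_NEW_VERTEX_PROGRAM (1u << 0)
#define SKETCH_NEW_STORAGE_BUFFER (1u << 1)

struct sketch_state {
    uint32_t dirty;         /* flags raised since the last validation */
    unsigned ssbo_updates;  /* how often the SSBO atom ran */
};

struct sketch_atom {
    const char *name;
    uint32_t dirty_mask;    /* flags that make this atom run */
    void (*update)(struct sketch_state *st);
};

/* Stand-in for the SSBO bind work done by the atom. */
void sketch_bind_vs_ssbos(struct sketch_state *st)
{
    st->ssbo_updates++;
}

/* Counts SSBO atom runs over `program_switches` switch+draw pairs.
 * with_program_flag selects whether the atom's mask also contains the
 * program-change flag (the situation described in the comment). */
unsigned sketch_count_ssbo_updates(int with_program_flag,
                                   unsigned program_switches)
{
    struct sketch_atom atom = {
        "bind_vs_ssbos",
        SKETCH_NEW_STORAGE_BUFFER |
            (with_program_flag ? SKETCH_NEW_VERTEX_PROGRAM : 0u),
        sketch_bind_vs_ssbos,
    };
    struct sketch_state st = { 0, 0 };

    assert(atom.update != NULL);
    for (unsigned i = 0; i < program_switches; i++) {
        st.dirty |= SKETCH_NEW_VERTEX_PROGRAM;  /* glUseProgram-like event */
        if (st.dirty & atom.dirty_mask)         /* validation at draw time */
            atom.update(&st);
        st.dirty = 0;
    }
    return st.ssbo_updates;
}
```

With the program flag in the mask, the atom runs once per switch; without it, an application that never touches storage buffers never triggers it. Whether the flag can safely be dropped in the real state tracker is a separate question that this sketch does not answer.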
bugzilla-daemon at freedesktop.org
2016-Jun-03 15:32 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #4 from Ilia Mirkin <imirkin at alum.mit.edu> ---
Right ... other things deal with this by using the cso_cache (or the backend driver handles it). We probably should for this as well. Add a per-buffer dirty bit and only set it if it's actually changed. Or add it to the cso_context logic.
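One way to read the "per-buffer dirty bit, only set if actually changed" idea from comment #4 is sketched below. It is an illustration under assumptions, not the patch that was eventually pushed: `filter_buffer`, `filter_state`, and the 32-slot limit are hypothetical, and a NULL buffer list is taken to mean "unbind these slots", matching the `buffers=0x0` call in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILTER_MAX_SLOTS 32   /* assumed limit for this sketch */

struct filter_buffer {        /* hypothetical stand-in for a buffer binding */
    const void *resource;
    unsigned offset;
    unsigned size;
};

struct filter_state {
    struct filter_buffer slots[FILTER_MAX_SLOTS];
    uint32_t dirty_slots;     /* one bit per slot whose binding changed */
};

void filter_init(struct filter_state *st)
{
    memset(st, 0, sizeof(*st));
}

bool filter_same(const struct filter_buffer *a, const struct filter_buffer *b)
{
    return a->resource == b->resource &&
           a->offset == b->offset &&
           a->size == b->size;
}

/* Binds `nr` buffers starting at slot `start`; buffers == NULL unbinds.
 * A slot's dirty bit is set only when its binding really changes.
 * Returns true if at least one slot changed. */
bool filter_set_buffers(struct filter_state *st, unsigned start, unsigned nr,
                        const struct filter_buffer *buffers)
{
    static const struct filter_buffer empty = { NULL, 0, 0 };
    bool changed = false;

    assert(start + nr <= FILTER_MAX_SLOTS);
    for (unsigned i = 0; i < nr; i++) {
        const struct filter_buffer *incoming = buffers ? &buffers[i] : &empty;
        unsigned slot = start + i;

        if (filter_same(&st->slots[slot], incoming))
            continue;             /* redundant bind: leave the bit alone */
        st->slots[slot] = *incoming;
        st->dirty_slots |= 1u << slot;
        changed = true;
    }
    return changed;
}
```

A caller would raise the global "buffers need validation" flag only when `filter_set_buffers` returns true, so repeatedly unbinding already-empty slots on every program switch costs a comparison per slot rather than a full validation.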
bugzilla-daemon at freedesktop.org
2016-Jun-03 16:16 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #5 from Samuel Pitoiset <samuel.pitoiset at gmail.com> ---
Thanks for the report. We will fix it.
bugzilla-daemon at freedesktop.org
2016-Jun-04 11:32 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #6 from gregory.hainaut at gmail.com ---
Thank you.

I did a quick benchmark of my testcase:

raw GIT
=> Mean by frame: 32.083336ms (31.168831fps)

GIT + hack to remove the new program flags from SSBO and images
=> Mean by frame: 21.586538ms (46.325169fps)

Note: the testcase uses lots of shader binds, so I guess it is kind of a worst case for perf.
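For readers comparing the two figures in comment #6: the fps values are 1000 divided by the mean frame time in milliseconds, and the hack saves about 10.5 ms per frame, which is roughly a 1.49x speedup. The small helpers below (hypothetical names, not part of any tool used in the thread) just spell out that arithmetic.

```c
#include <assert.h>

/* Converts a mean frame time in milliseconds to frames per second. */
double frame_ms_to_fps(double frame_ms)
{
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}

/* Ratio of old to new frame time, i.e. how many times faster the
 * second build renders a frame on average. */
double frame_speedup(double before_ms, double after_ms)
{
    assert(before_ms > 0.0 && after_ms > 0.0);
    return before_ms / after_ms;
}
```

For example, `frame_speedup(32.083336, 21.586538)` compares the raw git build against the hacked one.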
bugzilla-daemon at freedesktop.org
2016-Jun-05 03:53 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #7 from Ilia Mirkin <imirkin at alum.mit.edu> ---
I've pushed out some changes to nvc0 to reduce overhead of updating ssbo/images. There are additional patches I've sent out to validate ssbo/images more often in the st (right now we miss some cases).

Let me know if the profile looks any better now.
bugzilla-daemon at freedesktop.org
2016-Jun-05 10:18 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #8 from Gediminas Jakutis <gediminas at varciai.lt> ---
I don't know about the reporter's case, but I have run some benchmarks and tests with f018456901ee291181ecce74c30b19c9f6731f06 (latest revision before those four patches) and fd6bbc2ee205ed02f66a8d8ef5b2adf4005d588c (the latest revision, with the four patches) on my GTX 770 + FX-8320 @ 4.1GHz, focusing on CPU-bound cases.

The results are all for the better - on most games I tested I see a 4-10% performance boost. Am only going to list a pair of highlights:

· Age of Wonders III, my own severely CPU limited testcase: 21 fps -> 26 fps, a jump by a whopping 23.8% (still CPU-bound, though).
· Payday 2, well, this game has no [reproducible] way to benchmark it, but the gameplay used to be a nightmare filled with severe rubber-banding, running at just some 18-22 fps in many situations, all while painfully CPU-bound. Now, most of the rubber-banding is either gone or is a lot less noticeable. The framerate in these aforementioned situations went up to 25-60; dipping below 30 very rarely, while mostly maintaining over 2x performance boost. Basically, these four patches made the game *playable* on nouveau. (The game is still very painfully CPU-bound, though.)

So, at least here, I can see clear performance benefits.
Will leave to be marked as RESOLVED by the reporter; don't want to hijack his issue.
bugzilla-daemon at freedesktop.org
2016-Jun-05 11:09 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #9 from gregory.hainaut at gmail.com ---
Hello,

It is much better. I disabled my CPU turbo to reduce perf variation, hence the smaller values. I'm now around 33-34 fps with latest git. For reference, if I disable validation completely with a hack, I'm around 35-36 fps. It isn't completely free but it feels good enough.

Maybe one could create a benchmark test that ping-pongs between 2 different programs (could be the same one compiled twice).

Issue can be closed.
bugzilla-daemon at freedesktop.org
2016-Jun-05 14:42 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #10 from gregory.hainaut at gmail.com ---
Hi Ilia,

You told me by IRC that you validate all SSBOs when one is updated. I suspect a similar pattern for UBOs. I.e. all UBOs are validated when one is updated.

Potentially validation is even done for all shader stages. Anyway, I moved my UBO declarations around a bit to reduce the number of active UBOs per draw call, and I managed to win a couple of fps (67 fps => 70 fps).

So it might be worth investigating the single SSBO/UBO bucket validation further.
bugzilla-daemon at freedesktop.org
2016-Jun-05 15:41 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #11 from Ilia Mirkin <imirkin at alum.mit.edu> ---
(In reply to gregory.hainaut from comment #10)
> Hi Ilia,
>
> You told me by IRC that you validate all SSBOs when one is updated. I
> suspect a similar pattern for UBOs. I.e. all UBOs are validated when one is
> updated.

Nope. UBOs (and textures) have their individual validation "buckets".

>
> Potentially validation is even done for all shader stages. Anyway, I moved
> my UBO declarations around a bit to reduce the number of active UBOs per
> draw call, and I managed to win a couple of fps (67 fps => 70 fps).
>
> So it might be worth investigating the single SSBO/UBO bucket
> validation further.

There are different stages of validation. It's all extremely confusing.

st/mesa validates everything, because it has to - which UBO is bound to where is based on program uniform settings:

binding = &st->ctx->UniformBufferBindings[shader->UniformBlocks[i]->Binding];

So if either of those is updated, we have to revalidate. However there's a CSO cache backing UBOs, which will avoid propagating the set to the backend if nothing has changed.

I don't think we can do much better than this without some much larger rejiggers. Perhaps there are still some things we can do to speed up common scenarios like "there are no ubos" or "there are no ssbos" or "there are no images". But it doesn't seem immediately apparent to me.
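The double indirection mentioned in comment #11 (program block -> binding point -> bound buffer) is why a change on either side forces the state tracker to re-resolve. A minimal sketch of that lookup follows; the struct and function names (`ubo_context`, `ubo_program`, `ubo_resolve`) and the array sizes are assumptions made for illustration, not Mesa's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define UBO_MAX_BINDINGS 8   /* assumed sizes for this sketch */
#define UBO_MAX_BLOCKS   4

struct ubo_binding {         /* context state: what sits at a binding point */
    const void *buffer;
    unsigned offset;
};

struct ubo_context {
    struct ubo_binding bindings[UBO_MAX_BINDINGS];
};

struct ubo_program {
    unsigned num_blocks;
    /* program state: uniform block index -> binding point */
    unsigned block_binding[UBO_MAX_BLOCKS];
};

/* Resolves uniform block `i` of `prog` through both levels of
 * indirection, mirroring the shape of the expression quoted above. */
const struct ubo_binding *ubo_resolve(const struct ubo_context *ctx,
                                      const struct ubo_program *prog,
                                      unsigned i)
{
    assert(i < prog->num_blocks);
    assert(prog->block_binding[i] < UBO_MAX_BINDINGS);
    return &ctx->bindings[prog->block_binding[i]];
}
```

Changing `prog->block_binding[i]` (a program-side setting) or `ctx->bindings[...]` (a context-side bind) can each change what `ubo_resolve` returns, so neither event alone can be skipped; a cache that compares the resolved result against the last one sent to the driver is what filters out the redundant cases.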
bugzilla-daemon at freedesktop.org
2016-Jun-06 08:11 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

gregory.hainaut at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #12 from gregory.hainaut at gmail.com ---
Actually what I saw is that all UBOs are validated when programs are switched. But I guess it is normal. I need to dig further.

Thanks for the fixes.
bugzilla-daemon at freedesktop.org
2016-Jun-06 09:17 UTC
[Nouveau] [Bug 96355] Performance: extra&costly SSBO validation even when SSBO aren't used
https://bugs.freedesktop.org/show_bug.cgi?id=96355

--- Comment #13 from Karol Herbst <freedesktop at karolherbst.de> ---
(In reply to Gediminas Jakutis from comment #8)
> I don't know about the reporter's case, but I have run some benchmarks and
> tests with f018456901ee291181ecce74c30b19c9f6731f06 (latest revision before
> those four patches) and fd6bbc2ee205ed02f66a8d8ef5b2adf4005d588c (the latest
> revision, with the four patches) on my GTX 770 + FX-8320 @ 4.1GHz, focusing
> on CPU-bound cases.
>
> The results are all for the better - on most games I tested I see a 4-10%
> performance boost. Am only going to list a pair of highlights:
>
> · Age of Wonders III, my own severely CPU limited testcase: 21 fps -> 26
> fps, a jump by a whopping 23.8% (still CPU-bound, though).
> · Payday 2, well, this game has no [reproducible] way to benchmark it, but
> the gameplay used to be a nightmare filled with severe rubber-banding, running
> at just some 18-22 fps in many situations, all while painfully CPU-bound. Now,
> most of the rubber-banding is either gone or is a lot less noticeable. The
> framerate in these aforementioned situations went up to 25-60; dipping below
> 30 very rarely, while mostly maintaining over 2x performance boost.
> Basically, these four patches made the game *playable* on nouveau. (The game
> is still very painfully CPU-bound, though.)
>
> So, at least here, I can see clear performance benefits.
> Will leave to be marked as RESOLVED by the reporter; don't want to hijack
> his issue.

I saw the same thing with PAYDAY 2, but I couldn't restore the low perf, so I guess they just reworked their engine while they added the SMAA and SSAO thing, so I doubt those patches had anything to do with that :/