Bryan Cantrill
2011-Jul-02 18:27 UTC
[dtrace-discuss] Reaping enablings on defunct providers
All, A longstanding problem that we have had is that enablings on defunct providers (e.g., USDT probes on dead processes) are not reaped: the probes will exist as long as there exists an enabling for them. When processes are turning over frequently (or when enablings are long-running), this can clog up the probe space to the point that DTrace probe creation will silently fail (an absolutely maddening failure mode). This has been hit several times over the years (we were nailed by it on our build machines at Fishworks) -- so when Theo Schlossnagle mentioned to me that he was getting killed by this problem in an environment with rapidly turning over Postgres processes, I was embarrassed that I hadn''t tackled it earlier. As it turns out, it was a tad thorny for locking reasons, but a patch for this problem is attached. We have integrated this into our bits at Joyent (internal ticket is OS-454, "enablings on defunct providers prevent providers from unregistering"), so you''ll see this show up soon at http://github.com/joyent/illumos-joyent -- but I wanted to give everyone here a heads-up. Anyway, patch is attached, with my thanks to Adam for a helpful discussion on fasttrap''s asynchronous provider retiring mechanics. Note that Adam hasn''t (yet) reviewed this, and its integration upstream should wait until he''s had a chance to look it over. Please let me know if you have any questions or comments! Thanks, Bryan -------------- next part -------------- A non-text attachment was scrubbed... Name: dtrace-reap.patch Type: application/octet-stream Size: 14770 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20110702/17593755/attachment.obj>
Theo Schlossnagle
2011-Jul-02 18:37 UTC
[dtrace-discuss] Reaping enablings on defunct providers
This is awesome. Thanks for taking the time to dive in and solve this one. We do indeed have some high-turnover postgres systems that are impossible to effectively trace due to this issue. I''m excited to get this into a production kernel over here... in due time :-) On Sat, Jul 2, 2011 at 2:27 PM, Bryan Cantrill <bryan at joyent.com> wrote:> All, > > A longstanding problem that we have had is that enablings on defunct > providers (e.g., USDT probes on dead processes) are not reaped: ?the > probes will exist as long as there exists an enabling for them. ?When > processes are turning over frequently (or when enablings are > long-running), this can clog up the probe space to the point that > DTrace probe creation will silently fail (an absolutely maddening > failure mode). ?This has been hit several times over the years (we > were nailed by it on our build machines at Fishworks) -- so when Theo > Schlossnagle mentioned to me that he was getting killed by this > problem in an environment with rapidly turning over Postgres > processes, I was embarrassed that I hadn''t tackled it earlier. ?As it > turns out, it was a tad thorny for locking reasons, but a patch for > this problem is attached. ?We have integrated this into our bits at > Joyent (internal ticket is OS-454, "enablings on defunct providers > prevent providers from unregistering"), so you''ll see this show up > soon at http://github.com/joyent/illumos-joyent -- but I wanted to > give everyone here a heads-up. > > Anyway, patch is attached, with my thanks to Adam for ?a helpful > discussion on fasttrap''s asynchronous provider retiring mechanics. > Note that Adam hasn''t (yet) reviewed this, and its integration > upstream should wait until he''s had a chance to look it over. Please > let me know if you have any questions or comments! > > ? ? ? ?Thanks, > ? ? ? ?Bryan >-- Theo Schlossnagle http://omniti.com/is/theo-schlossnagle
Adam Leventhal
2011-Jul-12 23:38 UTC
[dtrace-discuss] [illumos-Developer] Reaping enablings on defunct providers
Hey Bryan, This is great stuff, and -- as you say -- something that we''ve all wanted for a long long time. The code looks great. The only addition I''d ask for is if you considered having a case with bufpolicy=ring and/or anonymous tracing. I had a few questions: Can you explain the purpose of dtrace_unregister_defunct_reap? Is the idea to try for providers that were recently defunct and then give up after a minute? Should the fasttrap cleanup stuff use a taskq rather than its timeout stuff (not asking you to do it of course)? Can you talk a little about the approach you took? Obviously if you''re using speculative tracing or bufpolicy=ring or anonymous tracing it limits the efficacy of what you''ve done. Would it have been possible to disable the ECBs without destroying them? I assume that''s horribly facile, but I''d be interested to understand. dtrace_buffer_t structures should be cache-size aligned since they''re per-CPU structures, yes? Would that be worth noting explicitly? And why did you elect to have padding in two places rather than just 7 64-bit values at the end? Thanks; again, this is really great. Adam On Sat, Jul 2, 2011 at 11:27 AM, Bryan Cantrill <bryan at joyent.com> wrote:> All, > > A longstanding problem that we have had is that enablings on defunct > providers (e.g., USDT probes on dead processes) are not reaped: ?the > probes will exist as long as there exists an enabling for them. ?When > processes are turning over frequently (or when enablings are > long-running), this can clog up the probe space to the point that > DTrace probe creation will silently fail (an absolutely maddening > failure mode). ?This has been hit several times over the years (we > were nailed by it on our build machines at Fishworks) -- so when Theo > Schlossnagle mentioned to me that he was getting killed by this > problem in an environment with rapidly turning over Postgres > processes, I was embarrassed that I hadn''t tackled it earlier. ?As it > turns out, it was a tad thorny for locking reasons, but a patch for > this problem is attached. ?We have integrated this into our bits at > Joyent (internal ticket is OS-454, "enablings on defunct providers > prevent providers from unregistering"), so you''ll see this show up > soon at http://github.com/joyent/illumos-joyent -- but I wanted to > give everyone here a heads-up. > > Anyway, patch is attached, with my thanks to Adam for ?a helpful > discussion on fasttrap''s asynchronous provider retiring mechanics. > Note that Adam hasn''t (yet) reviewed this, and its integration > upstream should wait until he''s had a chance to look it over. Please > let me know if you have any questions or comments! > > ? ? ? ?Thanks, > ? ? ? ?Bryan > > _______________________________________________ > Developer mailing list > Developer at lists.illumos.org > http://lists.illumos.org/m/listinfo/developer > >-- Adam Leventhal, Delphix http://dtrace.org/blogs/ahl 275 Middlefield Road, Suite 50 Menlo Park, CA 94025 http://www.delphix.com
Bryan Cantrill
2011-Jul-13 00:09 UTC
[dtrace-discuss] [illumos-Developer] Reaping enablings on defunct providers
Hey Adam,> This is great stuff, and -- as you say -- something that we''ve all > wanted for a long long time. The code looks great. The only addition > I''d ask for is if you considered having a case with bufpolicy=ring > and/or anonymous tracing.Supporting a ring buffer policy is a tad brutal, because it would require processing the entire ring to verify that the ECB isn''t in it. There are ways we could optimize that, but they''re ugly; unless someone is burning with this use case, I''d rather punt on it. Anonymous tracing will just work; once the enabling is claimed, the buffer will be switched, and it will become reapable.> I had a few questions: > > Can you explain the purpose of dtrace_unregister_defunct_reap? Is the > idea to try for providers that were recently defunct and then give up > after a minute?Yes; you don''t want to keep trying in perpetuity because you may be blocked by either a ring buffer policy or speculative tracing or something else that is managing to cache ECB state. So after a while, you want to give up.> Should the fasttrap cleanup stuff use a taskq rather than its timeout > stuff (not asking you to do it of course)?Eh, it''s fine. ;) I don''t think it would be a problem to reimplement that as a task queue, of course, but I would wait until there was some compelling reason to do so.> Can you talk a little about the approach you took? Obviously if you''re > using speculative tracing or bufpolicy=ring or anonymous tracing it > limits the efficacy of what you''ve done. Would it have been possible > to disable the ECBs without destroying them? I assume that''s horribly > facile, but I''d be interested to understand.It''s probably possible (as Wolfgang Thaler was fond of saying, "it''s just software, man"), but it would be additional complexity for cases that are pretty borderline. The speculative tracing one is hard to get excited about because I don''t think such enablings would be long-lived; ring-buffer enablings are longer lived, of course, but I don''t think they tend to include USDT probes. (Though I''m happy to tackle this as follow-on work if someone has actually encountered this case.) And as I mentioned above, anonymous tracing will just work. (Though long-lived anonymous enablings of USDT probes must be really unusual!)> dtrace_buffer_t structures should be cache-size aligned since they''re > per-CPU structures, yes? Would that be worth noting explicitly? And > why did you elect to have padding in two places rather than just 7 > 64-bit values at the end?It''s just to reduce false sharing. I debated not padding it out; I think it''s unlikely to be a problem in practice as a given line can only be shared by at most two CPUs, and that''s generally not enough sharing to get deeply pathological -- but the extra memory cost is minimal, and there''s value to maintaining as much CPU orthogonality as possible with respect to probe evaluation. As for padding it out in two places, the first pad is only there for 32-bit whereas the pad that I added is bitness-neutral.> Thanks; again, this is really great.Thanks for taking a look! Now on to my other wads. ;) - Bryan
Adam Leventhal
2011-Jul-13 00:15 UTC
[dtrace-discuss] [illumos-Developer] Reaping enablings on defunct providers
Hey Bryan> Supporting a ring buffer policy is a tad brutal, because it would > require processing the entire ring to verify that the ECB isn''t in it. > ?There are ways we could optimize that, but they''re ugly; unless > someone is burning with this use case, I''d rather punt on it. > Anonymous tracing will just work; once the enabling is claimed, the > buffer will be switched, and it will become reapable.I actually meant a negative test case.> Yes; you don''t want to keep trying in perpetuity because you may be > blocked by either a ring buffer policy or speculative tracing or > something else that is managing to cache ECB state. ?So after a while, > you want to give up.Would that be worth a one- or two-line comment? Adam -- Adam Leventhal, Delphix http://dtrace.org/blogs/ahl 275 Middlefield Road, Suite 50 Menlo Park, CA 94025 http://www.delphix.com
Bryan Cantrill
2011-Jul-13 00:23 UTC
[dtrace-discuss] [illumos-Developer] Reaping enablings on defunct providers
On Tue, Jul 12, 2011 at 5:15 PM, Adam Leventhal <ahl at delphix.com> wrote:> Hey Bryan > >> Supporting a ring buffer policy is a tad brutal, because it would >> require processing the entire ring to verify that the ECB isn''t in it. >> ?There are ways we could optimize that, but they''re ugly; unless >> someone is burning with this use case, I''d rather punt on it. >> Anonymous tracing will just work; once the enabling is claimed, the >> buffer will be switched, and it will become reapable. > > I actually meant a negative test case.As I imagine you saw, there''s already one there that uses speculative tracing and verifies that enablings are not reaped: usr/src/cmd/dtrace/test/tst/common/usdt/tst.noreap.ksh Having a test that uses ring buffering is certainly possible, it''s just a more complicated test to write. (I''m currently cheating a bit by using one enabling to kick off another.) But I''m happy to add it if you''d like to see it...>> Yes; you don''t want to keep trying in perpetuity because you may be >> blocked by either a ring buffer policy or speculative tracing or >> something else that is managing to cache ECB state. ?So after a while, >> you want to give up. > > Would that be worth a one- or two-line comment?There''s already the comment that it''s padded out to avoid false sharing; does it merit more than that? - Bryan
Adam Leventhal
2011-Jul-13 00:25 UTC
[dtrace-discuss] [illumos-Developer] Reaping enablings on defunct providers
>> I actually meant a negative test case. > > As I imagine you saw, there''s already one there that uses speculative > tracing and verifies that enablings are not reaped: > > ?usr/src/cmd/dtrace/test/tst/common/usdt/tst.noreap.ksh > > Having a test that uses ring buffering is certainly possible, it''s > just a more complicated test to write. (I''m currently cheating a bit > by using one enabling to kick off another.) ?But I''m happy to add it > if you''d like to see it...Yes. I saw that test case. It was just a suggestion to exercise the other path in your code, but it''s up to you.>> Would that be worth a one- or two-line comment? > > There''s already the comment that it''s padded out to avoid false > sharing; does it merit more than that?I guess that''s sufficient, but I needed to remind myself that these were per-CPU data structures. Your call. Adam -- Adam Leventhal, Delphix http://dtrace.org/blogs/ahl 275 Middlefield Road, Suite 50 Menlo Park, CA 94025 http://www.delphix.com
Bryan Cantrill
2011-Jul-13 05:44 UTC
[dtrace-discuss] [illumos-Developer] Reaping enablings on defunct providers
On Tue, Jul 12, 2011 at 5:25 PM, Adam Leventhal <ahl at delphix.com> wrote:>>> I actually meant a negative test case. >> >> As I imagine you saw, there''s already one there that uses speculative >> tracing and verifies that enablings are not reaped: >> >> ?usr/src/cmd/dtrace/test/tst/common/usdt/tst.noreap.ksh >> >> Having a test that uses ring buffering is certainly possible, it''s >> just a more complicated test to write. (I''m currently cheating a bit >> by using one enabling to kick off another.) ?But I''m happy to add it >> if you''d like to see it... > > Yes. I saw that test case. It was just a suggestion to exercise the > other path in your code, but it''s up to you.I added that test and integrated into our repo; patch is attached.>>> Would that be worth a one- or two-line comment? >> >> There''s already the comment that it''s padded out to avoid false >> sharing; does it merit more than that? > > I guess that''s sufficient, but I needed to remind myself that these > were per-CPU data structures. Your call.I''m going to leave that one as it is; in addition to the comment next to the member, the block comment immediately above the structure also spells that out: /* * DTrace Buffers * * Principal buffers, aggregation buffers, and speculative buffers are all * managed with the dtrace_buffer structure. By default, this structure * includes twin data buffers -- dtb_tomax and dtb_xamot -- that serve as the * active and passive buffers, respectively. For speculative buffers, * dtb_xamot will be NULL; for "ring" and "fill" buffers, dtb_xamot will point * to a scratch buffer. For all buffer types, the dtrace_buffer structure is * always allocated on a per-CPU basis; a single dtrace_buffer structure is * never shared among CPUs. (That is, there is never true sharing of the * dtrace_buffer structure; to prevent false sharing of the structure, it must * always be aligned to the coherence granularity -- generally 64 bytes.) * ... Let me know if you have any other comments; otherwise, I''ll assume that you''ve moved on to reviewing unprivileged tick probes. ;) - Bryan -------------- next part -------------- A non-text attachment was scrubbed... Name: dtrace-reap-testring.patch Type: application/octet-stream Size: 2790 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20110712/1cc4491e/attachment.obj>
Adam Leventhal
2011-Jul-13 16:32 UTC
[dtrace-discuss] [illumos-Developer] Reaping enablings on defunct providers
Hey Bryan, Cool; looks good to me. Adam On Tue, Jul 12, 2011 at 10:44 PM, Bryan Cantrill <bryan at joyent.com> wrote:> On Tue, Jul 12, 2011 at 5:25 PM, Adam Leventhal <ahl at delphix.com> wrote: >>>> I actually meant a negative test case. >>> >>> As I imagine you saw, there''s already one there that uses speculative >>> tracing and verifies that enablings are not reaped: >>> >>> ?usr/src/cmd/dtrace/test/tst/common/usdt/tst.noreap.ksh >>> >>> Having a test that uses ring buffering is certainly possible, it''s >>> just a more complicated test to write. (I''m currently cheating a bit >>> by using one enabling to kick off another.) ?But I''m happy to add it >>> if you''d like to see it... >> >> Yes. I saw that test case. It was just a suggestion to exercise the >> other path in your code, but it''s up to you. > > I added that test and integrated into our repo; patch is attached. > >>>> Would that be worth a one- or two-line comment? >>> >>> There''s already the comment that it''s padded out to avoid false >>> sharing; does it merit more than that? >> >> I guess that''s sufficient, but I needed to remind myself that these >> were per-CPU data structures. Your call. > > I''m going to leave that one as it is; in addition to the comment next > to the member, the block comment immediately above the structure also > spells that out: > > /* > ?* DTrace Buffers > ?* > ?* Principal buffers, aggregation buffers, and speculative buffers are all > ?* managed with the dtrace_buffer structure. ?By default, this structure > ?* includes twin data buffers -- dtb_tomax and dtb_xamot -- that serve as the > ?* active and passive buffers, respectively. ?For speculative buffers, > ?* dtb_xamot will be NULL; for "ring" and "fill" buffers, dtb_xamot will point > ?* to a scratch buffer. ?For all buffer types, the dtrace_buffer structure is > ?* always allocated on a per-CPU basis; a single dtrace_buffer structure is > ?* never shared among CPUs. ?(That is, there is never true sharing of the > ?* dtrace_buffer structure; to prevent false sharing of the structure, it must > ?* always be aligned to the coherence granularity -- generally 64 bytes.) > ?* > ?... > > Let me know if you have any other comments; otherwise, I''ll assume > that you''ve moved on to reviewing unprivileged tick probes. ;) > > ? ? ? ?- Bryan >-- Adam Leventhal, Delphix http://dtrace.org/blogs/ahl 275 Middlefield Road, Suite 50 Menlo Park, CA 94025 http://www.delphix.com