Peter Harvey
2007-Jun-26 16:01 UTC
[dtrace-discuss] Mixed up output when using speculation() and the pid provider with MT
Can the pid provider mix up speculation buffers with MT processes? I''m using speculation() and the pid provider to trace nscd(1M) calling the NIS+ cache manager (nis_cachemgr, aka NCM). The script is attached. I speculate on calls to the NIS+ API (to get the lookup string) and then commit if the NCM is called. The problem I see is mixed up output. The tracing of one NCM call is sometimes interspersed with another or apparently out of order, eg: ==Looking up: 129.156.44.4 Calling nis_list(): [addr=129.156.44.4]ipnodes.org_dir 0.001 NCM: org_dir.sun.com. ret=20 (No such name) 0.005 NCM: ipnodes.org_dir.sun.com. ret=20 (No such name) Total time for 127.0.0.1: 0.006 secs Looking up: 129.156.173.199 Calling nis_list(): [addr=129.156.173.199]ipnodes.org_dir 0.001 NCM: org_dir.sun.com. ret=20 (No such name) 0.003 NCM: ipnodes.org_dir.sun.com. ret=20 (No such name) Total time for 129.156.173.199: 0.005 secs 0.003 NCM: ipnodes.org_dir.sun.com. ret=20 (No such name) Total time for 129.156.44.4: 0.005 secs Looking up: ::1 Calling nis_list(): [addr=::1]ipnodes.org_dir 0.002 NCM: org_dir.sun.com. ret=20 (No such name) 0.003 NCM: ipnodes.org_dir.sun.com. ret=20 (No such name) Total time for ::1: 0.005 secs Looking up: 127.0.0.1 Calling nis_list(): [addr=127.0.0.1]ipnodes.org_dir 0.002 NCM: org_dir.sun.com. ret=20 (No such name) == I''m using the usual self->spec to make sure my speculation buffers are unique for each thread. It''s almost as if a single kernel thread is handling different userspace threads which doesn''t make sense to me. I''ve tried tracing the tid, curthread and value of self->spec with my printf() but it hasn''t proven illuminating. The DTrace script has helped me understand a problem we were having with NIS+ but I''m left wondering if my script is broken or there''s some issue with speculation buffers. -- Peter -- This message posted from opensolaris.org
Peter Harvey
2007-Jun-26 16:03 UTC
[dtrace-discuss] Re: Mixed up output when using speculation() and the pid provider with MT
My attachment appears to have disappeared, trying again ... nis_list_cache_calls+.d has [i]tid[/i], [i]curthread[/i] and [i]self->spec[/i] added to the tracing. -- Peter -- This message posted from opensolaris.org -------------- next part -------------- A non-text attachment was scrubbed... Name: nis_list_cache_calls.d Type: application/octet-stream Size: 1666 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20070626/02aded06/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: nis_list_cache_calls+.d Type: application/octet-stream Size: 1845 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20070626/02aded06/attachment-0001.obj>
Peter Harvey
2007-Jun-27 08:27 UTC
[dtrace-discuss] Mixed up output when using speculation() and the pid provider with MT
And the answer is ... provided by Jonathan Adams - thanks!> This can happen if the program switches CPUs for whatever reason. The > way speculations work is: > > 1. your script calls speculate(), which allocates a set of per-CPU buffers > 2. data put into the speculation goes into the buffer for the current CPU > 3. commit() causes the data from the speculation for the current CPU > to be copied into the current CPU''s main buffer, and an asynchronous > periodic task cross-calls the other CPUs, which copy their parts of > the speculation into their CPU''s main buffer > 4. The dtrace client pulls down the data, and prints it out in CPU order. > > If you need time-ordered data, you will have to trace timestamp and > sort dtrace(1M)''s output.If I add timestamps and cpu to my output then sort it I get output like this: [some edits for 72 column] 2730873334891200,0x203: Looking up: ad.doubleclick.net 2730873334926160,0x203: Call nis_list(): [name=ad]ipnodes.org_dir.doubleclick.net 2730873336875600,0x200: 0.000 NCM: org_dir.doubleclick.net.eu.cte. ... 2730873338328480,0x3: 0.002 NCM: ipnodes.org_dir.doubleclick.net ... 2730873339544720,0x3: 0.003 NCM: org_dir.doubleclick.net.cte.sun ... 2730873341762480,0: 0.004 NCM: ipnodes.org_dir.doubleclick.net ... 2730873343252080,0: 0.006 NCM: org_dir.doubleclick.net.sun.com ... 2730873344510080,0: 0.008 NCM: ipnodes.org_dir.doubleclick.net ... 2730873344540000,0: Total time for ad.doubleclick.net: 0.009 secs Notice how the thread hops across CPUs. My assumption was that there was a single speculation buffer and a single thread would dump recording output into it sequentially. Out-of-time-order DTrace output is common and sorting with timestamps an easy solution but I''d assumed that was only true when tracing multiple threads. The manual mentions per-CPU speculation buffers here: A committing speculative buffer will not be made available to subsequent speculation() calls until each per-CPU speculative buffer has been completely copied into its corresponding per-CPU principal buffer. -- Peter