Hi!

Here's the problem I'm facing: I need to sample a particular
attribute of all the LWPs in a given process. Now, sampling
itself is no problem -- I can use profile/tick provider and
all is good. The problem is: I can't seem to think of a
DTrace-friendly way to iterate over all the LWPs and report
the value of the attribute I'm interested in.

Any suggestions?

Thanks,
Roman.

On Mon, Mar 24, 2008 at 10:35:43AM -0700, Roman Shaposhnik wrote:
> Any suggestions?

Either don't use DTrace if you need to iterate, or start the target via
DTrace so you can keep track of all the LWPs in your D script.

On Mon, 2008-03-24 at 12:37 -0500, Nicolas Williams wrote:
> On Mon, Mar 24, 2008 at 10:35:43AM -0700, Roman Shaposhnik wrote:
> > Any suggestions?
>
> Either don't use DTrace if you need to iterate, or start the target via
> DTrace so you can keep track of all the LWPs in your D script.

And how exactly is starting the target via DTrace going to make it
easier? As far as I can tell I have two problems to deal with:
   1. reporting N values every X milliseconds.
   2. getting N values

#1 can be sort of solved by using aggregations (although what I would
really like to have is a %A specifier in printf for formatting and
outputting associative arrays). #2 is way trickier and I really don't
know how to solve it short of digging out all the functions inside the
kernel that actually modify the attributes I'm looking for and using
fbt::funcs:entry to keep my array/aggregation in sync with what's going
on inside the kernel structures.

If you happen to think that starting the target via DTrace can help
address #2 -- please tell me more.

Thanks,
Roman.

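The aggregation idea in #1 can be sketched roughly as follows. This is only a sketch under assumptions that are not stated in the thread: it assumes the attribute of interest can be derived from events the script itself observes (system call entries per LWP are used here purely as a stand-in), and that the target is started with dtrace -c or attached to with -p so that $target is set. It counts only what DTrace sees after tracing starts; it does not read the kernel's own per-LWP counters, so it does not solve #2 for arbitrary attributes.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Assumed stand-in attribute: number of system calls made by each
 * LWP of the target process since tracing began.
 */
syscall:::entry
/pid == $target/
{
	@sysc[tid] = count();
}

/* Once a second, print one line per LWP that has been seen so far. */
tick-1s
{
	printf("--- syscalls per LWP since tracing started ---\n");
	printa("lwp %-6d %@d\n", @sysc);
}
```

It could be run as, for example, dtrace -s lwpcount.d -c ./myprog, where both the script name and the program name are hypothetical. Keying the aggregation on tid means new LWPs appear in the report as soon as they make their first system call, without any explicit iteration.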
What I meant by the "don't use DTrace if you need to iterate" option is
this: try using MDB. Given what you say you might find it a lot easier
to sample the items you need via MDB.

Hi Roman,

> Here's the problem I'm facing: I need to sample a particular
> attribute of all the LWPs in a given process. Now, sampling
> itself is no problem -- I can use profile/tick provider and
> all is good. The problem is: I can't seem to think of a
> DTrace-friendly way to iterate over all the LWPs and report
> the value of the attribute I'm interested in.
>
> Any suggestions?

To be honest, I'm not completely sure what you're after but I'll
throw this in there anyway. If you just want to iterate over all the
LWPs in a given process you can use a tick probe to do this (you seem
to already know this though...). An example for looking at 'nscd',
which is based upon the example Adam posted on his blog the other day:

    #!/usr/sbin/dtrace -s

    #pragma D option quiet

    /* Fetch the head of the pidhash bucket for the pid passed as $1. */
    BEGIN
    {
        self->pidp = `pidhash[$1 & (`pid_hashsz - 1)];
        pidp = self->pidp;
        printf("pid = %d\n", self->pidp->pid_id);
    }

    /* Only continue if the bucket head really is the pid we asked for. */
    BEGIN
    /self->pidp->pid_id == $1/
    {
        /* Index into procdir using the slot taken from struct pid. */
        this->slot = (*(uint32_t *)self->pidp) >> 8;
        procp = `procdir[this->slot].pe_proc;
        procname = stringof(procp->p_user.u_comm);
        t = procp->p_tlist;
    }

    /* Report one thread per tick, then step to the next one. */
    tick-50ms
    /pidp && t != NULL/
    {
        printf("%s lwps %d/thr (%d): %d syscalls\n", procname,
            procp->p_lwpcnt,
            t->t_tid, t->t_lwp->lwp_ru.sysc);
        t = t->t_forw;
    }

This produces:

    # ./lwp.d 100179
    pid = 100179
    nscd lwps 33/thr (1): 57 syscalls
    nscd lwps 33/thr (2): 2050124 syscalls
    nscd lwps 33/thr (3): 5 syscalls
    nscd lwps 33/thr (4): 25402 syscalls
    nscd lwps 33/thr (5): 18 syscalls
    nscd lwps 33/thr (6): 3306 syscalls
    nscd lwps 33/thr (7): 71123 syscalls
    nscd lwps 33/thr (8): 2798 syscalls
    nscd lwps 33/thr (9): 2908 syscalls
    nscd lwps 33/thr (10): 470 syscalls
    nscd lwps 33/thr (11): 11695 syscalls
    nscd lwps 33/thr (12): 470 syscalls
    nscd lwps 33/thr (13): 2679 syscalls
    nscd lwps 33/thr (14): 27171 syscalls
    nscd lwps 33/thr (15): 83908 syscalls
    nscd lwps 33/thr (16): 4113 syscalls
    nscd lwps 33/thr (17): 470 syscalls
    nscd lwps 33/thr (18): 8048 syscalls
    nscd lwps 33/thr (19): 470 syscalls
    nscd lwps 33/thr (20): 378 syscalls
    nscd lwps 33/thr (21): 238 syscalls
    nscd lwps 33/thr (22): 414 syscalls
    nscd lwps 33/thr (23): 240 syscalls
    nscd lwps 33/thr (24): 28 syscalls
    nscd lwps 33/thr (25): 8747 syscalls
    nscd lwps 33/thr (26): 589 syscalls
    nscd lwps 33/thr (27): 246 syscalls
    nscd lwps 33/thr (28): 11681 syscalls
    nscd lwps 33/thr (29): 470 syscalls
    nscd lwps 33/thr (30): 69942 syscalls
    nscd lwps 33/thr (31): 5171 syscalls
    nscd lwps 33/thr (32): 470 syscalls
    nscd lwps 33/thr (499): 1680 syscalls

There may well be bugs in this, or much better ways of doing it, but
it appears to work.

Jon.

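One thing to watch in the script above: if p_tlist is a circular list (as is assumed here), the t != NULL predicate never becomes false once the walk has started, so the script keeps cycling through the threads until it is interrupted. A minimal variation that stops after one lap might look like the sketch below. It assumes the same kernel structures and the same lookup of the bucket head as the script above; the global named first is introduced here for illustration only.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
	/* Head of the pidhash bucket for the pid passed as $1. */
	pidp = `pidhash[$1 & (`pid_hashsz - 1)];
}

BEGIN
/pidp != NULL && pidp->pid_id == $1/
{
	/* Same slot extraction as in the script above. */
	this->slot = (*(uint32_t *)pidp) >> 8;
	procp = `procdir[this->slot].pe_proc;

	/* Remember where the walk starts so one full lap can be detected. */
	first = procp->p_tlist;
	t = first;
}

/* Print one thread per tick and advance. */
tick-50ms
/t != NULL/
{
	printf("thr (%d): %d syscalls\n", t->t_tid, t->t_lwp->lwp_ru.sysc);
	t = t->t_forw;
}

/*
 * Runs after the clause above on the same tick. Exit once the walk
 * is back at the starting thread, or at once if the pid was not
 * found (t and first are then both still zero).
 */
tick-50ms
/t == first/
{
	exit(0);
}
```

This only bounds the walk; it does nothing about threads being created or destroyed between ticks, and if the starting thread itself goes away the exit condition may never be met.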
On Tue, 2008-03-25 at 14:03 +0000, Jon Haslam wrote:
> Hi Roman,
>
> > Here's the problem I'm facing: I need to sample a particular
> > attribute of all the LWPs in a given process. Now, sampling
> > itself is no problem -- I can use profile/tick provider and
> > all is good. The problem is: I can't seem to think of a
> > DTrace-friendly way to iterate over all the LWPs and report
> > the value of the attribute I'm interested in.
> >
> > Any suggestions?
>
> To be honest, I'm not completely sure what you're after

Well, I guess that makes you a mind reader, 'cause the rest of your
reply seems to be exactly what I was looking for. ;-)

Now, as far as implementation goes, I still have a few questions:

>     self->pidp = `pidhash[$1 & (`pid_hashsz - 1)];

What is the backtick doing here in front of pidhash? Is it a way
of accessing an arbitrary kernel variable?

> BEGIN
> /self->pidp->pid_id == $1/

Wow! Could you please elaborate on why exactly
the predicate is needed here? My reading of the first
BEGIN statement seems to suggest that the following
will always be true:

    self->pidp->pid_id == $1

Or is it just a safety measure that prevents us from
getting garbage from pidhash?

> {
>     this->slot = (*(uint32_t *)self->pidp) >> 8;
>     procp = `procdir[this->slot].pe_proc;
>     procname = stringof(procp->p_user.u_comm);
>     t = procp->p_tlist;

Wow! That's some powerful kernel magic, if you ask me. ;-)

> tick-50ms
> /pidp && t != NULL/
> {
>     printf("%s lwps %d/thr (%d): %d syscalls\n", procname,
>         procp->p_lwpcnt,
>         t->t_tid, t->t_lwp->lwp_ru.sysc);
>     t = t->t_forw;

Now, here comes the crucial question: AFAIK, p_tlist points to
a circular list of kernel threads. We are traversing this list
using t = t->t_forw. Now, what happens if 't' points to
a member of the list that used to be valid but has been
deallocated in between the two ticks of tick-50ms?

Thanks,
Roman.

>>     self->pidp = `pidhash[$1 & (`pid_hashsz - 1)];
>
> What is the backtick doing here in front of pidhash? Is it a way
> of accessing an arbitrary kernel variable?

Yes. The backquote is a scoping operator for kernel variables. See the
External Variables section in the docs:

http://wikis.sun.com/display/DTrace/Variables#Variables-ExternalVariables

>> BEGIN
>> /self->pidp->pid_id == $1/
>
> Wow! Could you please elaborate on why exactly
> the predicate is needed here? My reading of the first
> BEGIN statement seems to suggest that the following
> will always be true:
>     self->pidp->pid_id == $1
>
> Or is it just a safety measure that prevents us from
> getting garbage from pidhash?

Yes, it's just there to ensure that we have extracted the correct
struct pid from the pidhash.

>>     this->slot = (*(uint32_t *)self->pidp) >> 8;
>>     procp = `procdir[this->slot].pe_proc;
>>     procname = stringof(procp->p_user.u_comm);
>>     t = procp->p_tlist;
>
> Wow! That's some powerful kernel magic, if you ask me. ;-)

I stole this from Adam's last blog entry so he's the sorcerer. Check it
out if you haven't seen it as it has a brief explanation of what he's
doing there.

>> tick-50ms
>> /pidp && t != NULL/
>> {
>>     printf("%s lwps %d/thr (%d): %d syscalls\n", procname,
>>         procp->p_lwpcnt,
>>         t->t_tid, t->t_lwp->lwp_ru.sysc);
>>     t = t->t_forw;
>
> Now, here comes the crucial question: AFAIK, p_tlist points to
> a circular list of kernel threads. We are traversing this list
> using t = t->t_forw. Now, what happens if 't' points to
> a member of the list that used to be valid but has been
> deallocated in between the two ticks of tick-50ms?

Using time based probes to iterate over data structures is
problematic as the data structures may well change beneath you.
I offered this up as an example of how to iterate over data structures
using a tick probe as it gets referenced quite a bit but there aren't that
many examples around of how to do it. The important point about
this technique is for the user to be aware of its limitations and
how the data they are observing is modified.

Jon.

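The backquote scoping described above can be tried on its own with a very small script. This is just a sketch: it reads the same pid_hashsz variable that the lwp.d script uses and assumes that symbol is visible on the kernel being examined.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
	/* A leading backquote makes D look the name up as a kernel symbol. */
	printf("pid_hashsz = %d\n", `pid_hashsz);
	exit(0);
}
```

A symbol can also be qualified with the name of the kernel module that defines it, written as the module name, a backquote, and then the symbol name, which helps when the same name exists in more than one module.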
On Wed, 2008-03-26 at 14:13 +0000, Jon Haslam wrote:
> >> tick-50ms
> >> /pidp && t != NULL/
> >> {
> >>     printf("%s lwps %d/thr (%d): %d syscalls\n", procname,
> >>         procp->p_lwpcnt,
> >>         t->t_tid, t->t_lwp->lwp_ru.sysc);
> >>     t = t->t_forw;
> >
> > Now, here comes the crucial question: AFAIK, p_tlist points to
> > a circular list of kernel threads. We are traversing this list
> > using t = t->t_forw. Now, what happens if 't' points to
> > a member of the list that used to be valid but has been
> > deallocated in between the two ticks of tick-50ms?
>
> Using time based probes to iterate over data structures is
> problematic as the data structures may well change beneath you.
> I offered this up as an example of how to iterate over data structures
> using a tick probe as it gets referenced quite a bit but there aren't that
> many examples around of how to do it. The important point about
> this technique is for the user to be aware of its limitations and
> how the data they are observing is modified.

Thanks for confirming my hunch. I guess at this point the only question
I have left is: what is really going to happen the next time
I do t->t_forw? A SEGV? Or I'll be just off to chasing these pointers
forever? See, I really don't know kernel well enough to make
even an educated guess here. Any help would be appreciated.

Thanks,
Roman.

>> Using time based probes to iterate over data structures is
>> problematic as the data structures may well change beneath you.
>> I offered this up as an example of how to iterate over data structures
>> using a tick probe as it gets referenced quite a bit but there aren't that
>> many examples around of how to do it. The important point about
>> this technique is for the user to be aware of its limitations and
>> how the data they are observing is modified.
>
> Thanks for confirming my hunch. I guess at this point the only question
> I have left is: what is really going to happen the next time
> I do t->t_forw? A SEGV? Or I'll be just off to chasing these pointers
> forever? See, I really don't know kernel well enough to make
> even an educated guess here. Any help would be appreciated.

If the data that you're using becomes invalid the logic of your
script may be affected or you may see runtime errors reported by
dtrace(1M) when you try and dereference invalid pointers (for example).
However, you shouldn't see any failures (such as a SEGV or panic) as
safety is baked into the design of DTrace - you can dereference all the
bad pointers you want from within your D script and all you see is a
ton of error messages reported back to you.

Jon.
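The behaviour described above can be observed directly with a deliberately bad dereference. The sketch below is modelled on the usual illustration of the dtrace provider's ERROR probe; the message text it prints is made up for this example, and the meaning given to arg4 and arg5 follows the documented ERROR probe arguments.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
	/* Deliberately dereference an invalid address. */
	*(char *)NULL;
}

/*
 * ERROR fires when a run-time fault occurs in another probe's clause.
 * arg4 is the fault type and arg5 a fault-specific value (for an
 * invalid address, the address itself).
 */
ERROR
{
	printf("caught fault type %d, value 0x%x\n", arg4, arg5);
	exit(0);
}
```

The faulting clause is simply abandoned and tracing carries on, which is why walking a stale t->t_forw is expected to produce error reports rather than a crash.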