Hi all, While experimenting with the sched provider, I found some unexpected behaviors of the enqueue and dequeue probes. I had na?vely assumed that a cpu''s currently running thread would always be the head of that cpu''s runqueue, and that enqueue and dequeue events would sum to zero, but neither seems to be the case. 1. Threads can apparently be moved between runqueues while on-cpu (and without being migrated) 2. Moving a thread from one queue to another (or between front/back of a queue) generates an enqueue event but no dequeue 3. A thread won''t necessarily be rescheduled on the same cpu as the one it was last enqueued on 4. A thread which is both dequeued and descheduled can be rescheduled later with no intervening enqueue events. Does anybody know how to make sense of this behavior? FWIW, I had been hoping to use sched:::*-cpu and sched:::*queue to implement per-process load monitoring at fairly high frequency (1-10msec intervals). /proc/$pid/usage is too expensive because it iterates over all threads in the process at every call while holding locks that serialize other syscalls. Thanks! Ryan The script: sched::resume:on-cpu/pid == $1/ { printf("t@%d running on cpu %d\n", tid, cpu); } sched::resume:off-cpu/pid == $1/ { printf("t@%d descheduled\n", tid); } sched:::enqueue/pid == $1/ { printf("t@%d enqueued on cpu %d\n", args[0]->pr_lwpid, args[2]->cpu_id); } sched:::dequeue/pid == $1/ { printf("t@%d dequeued\n", args[0]->pr_lwpid); } Exerpts from (and commentary about) the output produced when tracing a bash shell that executes ''ls'': 16 16 resume:on-cpu t at 1 running on cpu 16 16 5065 setfrontdq:enqueue t at 1 enqueued on cpu 40 16 5055 resume:off-cpu t at 1 descheduled 16 16 resume:on-cpu t at 1 running on cpu 16 Why did it come back to cpu 16 instead of cpu 40? 16 5065 setfrontdq:enqueue t at 1 enqueued on cpu 40 16 5066 setbackdq:enqueue t at 1 enqueued on cpu 56 16 5066 setbackdq:enqueue t at 1 enqueued on cpu 16 Was it not dequeued from cpus {16, 40, 56} or the dequeue events simply not reported? 16 5068 disp:dequeue t at 1 dequeued 16 5055 resume:off-cpu t at 1 descheduled 16 16 resume:on-cpu t at 1 running on cpu 16 Doesn''t it need to join a runqueue before running? -- This message posted from opensolaris.org
max at bruningsystems.com
2009-Apr-30 10:17 UTC
[dtrace-discuss] Question about sched:::*queue
Hi Ryan, Ryan wrote:> Hi all, > > While experimenting with the sched provider, I found some unexpected behaviors of the enqueue and dequeue probes. I had na?vely assumed that a cpu''s currently running thread would always be the head of that cpu''s runqueue, and that enqueue and dequeue events would sum to zero, but neither seems to be the case. >The current running thread on a cpu is not on any dispq. The runqueue (dispq) is where runnable threads wait. So when threads are enqueued, they are either switching out, or going to sleep. Threads are dequeued when they are switching in. I''ll say more about your dtrace output below...> 1. Threads can apparently be moved between runqueues while on-cpu (and without being migrated) > 2. Moving a thread from one queue to another (or between front/back of a queue) generates an enqueue event but no dequeue > 3. A thread won''t necessarily be rescheduled on the same cpu as the one it was last enqueued on > 4. A thread which is both dequeued and descheduled can be rescheduled later with no intervening enqueue events. > > Does anybody know how to make sense of this behavior? > > FWIW, I had been hoping to use sched:::*-cpu and sched:::*queue to implement per-process load monitoring at fairly high frequency (1-10msec intervals). /proc/$pid/usage is too expensive because it iterates over all threads in the process at every call while holding locks that serialize other syscalls. > > Thanks! > Ryan > > The script: > sched::resume:on-cpu/pid == $1/ { printf("t@%d running on cpu %d\n", tid, cpu); } > sched::resume:off-cpu/pid == $1/ { printf("t@%d descheduled\n", tid); } > sched:::enqueue/pid == $1/ { printf("t@%d enqueued on cpu %d\n", args[0]->pr_lwpid, args[2]->cpu_id); } > sched:::dequeue/pid == $1/ { printf("t@%d dequeued\n", args[0]->pr_lwpid); } > > Exerpts from (and commentary about) the output produced when tracing a bash shell that executes ''ls'': > > 16 16 resume:on-cpu t at 1 running on cpu 16 > 16 5065 setfrontdq:enqueue t at 1 enqueued on cpu 40 > 16 5055 resume:off-cpu t at 1 descheduled > 16 16 resume:on-cpu t at 1 running on cpu 16 > > Why did it come back to cpu 16 instead of cpu 40? >It didn''t. It was running on cpu16. Then it was preempted and place on the runqueue for cpu 40. After it places itself on the runqueue for cpu 40, it switches off (that''s the off-cpu). Then it began running again on cpu 16.> 16 5065 setfrontdq:enqueue t at 1 enqueued on cpu 40 > 16 5066 setbackdq:enqueue t at 1 enqueued on cpu 56 > 16 5066 setbackdq:enqueue t at 1 enqueued on cpu 16 > > Was it not dequeued from cpus {16, 40, 56} or the dequeue events simply not reported? >Your predicate only looks for one thread. The dequeue is being done by the running thread (not t at 1). You might find it more informational to not use a predicate to see better what events are taking place, by whom, and for whom.> 16 5068 disp:dequeue t at 1 dequeued > 16 5055 resume:off-cpu t at 1 descheduled > 16 16 resume:on-cpu t at 1 running on cpu 16 > > Doesn''t it need to join a runqueue before running? >Enqueues can be done either by the current thread enqueueing itself, or if the thread is woken up following a blocking synchronization primitive. Try the script without the predicate, or change the predicate to watch the thread being switched instead of the current executing thread. Also, if the process you are looking at is multithreaded, the predicate can match different threads at different times. max
Max wrote:> The current running thread on a cpu is not on any dispq. The runqueue > (dispq) is where > runnable threads wait. So when threads are enqueued, they are either > switching out, or going to sleep. Threads are dequeued when they are > switching in.OK, that makes sense.> Enqueues can be done either by the current thread enqueueing itself, > or if the thread is woken up following a blocking synchronization > primitive. > Try the script without the predicate, or change the predicate to watch > the thread being switched instead of the current executing thread. > Also, if the process you are looking at is multithreaded, the predicate > can match different threads at different times.Aha! That was it, all right. I updated the predicates for the *queue probes to filter on the pid being acted upon, updated the printf to show the acting process/thread as well, and added timestamps to everything so I could sort the output (forgot about that before...) and now the output is perfect: t at 1 enqueued on cpu 1 by 6231/t at 1 t at 1 dequeued from cpu 1 by 0/t at 0 t at 1 on cpu 1 t at 1 off cpu 1 t at 1 enqueued on cpu 1 by 6231/t at 1 t at 1 dequeued from cpu 1 by 0/t at 0 t at 1 on cpu 1 t at 1 enqueued on cpu 1 by 10152/t at 1 t at 1 dequeued from cpu 1 by 10152/t at 1 t at 1 off cpu 1 t at 1 enqueued on cpu 1 by 0/t at 1 t at 1 dequeued from cpu 1 by 0/t at 0 t at 1 on cpu 1 t at 1 off cpu 1 Thanks, Ryan
max at bruningsystems.com
2009-Apr-30 11:16 UTC
[dtrace-discuss] Question about sched:::*queue
Ryan Johnson wrote:> Max wrote: >> The current running thread on a cpu is not on any dispq. The >> runqueue (dispq) is where >> runnable threads wait. So when threads are enqueued, they are either >> switching out, or going to sleep. Threads are dequeued when they are >> switching in.Oops. Should be "So when threads are enqueued, they are switching out. (Threads that are going to sleep are placed on a sleepq/turnstile).> OK, that makes sense. >> Enqueues can be done either by the current thread enqueueing itself, >> or if the thread is woken up following a blocking synchronization >> primitive. >> Try the script without the predicate, or change the predicate to watch >> the thread being switched instead of the current executing thread. >> Also, if the process you are looking at is multithreaded, the predicate >> can match different threads at different times. > Aha! That was it, all right. I updated the predicates for the *queue > probes to filter on the pid being acted upon, updated the printf to > show the acting process/thread as well, and added timestamps to > everything so I could sort the output (forgot about that before...) > and now the output is perfect: > > t at 1 enqueued on cpu 1 by 6231/t at 1 > t at 1 dequeued from cpu 1 by 0/t at 0 > t at 1 on cpu 1 > t at 1 off cpu 1 > t at 1 enqueued on cpu 1 by 6231/t at 1 > t at 1 dequeued from cpu 1 by 0/t at 0 > t at 1 on cpu 1 > t at 1 enqueued on cpu 1 by 10152/t at 1 > t at 1 dequeued from cpu 1 by 10152/t at 1 > t at 1 off cpu 1 > t at 1 enqueued on cpu 1 by 0/t at 1 > t at 1 dequeued from cpu 1 by 0/t at 0 > t at 1 on cpu 1 > t at 1 off cpu 1 > > Thanks, > Ryan > > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >