Hello DTracers,

The CPU caps project
(http://www.opensolaris.org/os/project/rm/rctls/cpu-caps/)
introduces kernel "wait queues" where threads may be placed to enforce caps.
I would like to make this visible through DTrace by adding two new probes to
the sched provider with the following semantics:

cpucaps-sleep   Probe that fires immediately before the current thread is
                placed on a wait queue. The lwpsinfo_t of the waiting thread
                is pointed to by args[0]. The psinfo_t of the process
                containing the waiting thread is pointed to by args[1].

cpucaps-wakeup  Probe that fires immediately after a thread is removed
                from a wait queue. The lwpsinfo_t of the waiting thread is
                pointed to by args[0]. The psinfo_t of the process containing
                the waiting thread is pointed to by args[1].

For example, the following D script shows the number of seconds processes
spend on CPU and on wait queues. It gives a reasonable estimate of which
processes are affected by CPU caps, and to what extent:

    #!/usr/sbin/dtrace -s

    #pragma D option quiet

    /* Record when a thread of this process is placed on a wait queue. */
    sched:::cpucaps-sleep
    {
            sleep[args[1]->pr_pid] = timestamp;
    }

    /* Account the time spent on the wait queue, per executable name. */
    sched:::cpucaps-wakeup
    /sleep[args[1]->pr_pid]/
    {
            this->delta = timestamp - sleep[args[1]->pr_pid];
            @sleeps[args[1]->pr_fname] = sum(this->delta);
            @total[args[1]->pr_fname] = sum(this->delta);
    }

    /* Only track CPU time for processes that have hit a wait queue. */
    sched:::on-cpu
    /sleep[curpsinfo->pr_pid]/
    {
            oncpu[curpsinfo->pr_pid] = timestamp;
    }

    sched:::off-cpu
    /oncpu[curpsinfo->pr_pid]/
    {
            this->delta = timestamp - oncpu[curpsinfo->pr_pid];
            @cpu[curpsinfo->pr_fname] = sum(this->delta);
            @total[curpsinfo->pr_fname] = sum(this->delta);
    }

    END
    {
            /*
             * Normalize data to print results in seconds
             */
            normalize(@cpu, 1000000000);
            normalize(@sleeps, 1000000000);
            normalize(@total, 1000000000);

            printf("ON-CPU times:\n");
            printa("%-18s %@u\n", @cpu);
            printf("\nWait times:\n");
            printa("%-18s %@u\n", @sleeps);
            printf("\nTotal times:\n");
            printa("%-18s %@u\n", @total);
    }

Any comments/suggestions/objections?

- Alexander Kolbasov
http://blogs.sun.com/akolb

Alexander Kolbasov wrote:

> Hello DTracers,
>
> The CPU caps project
> (http://www.opensolaris.org/os/project/rm/rctls/cpu-caps/)
> introduces kernel "wait queues" where threads may be placed to enforce caps.
> I would like to make this visible through DTrace and to add two new probes to
> the sched provider with the following semantics:
>
> cpucaps-sleep   Probe that fires immediately before the current thread is
>                 placed on a wait queue. The lwpsinfo_t of the waiting thread
>                 is pointed to by args[0]. The psinfo_t of the process
>                 containing the waiting thread is pointed to by args[1].
>
> cpucaps-wakeup  Probe that fires immediately after a thread is removed
>                 from a wait queue. The lwpsinfo_t of the waiting thread is
>                 pointed to by args[0]. The psinfo_t of the process containing
>                 the waiting thread is pointed to by args[1].

Is it worthwhile adding the queue to the args here, so that it is
possible to break up the time spent sleeping on different queues?

With wakeup, you mention that the probe fires after a thread
has been removed from a wait queue - does this mean it is not
yet on a CPU/run queue?

If so, can it go back on a wait queue again and not end up on an
active queue?

If it did happen, would another sleep probe be generated for this?

Darren

Darren,

> Is it worthwhile adding the queue to the args here, so that it is
> possible to break up the time spent sleeping on different queues?

The queue itself is an internal data structure that should not be visible to
the stable probe. While lwpsinfo_t and psinfo_t are well-known stable
structures, the wait queue is not. The user can get to the thread's project
or zone - that should be sufficient for most purposes.

> With wakeup, you mention that the probe fires after a thread
> has been removed from a wait queue - does this mean it is not
> yet on a CPU/run queue?

Right. It only means that it is eligible for being placed on the run queue.
You can use other sched provider probes to figure out what actually happens
to it afterwards.

> If so, can it go back on a wait queue again and not end up on an
> active queue?

In theory - yes, in practice (given the current implementation) - no.

> If it did happen, would another sleep probe be generated for this?

Yes.

- Alex

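As an illustration of following a thread with the other sched provider probes after cpucaps-wakeup, here is a minimal sketch that measures how long a woken thread waits before it actually runs. It assumes the proposed cpucaps-wakeup probe behaves as described in this thread, and that args[0]->pr_addr (the lwpsinfo_t address field) is a suitable per-thread key; the names woken and @latency are made up for the example.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Proposed probe: the thread has just been removed from a wait queue. */
sched:::cpucaps-wakeup
{
	/* Hypothetical array name; keyed by the thread's address. */
	woken[args[0]->pr_addr] = timestamp;
}

/* Existing sched probe: the woken thread starts running on a CPU. */
sched:::on-cpu
/woken[curlwpsinfo->pr_addr]/
{
	/* Distribution of wakeup-to-run latency in nanoseconds. */
	@latency[curpsinfo->pr_fname] =
	    quantize(timestamp - woken[curlwpsinfo->pr_addr]);
	woken[curlwpsinfo->pr_addr] = 0;
}
```

The aggregation is printed automatically when the script exits, one latency histogram per executable name.
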
Alexander Kolbasov wrote:

>> Is it worthwhile adding the queue to the args here, so that it is
>> possible to break up the time spent sleeping on different queues?
>
> The queue itself is an internal data structure that should not be visible to
> the stable probe. While lwpsinfo_t and psinfo_t are well-known stable
> structures, the wait queue is not. The user can get to the thread's project
> or zone - that should be sufficient for most purposes.

Hmmm, I was thinking that it might be useful just to have the
queue pointer available as an opaque token for use as aggregation
key material, not as something to look inside of.

Darren

> > The queue itself is an internal data structure that should not be visible to
> > the stable probe. While lwpsinfo_t and psinfo_t are well-known stable
> > structures, the wait queue is not. The user can get to the thread's project
> > or zone - that should be sufficient for most purposes.
>
> Hmmm, I was thinking that it might be useful just to have the
> queue pointer available as an opaque token for use as aggregation
> key material, not as something to look inside of.

Why do you think that aggregating by the opaque queue pointer is more useful
than aggregating by zone or project? The queue itself is always associated
with a zone or project.

- Alex Kolbasov

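Aggregating wait-queue time by project or zone, using only stable data, might look like the following minimal sketch. It assumes the proposed probes behave as described in this thread and relies on the pr_projid and pr_zoneid members of psinfo_t; the names waitstart, @byproj and @byzone are made up for the example.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

sched:::cpucaps-sleep
{
	/* Key by thread address so each LWP is timed separately. */
	waitstart[args[0]->pr_addr] = timestamp;
}

sched:::cpucaps-wakeup
/waitstart[args[0]->pr_addr]/
{
	this->delta = timestamp - waitstart[args[0]->pr_addr];
	@byproj[args[1]->pr_projid] = sum(this->delta);
	@byzone[args[1]->pr_zoneid] = sum(this->delta);
	waitstart[args[0]->pr_addr] = 0;
}

END
{
	/* Report milliseconds rather than nanoseconds. */
	normalize(@byproj, 1000000);
	normalize(@byzone, 1000000);

	printf("Wait time by project ID (ms):\n");
	printa("%-10d %@u\n", @byproj);
	printf("\nWait time by zone ID (ms):\n");
	printa("%-10d %@u\n", @byzone);
}
```

Because the wakeup probe does not fire in the context of the woken thread, the sketch reads the project and zone IDs from args[1] rather than from curpsinfo.
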
Alexander Kolbasov wrote:

>> Hmmm, I was thinking that it might be useful just to have the
>> queue pointer available as an opaque token for use as aggregation
>> key material, not as something to look inside of.
>
> Why do you think that aggregating by the opaque queue pointer is more useful
> than aggregating by zone or project? The queue itself is always associated
> with a zone or project.

Hmmm, it does seem like a relatively meaningless thing to use, put
like that...

From reading your description, it seemed to me that there was this
other piece of important information here (the sleep queue itself) that
was not represented in the information you want to present via dtrace.

I think it would be worthwhile being able to get that piece of
information easily via dtrace rather than having to derive it by
looking at the project/zone and walking through more data
structures. I don't know how it would or could be useful, but
it seems important, even if I have to be using mdb in another
window to make use of it.

Darren

> From reading your description, it seemed to me that there was this
> other piece of important information here (sleep queue itself) that
> was not represented in the information you want to present via dtrace.
>
> I think it would be worthwhile being able to get that piece of
> information easily via dtrace rather than having to derive it by
> looking at the project/zone and walking through more data
> structures. I don't know how it would or could be useful, but
> it seems important, even if I have to be using mdb in another
> window to make use of it.

As a consumer of a _stable_ probe, you do not need to know the internal
implementation details and you can write meaningful scripts expressed in
higher-level terms. As a developer interested in inner workings, you can
easily get to the wait queue by simply doing something like

    ((kthread_t *)arg0)->t_proj->kpj_cpucap

By doing so you explicitly depend on non-stable implementation details. You
can do it, but it can't be the supported stable interface. After all, we do
not expose internal details of sleep queues either - for the same reasons.

Without knowing internal implementation details, it is impossible to use such
an opaque pointer in a meaningful way.

- Alex Kolbasov
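
For a developer who accepts the dependency on unstable internals, the expression above could be used as an aggregation key roughly as in the sketch below. It assumes, as the message implies, that the raw arg0 of the proposed probe is the kthread_t pointer and that t_proj->kpj_cpucap exists in the running kernel; none of this is a stable interface, and the aggregation name @sleeps is made up for the example.

```d
#!/usr/sbin/dtrace -s

sched:::cpucaps-sleep
{
	/*
	 * Unstable: reaches into kernel structures via the raw arg0.
	 * The pointer is used only as an opaque key, cast to an integer.
	 */
	@sleeps[(uintptr_t)((kthread_t *)arg0)->t_proj->kpj_cpucap] = count();
}
```

The resulting keys are kernel addresses, which only become meaningful when inspected with a debugger such as mdb.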