Hi, I am trying to figure out accurate thread CPU utilization using the sched provider. When I sample for 5 seconds and multiply by the number of CPU threads, the reported seconds exceeds 5s X number of CPU threads. sched:::on-cpu /self->ts == 0/ { self->ts = timestamp/nanosec; } sched:::off-cpu /self->ts != 0/ { @a[pid,tid] = sum((timestamp/nanosec) - self->ts); self->ts = 0; } tick-5s { trunc(@a,10); printa(@a); } ====================Here is the output 3-------------1----------3 1982----------165-------3 6039----------52571---3 12110---------6006-----6 1982----------25785---8 1982----------25930--8 12110---------5996-----9 6039----------52902----12 11359----------3006-----14 0-------------0----------4751 The total of the above exceeds 5X256 (T5440 with 256CPU threads). Any ideas? Thanks -- This message posted from opensolaris.org
One would think that you could leverage the data the system already keeps in the thread''s microstate accounting (it''s kept in the lwp structure attached to the thread structure). Additionally, you can do do the conversion to seconds when you display rather than everytime on/off-cpu. Something such as this might be a starting point (dtrace -C -s ms.d): #include <sys/msacct.h> /* * curthread->t_lwp == NULL -> kernel thread with no user level component */ sched:::on-cpu /curthread->t_lwp != NULL && self->seen == 0/ { this->klwp = curthread->t_lwp; self->utime = this->klwp->lwp_mstate.ms_acct[LMS_USER]; self->stime = this->klwp->lwp_mstate.ms_acct[LMS_SYSTEM]; self->slptime = this->klwp->lwp_mstate.ms_acct[LMS_SLEEP]; self->latency = this->klwp->lwp_mstate.ms_acct[LMS_WAIT_CPU]; self->seen = 1; } sched:::off-cpu /self->seen != 0/ { this->klwp = curthread->t_lwp; @utimes[pid, tid, execname] = max(this->klwp->lwp_mstate.ms_acct[LMS_USER] - self->utime); @stimes[pid, tid, execname] = max(this->klwp->lwp_mstate.ms_acct[LMS_SYSTEM] - self->stime); @slptimes[pid, tid, execname] = max(this->klwp->lwp_mstate.ms_acct[LMS_SLEEP] - self->slptime); @latency[pid, tid, execname] = max(this->klwp->lwp_mstate.ms_acct[LMS_WAIT_CPU] - self->slptime); } profile-20s { exit (0); } END { trunc(@utimes, 20); trunc(@stimes, 20); trunc(@slptimes, 20); trunc(@latency, 20); /* * Turn this off as seeing just zeroes is pretty boring... */ /* normalize(@utimes, 1000000000); normalize(@stimes, 1000000000); normalize(@slptimes, 1000000000); normalize(@latency, 1000000000); */ printf("\nUser times\n"); printa("%5u %5u %16s %@9u\n", @utimes); printf("\nSystem times\n"); printa("%5u %5u %16s %@9u\n", @stimes); printf("\nSleep times\n"); printa("%5u %5u %16s %@9u\n", @slptimes); printf("\nLatency\n"); printa("%5u %5u %16s %@9u\n", @latency); } On 7/15/2010 1:49 PM, tester wrote:> Hi, > > I am trying to figure out accurate thread CPU utilization using the sched provider. When I sample for 5 seconds and multiply by the number of CPU threads, the reported seconds exceeds 5s X number of CPU threads. > > sched:::on-cpu > /self->ts == 0/ > { > self->ts = timestamp/nanosec; > } > > sched:::off-cpu > /self->ts != 0/ > { > > @a[pid,tid] = sum((timestamp/nanosec) - self->ts); > self->ts = 0; > > } > > tick-5s { > > trunc(@a,10); > printa(@a); > > } > > ====================> Here is the output > > 3-------------1----------3 > 1982----------165-------3 > 6039----------52571---3 > 12110---------6006-----6 > 1982----------25785---8 > 1982----------25930--8 > 12110---------5996-----9 > 6039----------52902----12 > 11359----------3006-----14 > 0-------------0----------4751 > > The total of the above exceeds 5X256 (T5440 with 256CPU threads). Any ideas? > > Thanks >-- Oracle <http://www.oracle.com> James Litchfield | Senior Consultant Phone: +1 4082237059 <tel:+1%204082237059> | Mobile: +1 4082180790 <tel:+1%204082180790> Oracle Oracle ACS California Green Oracle <http://www.oracle.com/commitment> Oracle is committed to developing practices and products that help protect the environment -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20100715/5ce61eaf/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 658 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20100715/5ce61eaf/attachment.gif> -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 356 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20100715/5ce61eaf/attachment-0001.gif>
After the printa I think you need a trunc.> trunc(@a);Shouldn`t your output be pid, tid, time in seconds ? is this output post processed ? what does the original output look like ? rick PS I would think timestamp/nanosec would be subject to rounding errors. On Thu, Jul 15, 2010 at 01:49:05PM -0700, tester wrote:> X-CAA-SPAM: N00000 > X-Authority-Analysis: v=1.1 cv=C5hH5D9sZx7oBLYfm9Qeni98yGeIDIO6Mit34naBfEk> c=1 sm=1 a=Rf9Kass9ZIkA:10 a=ood2b7iyd8MA:10 a=mMDRw7eodGMA:10 > a=kj9zAlcOel0A:10 a=zRG8QCroJ/deLx9kZ5Uhsg==:17 a=ep_KMAzDAAAA:8 > a=Dc4nqrrUMXSxFipiTD0A:9 a=jCEtMmcRhls7fIlIBkU65qnZ4ocA:4 > a=CjuIK1q_8ugA:10 a=RD1bzJOsQUcA:10 a=TC1Ce7_l9HwA:10 > a=t6jm5M9rnSlXE7un:21 a=1GJxXsc5shYW460U:21 > a=zRG8QCroJ/deLx9kZ5Uhsg==:117 > X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on > lv-drc6.opensolaris.org > X-Spam-Level: > X-Spam-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM, > RDNS_NONE,T_TO_NO_BRKTS_FREEMAIL autolearn=no version=3.3.1 > X-Original-To: dtrace-discuss at opensolaris.org > Delivered-To: dtrace-discuss at opensolaris.org > Date: Thu, 15 Jul 2010 13:49:05 PDT > From: tester <solaris.identity at gmail.com> > To: dtrace-discuss at opensolaris.org > Subject: [dtrace-discuss] precision thread CPU usage > X-BeenThere: dtrace-discuss at opensolaris.org > X-Mailman-Version: 2.1.13 > Precedence: list > List-Id: DTrace General Discussion <dtrace-discuss.opensolaris.org> > List-Unsubscribe: > <http://mail.opensolaris.org/mailman/options/dtrace-discuss>, > <mailto:dtrace-discuss-request at opensolaris.org?subject=unsubscribe> > List-Archive: <http://mail.opensolaris.org/pipermail/dtrace-discuss> > List-Post: <mailto:dtrace-discuss at opensolaris.org> > List-Help: <mailto:dtrace-discuss-request at opensolaris.org?subject=help> > List-Subscribe: > <http://mail.opensolaris.org/mailman/listinfo/dtrace-discuss>, > <mailto:dtrace-discuss-request at opensolaris.org?subject=subscribe> > Errors-To: dtrace-discuss-bounces at opensolaris.org > > Hi, > > I am trying to figure out accurate thread CPU utilization using the sched provider. When I sample for 5 seconds and multiply by the number of CPU threads, the reported seconds exceeds 5s X number of CPU threads. > > sched:::on-cpu > /self->ts == 0/ > { > self->ts = timestamp/nanosec; > } > > sched:::off-cpu > /self->ts != 0/ > { > > @a[pid,tid] = sum((timestamp/nanosec) - self->ts); > self->ts = 0; > > } > > tick-5s { > > trunc(@a,10); > printa(@a); > > } > > ====================> Here is the output > > 3-------------1----------3 > 1982----------165-------3 > 6039----------52571---3 > 12110---------6006-----6 > 1982----------25785---8 > 1982----------25930--8 > 12110---------5996-----9 > 6039----------52902----12 > 11359----------3006-----14 > 0-------------0----------4751 > > The total of the above exceeds 5X256 (T5440 with 256CPU threads). Any ideas? > > Thanks > -- > This message posted from opensolaris.org > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org-- Rickey C. Weisner Software Development and Performance Specialist Principal Software Engineer Systems Quality Office Oracle Corporation cell phone: 615-308-1147 email: rick.weisner at oracle.com
Thank you James for showing a cleaner way to do this. -- This message posted from opensolaris.org
> After the printa I think you need a > trunc. > > trunc(@a); > > Shouldn`t your output be > pid, tid, time in seconds ? > is this output post processed ? > > what does the original output look like ? > rick > > PS > I would think timestamp/nanosec would be subject to > rounding errors. > > > On Thu, Jul 15, 2010 at 01:49:05PM -0700, tester > wrote: > > X-CAA-SPAM: N00000 > > X-Authority-Analysis: v=1.1 > cv=C5hH5D9sZx7oBLYfm9Qeni98yGeIDIO6Mit34naBfEk> > c=1 sm=1 a=Rf9Kass9ZIkA:10 a=ood2b7iyd8MA:10 > a=mMDRw7eodGMA:10 > > a=kj9zAlcOel0A:10 a=zRG8QCroJ/deLx9kZ5Uhsg==:17 > a=ep_KMAzDAAAA:8 > > a=Dc4nqrrUMXSxFipiTD0A:9 > a=jCEtMmcRhls7fIlIBkU65qnZ4ocA:4 > > a=CjuIK1q_8ugA:10 a=RD1bzJOsQUcA:10 > a=TC1Ce7_l9HwA:10 > > a=t6jm5M9rnSlXE7un:21 a=1GJxXsc5shYW460U:21 > > a=zRG8QCroJ/deLx9kZ5Uhsg==:117 > > X-Spam-Checker-Version: SpamAssassin 3.3.1 > (2010-03-16) on > > lv-drc6.opensolaris.org > > X-Spam-Level: > > X-Spam-Status: No, score=-1.0 required=5.0 > tests=AWL,BAYES_00,FREEMAIL_FROM, > > RDNS_NONE,T_TO_NO_BRKTS_FREEMAIL autolearn=no > version=3.3.1 > > X-Original-To: dtrace-discuss at opensolaris.org > > Delivered-To: dtrace-discuss at opensolaris.org > > Date: Thu, 15 Jul 2010 13:49:05 PDT > > From: tester <solaris.identity at gmail.com> > > To: dtrace-discuss at opensolaris.org > > Subject: [dtrace-discuss] precision thread CPU > usage > > X-BeenThere: dtrace-discuss at opensolaris.org > > X-Mailman-Version: 2.1.13 > > Precedence: list > > List-Id: DTrace General Discussion > <dtrace-discuss.opensolaris.org> > > List-Unsubscribe: > > > <http://mail.opensolaris.org/mailman/options/dtrace-d > iscuss>, > > > <mailto:dtrace-discuss-request at opensolaris.org?subjec > t=unsubscribe> > > List-Archive: > <http://mail.opensolaris.org/pipermail/dtrace-discuss> > > List-Post: <mailto:dtrace-discuss at opensolaris.org> > > List-Help: > <mailto:dtrace-discuss-request at opensolaris.org?subject > =help> > > List-Subscribe: > > > <http://mail.opensolaris.org/mailman/listinfo/dtrace- > discuss>, > > > <mailto:dtrace-discuss-request at opensolaris.org?subjec > t=subscribe> > > Errors-To: dtrace-discuss-bounces at opensolaris.org > > > > Hi, > > > > I am trying to figure out accurate thread CPU > utilization using the sched provider. When I sample > for 5 seconds and multiply by the number of CPU > threads, the reported seconds exceeds 5s X number of > CPU threads. > > > > sched:::on-cpu > > /self->ts == 0/ > > { > > self->ts = timestamp/nanosec; > > } > > > > sched:::off-cpu > > /self->ts != 0/ > > { > > > > @a[pid,tid] = sum((timestamp/nanosec) - > self->ts); > > self->ts = 0; > > > > } > > > > tick-5s { > > > > trunc(@a,10); > > printa(@a); > > > > } > > > > ====================> > Here is the output > > > > 3-------------1----------3 > > 1982----------165-------3 > > 6039----------52571---3 > > 12110---------6006-----6 > > 1982----------25785---8 > > 1982----------25930--8 > > 12110---------5996-----9 > > 6039----------52902----12 > > 11359----------3006-----14 > > 0-------------0----------4751 > > > > The total of the above exceeds 5X256 (T5440 with > 256CPU threads). Any ideas? > > > > Thanks > > -- > > This message posted from opensolaris.org > > _______________________________________________ > > dtrace-discuss mailing list > > dtrace-discuss at opensolaris.org > > -- > > Rickey C. Weisner > Software Development and Performance Specialist > Principal Software Engineer > Systems Quality Office > Oracle Corporation > cell phone: 615-308-1147 > email: rick.weisner at oracle.com > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >Rick, That was it, I was missing the last trunc. I feel so dumb. The o/p I showed earlier is pid, tid and seconds. Thanks for pointing the rounding errors also. Thanks -- This message posted from opensolaris.org