Sean Liu
2008-Feb-13 19:58 UTC
[dtrace-discuss] How to measure the CPU time of a process and its children? And non-lock ps?
Hi there, On one of my clusters I have following issues. This particular cluster has two V240s with 2 CPUs each. 1. The CPU usage is high, often 100%, with both high sy and us time ( about 1:1 ) 2. mutex is high, which goes as high as 6000+ / CPU (!) 3. cross call is high, as high as 7000/CPU The cluster is running following stuff: 1. Veritas Cluster Server 5.0MP1, with 20 something service groups 2. MeasureWare from HP OVO Before you say "upgrade the hardware!", there are a few problems 1. The mutex is so high, that adding more CPUs would just worsen the problem. 2. xcall is high, additional CPUs will have the burden too. So here are the things I did: 1. Shutdown VCS, with everything else running. user space cpu usage dropped about 20-35% kernel space cpu usage dropped by about 30% mutex dropped from thousands to tens with brief spikes of hundreds 2. Shutdown measure ware CPU usage became 92-99% idle! mutex dropped further, but it''s of no significance now. 3. With measure ware dowe, started up VCS Now mutex does not go high a lot cpu util in sy went high a little but CPU usage in user space went high quite a lot, idle time goes to 0% often So in short, measure ware and VCS is killing the box. lockstat shows proc filesystem locking ( mutex spin! ) is high: root at cbbceusmq2d:/tmp#more lockstat.out Adaptive mutex spin: 68526 events in 5.048 seconds (13575 events/sec) Count indv cuml rcnt spin Lock Caller ------------------------------------------------------------------------------- 67056 98% 98% 0.00 3 pidlock pr_readdir_procdir+0x74 272 0% 98% 0.00 3 pidlock pr_lookup_procdir+0xa0 216 0% 99% 0.00 4 0x6000abd0200 cv_wait_sig+0x13c Which is not too much of a surprise for VCS - lots of monitoring are simply running versions of "ps" to check if a process is up and running I can simply go to veritas and say their product is a piece of sh*t, but I wonder: 1. Is Sun Cluster any better on this front? 2. For most of these checking/monitoring, is it possible to provide some kind of non-locking proc operations so this specific type of applications can use? -- This message posted from opensolaris.org
Sean Liu
2008-Feb-13 20:01 UTC
[dtrace-discuss] How to measure the CPU time of a process and its children? And non-lock ps?
I also meant to ask how to measure cpu time of a process and it''s children? Let''s say we have an agent, it starts up monitoring periodically - how do we measure how much CPU time this agent and it''s children ( and grand children ) used? -- This message posted from opensolaris.org
Brendan Gregg - Sun Microsystems
2008-Feb-14 00:01 UTC
[dtrace-discuss] How to measure the CPU time of a process and its children? And non-lock ps?
G''Day Sean, On Wed, Feb 13, 2008 at 12:01:08PM -0800, Sean Liu wrote:> I also meant to ask how to measure cpu time of a process and it''s children? > Let''s say we have an agent, it starts up monitoring periodically - how do we measure how much CPU time this agent and it''s children ( and grand children ) used?DTrace can do this using the sched provider and vtimestamp, or in rough terms using profile and progenyof(), eg: # dtrace -n ''profile-1001 /pid == $target || progenyof($target)/ { @["on-CPU, 1001 Hertz count:"] = count(); }'' -p 929727 dtrace: description ''profile-1001 '' matched 1 probe ^C on-CPU, 1001 Hertz count: 638 So PID 929727 was on-CPU for 638 samples at 1001 Hertz. However process CPU time is already tracked by microstate accounting and provided via procfs, and can be shown using /usr/bin/ptime. This does include children time (the man page needs updating). There are other ways to get to this procfs data... Brendan -- Brendan [CA, USA]
Sean Liu
2008-Feb-14 17:13 UTC
[dtrace-discuss] How to measure the CPU time of a process and its children? And non-lock ps?
Hi Brendan, Thanks for your reply and I''ll look into it. Both ways look quite promising. Sean -- This message posted from opensolaris.org
Sean Liu
2008-Feb-14 20:41 UTC
[dtrace-discuss] How to measure the CPU time of a process and its children? And non-lock ps?
It turned out ps wasn''t really what was using most CPU resource. The MQ agent ( and its probes ) is. I put in a small script to output continuously every second: #!/usr/sbin/dtrace -qs profile-1001 /pid == $target || progenyof($target)/ { @a["on-CPU, 1001 Hertz count:"] = count(); } tick-1 { printa(@a); trunc(@a) } This is from Application Agent: on-CPU, 1001 Hertz count: 2 on-CPU, 1001 Hertz count: 2 on-CPU, 1001 Hertz count: 2 on-CPU, 1001 Hertz count: 510 on-CPU, 1001 Hertz count: 410 on-CPU, 1001 Hertz count: 677 on-CPU, 1001 Hertz count: 796 on-CPU, 1001 Hertz count: 750 on-CPU, 1001 Hertz count: 828 on-CPU, 1001 Hertz count: 518 on-CPU, 1001 Hertz count: 751 on-CPU, 1001 Hertz count: 1086 on-CPU, 1001 Hertz count: 604 on-CPU, 1001 Hertz count: 992 on-CPU, 1001 Hertz count: 568 on-CPU, 1001 Hertz count: 736 ... And MQ agent: ... on-CPU, 1001 Hertz count: 1696 on-CPU, 1001 Hertz count: 1743 on-CPU, 1001 Hertz count: 1784 on-CPU, 1001 Hertz count: 1679 on-CPU, 1001 Hertz count: 1691 on-CPU, 1001 Hertz count: 1547 ... Since profile probes fire on both CPUs, a 1001 Hz probe should fire 2002 times a second correct? So 1743 fires is almost like 85% of CPU used. Thanks Brendan for the idea. Sean -- This message posted from opensolaris.org
Brendan Gregg - Sun Microsystems
2008-Feb-14 21:05 UTC
[dtrace-discuss] How to measure the CPU time of a process and its children? And non-lock ps?
G''Day Sean, On Thu, Feb 14, 2008 at 12:41:35PM -0800, Sean Liu wrote:> It turned out ps wasn''t really what was using most CPU resource. The MQ agent ( and its probes ) is. > I put in a small script to output continuously every second: > #!/usr/sbin/dtrace -qs > > profile-1001 > /pid == $target || progenyof($target)/ > { > @a["on-CPU, 1001 Hertz count:"] = count(); > } > > tick-1 > { > printa(@a); > trunc(@a) > }Yes, this makes sense.> This is from Application Agent: > on-CPU, 1001 Hertz count: 2 > on-CPU, 1001 Hertz count: 2 > on-CPU, 1001 Hertz count: 2 > on-CPU, 1001 Hertz count: 510 > on-CPU, 1001 Hertz count: 410 > on-CPU, 1001 Hertz count: 677 > on-CPU, 1001 Hertz count: 796 > on-CPU, 1001 Hertz count: 750 > on-CPU, 1001 Hertz count: 828 > on-CPU, 1001 Hertz count: 518 > on-CPU, 1001 Hertz count: 751 > on-CPU, 1001 Hertz count: 1086 > on-CPU, 1001 Hertz count: 604 > on-CPU, 1001 Hertz count: 992 > on-CPU, 1001 Hertz count: 568 > on-CPU, 1001 Hertz count: 736 > ... > > And MQ agent: > ... > on-CPU, 1001 Hertz count: 1696 > on-CPU, 1001 Hertz count: 1743 > on-CPU, 1001 Hertz count: 1784 > on-CPU, 1001 Hertz count: 1679 > on-CPU, 1001 Hertz count: 1691 > on-CPU, 1001 Hertz count: 1547 > ... > > Since profile probes fire on both CPUs, a 1001 Hz probe should fire 2002 times a second correct?Yes, it will fire 2002 times a second. There may be a tiny variance (such as 0.1%) in the output (due to how aggregates are printed from the user-land dtrace consumer, and how it is CPU scheduled) - but the numbers will sum correctly.> So 1743 fires is almost like 85% of CPU used.Yes, it strongly suggests that. We don''t know that for certain since this is a simple (yet useful) script which samples rather than traces. For what you are doing (identifying largest CPU consumer) sampling is probably adequate. You can get it to print out percentages, if that helps readability. Something like: profile-1001 /pid == $target || progenyof($target)/ { @a["on-CPU, percentage:"] = count(); } tick-1 { normalize(@a, (1001 * `ncpus_online) / 100); printa(@a); trunc(@a) } The normalize() won''t be spot on, since I believe the float will be converted to an integer before it is used. So there will be some small rounding error, in addition to that of it being sample based. But again, if the point is to identify large CPU consumers - it should work fine. Those percentages are for all CPUs. Drop the "* `ncpus_online" for per-CPU percentages.> Thanks Brendan for the idea.no worries, Brendan -- Brendan [CA, USA]