J.J. Shore
2007-Oct-09 10:15 UTC
[dtrace-discuss] Dtrace toolkit cputimes tool - IDLE times are understated
The tool DTraceToolKit-0.99/Kernal/cputimes under reads the IDLE time when
some of the CPUs are completely idle for long periods.
I have an 8 CPU box which is not doing much. If I run cputimes on it for an
interval of 5 seconds and add up the times then the sum only comes to about 50%
of the available time. I understand that this is because the probes,
sysinfo:unix:idle_enter:idlethread
sched:::on-cpu and
sched:::remain-cpu
will not fire for any given CPU until it has been busy. cputimes relies on one
of these probes firing to start the counting for each CPU. If a CPU is totally
idle it will not be counted by cputimes.
I cannot find any warning about this scenario in the docs.
Does anyone agree with me or have I missed the point?
Outputs:-
Number of CPUs:-
kstat -p cpu_info:::brand
cpu_info:0:cpu_info0:brand UltraSPARC-IV+
cpu_info:1:cpu_info1:brand UltraSPARC-IV+
cpu_info:2:cpu_info2:brand UltraSPARC-IV+
cpu_info:3:cpu_info3:brand UltraSPARC-IV+
cpu_info:16:cpu_info16:brand UltraSPARC-IV+
cpu_info:17:cpu_info17:brand UltraSPARC-IV+
cpu_info:18:cpu_info18:brand UltraSPARC-IV+
cpu_info:19:cpu_info19:brand UltraSPARC-IV+
Output from cputimes
# ./cputimes 5
2007 Oct 9 11:02:48,
THREADS TIME (ns)
KERNEL 753817000
PROCESS 2743407100
IDLE 16330537400
2007 Oct 9 11:02:53,
THREADS TIME (ns)
KERNEL 347982400
PROCESS 6815626900
IDLE 12777698400
2007 Oct 9 11:02:58,
THREADS TIME (ns)
KERNEL 198875000
PROCESS 427450900
IDLE 19304229300
The total for each ouput comes to approximately 50% of 40 000 000 000 which
is the available time for each interval (number of cpus * interval in ns).
Whilst
CPU times was running I ran the following one-liner in another window to see
which CPUs are being used:-
dtrace -n
''sysinfo:unix:idle_enter:idlethread,sched:::on-cpu,sched:::remain-cpu
{@cpus[cpu]=count();} tick-60s{exit(0);}''
dtrace: description
''sysinfo:unix:idle_enter:idlethread,sched:::on-cpu,sched:::remain-cpu
'' matched 4 probes
CPU ID FUNCTION:NAME
18 81237 :tick-60s
16 16187
18 52565
17 53135
19 56125
--
This message posted from opensolaris.org
Brendan Gregg - Sun Microsystems
2007-Oct-09 17:43 UTC
[dtrace-discuss] Dtrace toolkit cputimes tool - IDLE times are understated
G''Day, On Tue, Oct 09, 2007 at 03:15:40AM -0700, J.J. Shore wrote:> The tool DTraceToolKit-0.99/Kernal/cputimes under reads the IDLE time when > some of the CPUs are completely idle for long periods. > > I have an 8 CPU box which is not doing much. If I run cputimes on it for an > interval of 5 seconds and add up the times then the sum only comes to about 50% > of the available time. I understand that this is because the probes, > sysinfo:unix:idle_enter:idlethread > sched:::on-cpu and > sched:::remain-cpu > will not fire for any given CPU until it has been busy. cputimes relies on one > of these probes firing to start the counting for each CPU. If a CPU is totally > idle it will not be counted by cputimes.Yes (although I''ve never seen this on my development environment as I have either single or two CPU servers only). I can fix this, but it would require either an fbt probe (making the script unstable) or destructive tomfoolery. I''ll at least note this in the docs, thanks. It might help if cputimes printed the number of seen CPUs during each interval, eg, # ./cputimes 1 3 2005 Apr 27 23:37:58, traced CPUs: 7/8 THREADS TIME (ns) KERNEL 10795499 PROCESS 20941091 IDLE 970707443 Brendan -- Brendan [CA, USA]
J.J. Shore
2007-Oct-10 10:52 UTC
[dtrace-discuss] Dtrace toolkit cputimes tool - IDLE times are understated
Brendan,
Thanks for looking at this problem. I think that your proposals are good ones.
If it does not add to much overhead, then I think that it would also be good
to report the ''total unaccounted time''.
The way I was thinking of finding this is as follows:-
Pass total number of CPUs to the dtrace part as a parameter from the
shell land part (should be easy enough to get this in a shell without much
cost). And then add the following bits to the dtrace part...
inline int OPT_numberofcpus = ''$opt_numberofcpus''
BEGIN
{
.... as it is plus ....
uatimestamp = timestamp; /* start of unaccounted time */
uatime = 0;
}
Add a new probe ...
on-cpu
remain-cpu
/seen[cpu]==0/
{
seen_cpus++;
uatime += timestamp - uatimestamp;
seen[cpu]=1;
}
Modify existing probe...
profile:::tick-1sec
/secs==0/
{
... as it is plus ....
uatime += (timestamp - uatimestamp) * ($OPT_numberofcpus - seen_cpus);
uatimestamp = timestamp;
print uatime
uatime = 0;
}
--
This message posted from opensolaris.org