On Tue, May 04, 2010 at 04:47:16PM -0700, vhiz wrote:
> Hi All,
>
> I used the following script from the DTrace manual to measure how long a
> particular process runs on the CPU. I modified it to print the exact timestamps:
> sched:::on-cpu
> {
> self->ts = timestamp;
> printf("on=%d\n", self->ts);
> }
> sched:::off-cpu
> /self->ts/
> {
> printf("off=%d\n", timestamp);
> self->ts = 0;
> }
> The output looks like this:
> on=24591
> off=24603
> on=24609
> on=24670
> off=24676
>
> This pattern of two 'on' timestamps between the 'off' timestamps
> repeats throughout. Should it not just have 1 'on', then 1 'off', and
> so on? Can someone please explain.
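If all you need is the elapsed on-CPU time per interval (rather than the raw event trace), one way to sidestep the duplicate on-cpu firings entirely is to open an interval only when none is already in flight. A minimal sketch (my own, not from the manual; untested here):

```d
#!/usr/sbin/dtrace -qs

/* Open a new interval only if one is not already in flight, so a
 * second on-cpu firing without an intervening off-cpu is ignored. */
sched:::on-cpu
/self->ts == 0/
{
	self->ts = timestamp;
}

/* Close the interval and aggregate the on-CPU duration in ns. */
sched:::off-cpu
/self->ts/
{
	@oncpu["on-cpu time (ns)"] = quantize(timestamp - self->ts);
	self->ts = 0;
}
```

That doesn't explain the duplicates, of course, which is what the script below tries to characterize.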
What version of Solaris are you running (uname -srmvpi)?
What does the output look like for the following script:
---- cut here ----
#!/usr/sbin/dtrace -qs
self hrtime_t ts;
inline hrtime_t nanosec = 1000000000;
#pragma D option aggsortkey
BEGIN
{
base = timestamp;
}
sched:::on-cpu
/self->ts != 0/
{
this->time = timestamp - base;
self->dup = 1;
@b = quantize(timestamp - self->ts);
@bc = count();
printf("%6d dup=T+%d.%09d (%a)", curlwpsinfo->pr_onpro,
this->time / nanosec, this->time % nanosec,
caller);
stack(2);
}
sched:::on-cpu
/self->ts == 0/
{
self->ts = timestamp;
this->time = timestamp - base;
printf("%6d on=T+%d.%09d (%a)", curlwpsinfo->pr_onpro,
this->time / nanosec, this->time % nanosec,
caller);
stack(2);
}
sched:::off-cpu
/self->ts != 0/
{
this->time = timestamp - base;
printf("%6d off=T+%d.%09d\n", curlwpsinfo->pr_onpro,
this->time / nanosec, this->time % nanosec);
@a[self->dup] = quantize(timestamp - self->ts);
@ac[self->dup] = count();
self->ts = 0;
self->dup = 0;
}
END
{
printa("on-off, %@d times with dup=%d%@d", @ac, @a);
printa("\ndup'ed %@d times%@d", @bc, @b);
}
---- cut here ----
On a handy x86 system:
% uname -srmvpi
SunOS 5.11 snv_131 i86pc i386 i86pc
% ./foo.d
...
1 on=T+3.010225491 (genunix`cv_wait_sig_swap_core+0x174)
genunix`cv_wait_sig_swap_core+0x174
genunix`cv_wait_sig_swap+0x18
1 dup=T+3.011065071 (0x0)
1 off=T+3.011076028
...
^C
...
on-off, 20004 times with dup=0
value ------------- Distribution ------------- count
1024 | 0
2048 |@ 582
4096 |@@@@@@@@@@ 5112
8192 |@@@@@@@@ 4169
16384 |@@@@@@@@@ 4696
32768 |@@@@@ 2616
65536 |@@@ 1492
131072 |@ 484
262144 | 127
524288 | 12
1048576 | 38
2097152 | 39
4194304 | 74
8388608 |@ 413
16777216 | 139
33554432 | 11
67108864 | 0
on-off, 7 times with dup=1
value ------------- Distribution ------------- count
8192 | 0
16384 |@@@@@@ 1
32768 |@@@@@@@@@@@ 2
65536 |@@@@@@@@@@@ 2
131072 |@@@@@@@@@@@ 2
262144 | 0
dup'ed 7 times
value ------------- Distribution ------------- count
8192 | 0
16384 |@@@@@@@@@@@ 2
32768 |@@@@@@ 1
65536 |@@@@@@@@@@@@@@@@@ 3
131072 |@@@@@@ 1
262144 | 0
So I'm seeing this too, but only occasionally, and it seems to happen right
before we go *off* CPU. Looks like a bug.
Cheers,
- jonathan