thr3ads.net - dtrace discuss - [dtrace-discuss] Measuring cpu migrations [Nov 2009]


Allan

2009-Nov-18 17:08 UTC

[dtrace-discuss] Measuring cpu migrations

Hi,

I am looking to measure how long a thread takes to migrate between
CPUs, and how often. What I have is below; it checks just one
process. Is this the correct track to be on here?

My aim is to look at a process and look at upping the rechoose_interval on
some of our servers.


#!/usr/sbin/dtrace -s
sched:::off-cpu
{
    self->cpu = cpu;
    self->timestamp=timestamp;
}

sched:::on-cpu /self->cpu != cpu && execname=="mstragent"/
{
    printf("%s migrated from cpu %d to cpu %d nsec:%d\n",execname,self->cpu,cpu,timestamp-self->timestamp);
    printf("%s migrated from cpu %d to cpu %d ms:%d\n",execname,self->cpu,cpu,(timestamp-self->timestamp)/1000000);
    self->cpu = 0;
    self->timestamp = 0;
}

Sample output is as follows:

dtrace: script './migr.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  1    782                    resume:on-cpu mstragent migrated from cpu 0 to cpu 1 nsec:1123786101964690
mstragent migrated from cpu 0 to cpu 1 ms:1123786101

  0    782                    resume:on-cpu mstragent migrated from cpu 1 to cpu 0 nsec:38049334
mstragent migrated from cpu 1 to cpu 0 ms:38

  0    782                    resume:on-cpu mstragent migrated from cpu 1 to cpu 0 nsec:39942749
mstragent migrated from cpu 1 to cpu 0 ms:39

  0    782                    resume:on-cpu mstragent migrated from cpu 1 to cpu 0 nsec:40069916
mstragent migrated from cpu 1 to cpu 0 ms:40

  0    782                    resume:on-cpu mstragent migrated from cpu 1 to cpu 0 nsec:39936750
mstragent migrated from cpu 1 to cpu 0 ms:39

  0    782                    resume:on-cpu mstragent migrated from cpu 1 to cpu 0 nsec:39991167
mstragent migrated from cpu 1 to cpu 0 ms:39

  0    782                    resume:on-cpu mstragent migrated from cpu 1 to cpu 0 nsec:40012583
mstragent migrated from cpu 1 to cpu 0 ms:40

  0    782                    resume:on-cpu mstragent migrated from cpu 1 to cpu 0 nsec:39956000
mstragent migrated from cpu 1 to cpu 0 ms:39

  0    782                    resume:on-cpu mstragent migrated from cpu 1 to cpu 0 nsec:39970999
mstragent migrated from cpu 1 to cpu 0 ms:39

Thanks
Al
-- 
This message posted from opensolaris.org

Rafael Vanoni

2009-Nov-19 03:43 UTC

Allan wrote:
> Hi,
> 
> I am looking to measure how long a thread takes to migrate between
> cpu's and how often , what I have is below which is checking just one
> process is this the correct track to be on here?
> 
> My aim is to look at a process and look at upping the reboose_interval on
> some of our servers.
> 
> 
> #!/usr/sbin/dtrace -s
> sched:::off-cpu
> {
>     self->cpu = cpu;
>     self->timestamp=timestamp;
> }
> 
> sched:::on-cpu /self->cpu != cpu && execname=="mstragent"/
> {
>     printf("%s migrated from cpu %d to cpu %d nsec:%d\n",execname,self->cpu,cpu,timestamp-self->timestamp);
>     printf("%s migrated from cpu %d to cpu %d ms:%d\n",execname,self->cpu,cpu,(timestamp-self->timestamp)/1000000);
>     self->cpu = 0;
>     self->timestamp = 0;
> }
> [snip]
Hi Alan

Your script is not accounting for CPU 0, so you need to set a thread-local
flag variable in the off-cpu probe and predicate the on-cpu probe on that.
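
A minimal sketch of that fix (not code from the thread): the thread-local timestamp doubles as the flag, since it is non-zero only after the script has seen the thread go off-CPU. It keeps the process name mstragent from the original script and moves the execname test onto off-cpu so that thread-local variables are only created for that process. As in the original, the reported delta is the time the thread spent off-CPU, not the cost of the migration itself.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Remember where and when an mstragent thread left the CPU. */
sched:::off-cpu
/execname == "mstragent"/
{
	self->cpu = cpu;
	self->ts = timestamp;
}

/*
 * self->ts is the flag: it is non-zero only if this script saw the
 * thread go off-CPU, so "last ran on CPU 0" is no longer confused with
 * "never recorded". The delta is off-CPU time (sleeping or waiting to
 * run), not the cost of the migration itself.
 */
sched:::on-cpu
/self->ts && self->cpu != cpu/
{
	printf("%s migrated from cpu %d to cpu %d nsec:%d ms:%d\n",
	    execname, self->cpu, cpu, timestamp - self->ts,
	    (timestamp - self->ts) / 1000000);
}

/* Clear the thread-local state after every on-cpu, migrated or not. */
sched:::on-cpu
/self->ts/
{
	self->cpu = 0;
	self->ts = 0;
}
```

Because only off-cpu sets the flag, the first on-cpu firing for each thread after the script starts is skipped instead of producing a line like the 1123786101964690 ns entry in the sample output.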

If you're looking at reducing migrations, the nosteal_nsec variable
determines how long a thread should remain on the runq before the
scheduler allows another CPU to steal it. Increasing it might
help you lower migrations.

Rafael

max at bruningsystems.com

2009-Nov-19 06:31 UTC

Hi Rafael,

Rafael Vanoni wrote:
> Allan wrote:
>> Hi,
>>
>> I am looking to measure how long a thread takes to migrate between
>> cpu's and how often , what I have is below which is checking just one
>> process is this the correct track to be on here?
>> My aim is to look at a process and look at upping the
>> reboose_interval on some of our servers.
>>
>>
>> #!/usr/sbin/dtrace -s
>> sched:::off-cpu
>> {
>>     self->cpu = cpu;
>>     self->timestamp=timestamp;
>> }
>>
>> sched:::on-cpu /self->cpu != cpu && execname=="mstragent"/
>> {
>>     printf("%s migrated from cpu %d to cpu %d nsec:%d\n",execname,self->cpu,cpu,timestamp-self->timestamp);
>>     printf("%s migrated from cpu %d to cpu %d ms:%d\n",execname,self->cpu,cpu,(timestamp-self->timestamp)/1000000);
>>     self->cpu = 0;
>>     self->timestamp = 0;
>> }
>> [snip]
>
> Hi Alan
>
> Your script is not accounting for CPU 0, so you need to set a thread
> local flag variable on the off-cpu probe, and predicate the on-cpu on
> that.

Can you explain this?  Why special accounting for CPU 0?

> If you're looking at reducing migrations, the nosteal_nsec variable
> determines how long a thread should remain in the runq before the
> scheduler allows it to be stolen by another CPU. Increasing it might
> help you lower migrations.

Thanks for the tip about nosteal_nsec.
max

max at bruningsystems.com

2009-Nov-19 06:35 UTC

Hi,

Rafael Vanoni wrote:
> Allan wrote:
>> Hi,
>>
>> I am looking to measure how long a thread takes to migrate between
>> cpu's and how often , what I have is below which is checking just one
>> process is this the correct track to be on here?
>> My aim is to look at a process and look at upping the
>> reboose_interval on some of our servers.
>>
>>
>> #!/usr/sbin/dtrace -s
>> sched:::off-cpu
>> {
>>     self->cpu = cpu;
>>     self->timestamp=timestamp;
>> }
>>
>> sched:::on-cpu /self->cpu != cpu && execname=="mstragent"/
>> {
>>     printf("%s migrated from cpu %d to cpu %d nsec:%d\n",execname,self->cpu,cpu,timestamp-self->timestamp);
>>     printf("%s migrated from cpu %d to cpu %d ms:%d\n",execname,self->cpu,cpu,(timestamp-self->timestamp)/1000000);
>>     self->cpu = 0;
>>     self->timestamp = 0;
>> }
>> [snip]
>
> Hi Alan
>
> Your script is not accounting for CPU 0, so you need to set a thread
> local flag variable on the off-cpu probe, and predicate the on-cpu on
> that.

Oh.  You mean the on-cpu probe firing before the off-cpu probe when the
script starts...
Which would explain the large number Allan gets in his output for the first
time the probe fires.
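
The arithmetic is consistent with that: with self->timestamp still unset (zero), the first line prints the raw value of DTrace's timestamp variable, a nanosecond counter that starts from an arbitrary point in the past, so it says nothing about a migration:

$$
\texttt{timestamp} - 0 = 1123786101964690\ \text{ns} \approx 1.124 \times 10^{6}\ \text{s} \approx 13.0\ \text{days}
$$

By contrast, the later lines, where off-cpu had fired first, range from 38049334 to 40069916 ns, i.e. 38 to 40 ms.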

max

Allan

2009-Nov-19 10:20 UTC

Rafael,

Thanks for the update; I will be looking to implement the changes.

Can you advise on the benefit of changing nosteal_nsec vs. rechoose_interval?
My thought was that if rechoose_interval were increased to an amount larger
than the time spent migrating, I would benefit from keeping CPU affinity. Or
would you set them to the same value?

I am looking at an application that has I/O latency issues: it can't handle the
2-3 ms delay that we see when we replicate the data via SRDF. I was initially
thinking that the thread was being migrated between CPUs, which in turn would
add a delay on top of the initial I/O, and that extra delay is what I was
trying to measure. If the thread goes into a sleep state while the I/O is
ongoing and then wakes up, would the sleep time still exceed rechoose_interval
and cause the migration?

Thanks
Al

max at bruningsystems.com

2009-Nov-19 11:19 UTC

Allan wrote:
> Rafael
>
> Thanks for the update I will be looking to implement the changes.
>
> Can you advise on the benefit of changing the nosteal_nsec vs
> rechoose_interval , my thoughts were if the rechoose interval was increased to
> an amount larger than the time spent migrating I would benefit with keeping cpu
> affinity or would you set them to the same?
>
> I am looking at a application that has i/o latency issues where it cant
> handle a 2-3ms delay which we see when we replicate the data via srdf, I was
> initially thinking that the thread was being migrated between the cpus which in
> turn would add another delay which I was trying to measure other than that of
> the initial i/o.  If the thread goes into a sleep state when the i/o is ongoing
> and then awakens would the rechoose interval still be exceeded by the sleep time
> causing the migration?
>

The rechoose_interval value is set so that a thread that sleeps for a short
period of time (< rechoose_interval ticks) will run on the same cpu it
last ran on.  This is to take advantage of a possibly still warm cache on
that cpu.  The migration is occurring because there is some other cpu which
has a lower priority running thread, or the maximum priority of the threads
waiting for that cpu is the lowest.  In other words, the system tries to run
the thread as soon as possible after it wakes up, first checking to see if it
is best to run it where it last ran.  The only additional cost for migration
might be a cross call(?).  You might try running the thread bound to
a cpu (via pbind(1) or processor_bind(2)) to see what the difference is in
performance.  (Or try using processor sets.)
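
A sketch for checking the sleep question above, assuming the kernel globals rechoose_interval and hz (clock ticks per second) can be read from D with the backquote syntax. It prints rechoose_interval converted to milliseconds, then, on Ctrl-C, shows the distribution of off-CPU time in nanoseconds for mstragent threads, split by whether each thread came back on the same CPU or a different one. Off-CPU time covers run-queue waits as well as sleeps, so it only approximates the sleep time asked about.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

BEGIN
{
	/* Assumption: rechoose_interval is in clock ticks, hz is ticks/sec. */
	printf("rechoose_interval = %d ticks (about %d ms at hz = %d)\n",
	    `rechoose_interval, `rechoose_interval * 1000 / `hz, `hz);
	printf("Tracing mstragent; Ctrl-C prints off-cpu time in ns.\n");
}

sched:::off-cpu
/execname == "mstragent"/
{
	self->cpu = cpu;
	self->ts = timestamp;
}

/* Off-CPU time includes sleeping and waiting on a run queue. */
sched:::on-cpu
/self->ts && self->cpu == cpu/
{
	@offcpu["same cpu"] = quantize(timestamp - self->ts);
}

sched:::on-cpu
/self->ts && self->cpu != cpu/
{
	@offcpu["migrated"] = quantize(timestamp - self->ts);
}

/* Clear per-thread state after every on-cpu. */
sched:::on-cpu
/self->ts/
{
	self->cpu = 0;
	self->ts = 0;
}
```

If the "migrated" samples mostly sit above the printed interval and the "same cpu" ones below it, that would be consistent with the behaviour described above; on its own it does not show that migration causes the 2-3 ms delay.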

max

Allan

2009-Nov-19 11:25 UTC

Max

Thanks very much for the input

Regards
Al

Rafael Vanoni

2009-Nov-19 21:28 UTC

max at bruningsystems.com wrote:
> Allan wrote:
>> Rafael
>> Thanks for the update I will be looking to implement the changes.
>> Can you advise on the benefit of changing the nosteal_nsec vs 
>> rechoose_interval , my thoughts were if the rechoose interval was 
>> increased to an amount larger than the time spent migrating I would 
>> benefit with keeping cpu affinity or would you set them to the same?
>> I am looking at a application that has i/o latency issues where it 
>> cant handle a 2-3ms delay which we see when we replicate the data via 
>> srdf, I was initially thinking that the thread was being migrated 
>> between the cpus which in turn would add another delay which I was 
>> trying to measure other than that of the initial i/o.  If the thread 
>> goes into a sleep state when the i/o is ongoing and then awakens would 
>> the rechoose interval still be exceeded by the sleep time causing the 
>> migration?
>>   
> The rechoose_interval value is set so that a thread that sleeps for a short
> period of time (< rechoose_interval ticks) will run on the same cpu it
> last ran on.  This is to take advantage of a possibly still warm cache on
> that cpu.   The migration is occuring because there is some other cpu which
> has a lower priority running thread, or the maximum priority of threads 
> waiting for the cpu is the lowest value.
That's correct, but these variables are used in different situations
for similar purposes. rechoose_interval is used to help decide on which
CPU's runq a thread should be placed. nosteal_nsec is used when an idle
CPU checks whether there's any work it can steal from another
CPU's runq.

This means, in part, that a thread that was kept on a runq to keep its
cache warm might still get stolen by another CPU. But the code that does
the stealing checks a number of different things, one of which is whether
the stealing CPU and the current target CPU share a cache. If they don't,
nosteal_nsec is used to see whether the target thread has been on the runq
for long enough.
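
A sketch for seeing how long mstragent threads actually wait on a run queue, which is the interval nosteal_nsec is compared against. It assumes nosteal_nsec is a kernel global readable from D under that name. A thread is counted as "other cpu" when it runs on a different CPU from the one whose queue sched:::enqueue reported; that may be a steal or some other reassignment, so treat the split as a hint rather than a count of steals.

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

BEGIN
{
	/* Assumption: a negative value means the platform never set it. */
	printf("nosteal_nsec = %d ns\n", `nosteal_nsec);
}

/*
 * enqueue fires in the context of the thread doing the queueing, not
 * the queued thread, so key on the queued thread's pid and lwpid
 * instead of using self-> variables.
 */
sched:::enqueue
/stringof(args[1]->pr_fname) == "mstragent"/
{
	qts[args[1]->pr_pid, args[0]->pr_lwpid] = timestamp;
	qcpu[args[1]->pr_pid, args[0]->pr_lwpid] = args[2]->cpu_id;
}

/* on-cpu fires in the context of the thread about to run. */
sched:::on-cpu
/qts[pid, curlwpsinfo->pr_lwpid] && qcpu[pid, curlwpsinfo->pr_lwpid] == cpu/
{
	@runq["same cpu as queued"] =
	    quantize(timestamp - qts[pid, curlwpsinfo->pr_lwpid]);
}

sched:::on-cpu
/qts[pid, curlwpsinfo->pr_lwpid] && qcpu[pid, curlwpsinfo->pr_lwpid] != cpu/
{
	@runq["other cpu"] =
	    quantize(timestamp - qts[pid, curlwpsinfo->pr_lwpid]);
}

/* Assigning 0 frees the associative array entries. */
sched:::on-cpu
/qts[pid, curlwpsinfo->pr_lwpid]/
{
	qts[pid, curlwpsinfo->pr_lwpid] = 0;
	qcpu[pid, curlwpsinfo->pr_lwpid] = 0;
}
```

Comparing the "other cpu" distribution (in ns) with the printed nosteal_nsec value gives a rough idea of whether raising it could matter, although, as Rafael says, other factors also affect whether a thread is stolen.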

Please note that a number of other things are taken into consideration
when placing threads on run queues and when stealing them, and that
changing these variables from their default values is at your own risk.

I'm currently working on optimizations to reduce thread migrations, but
we don't have a target for integration yet.

Rafael
