Hi,

On a T5220, using speculative tracing causes a significant increase in system load. I am using the examples from

http://wikis.sun.com/display/DTrace/Speculative+Tracing

with the fork system call traced instead of open64. Can that script cause such a load? The system itself, without the script running, is moderately loaded (usr+sys ~30-40%).

Also, I had to bump the number of speculative buffers up to 10 to avoid some of the warnings. What is a good rule for setting the number of speculative buffers? In this case, is it based on the number of fork calls that could be traced simultaneously?

Thanks
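For reference, the number of speculative buffers is controlled by DTrace's nspec option (it defaults to 1), and the size of each buffer by specsize. A minimal sketch of raising it inside a script; the value 16 is only an example, not a recommendation:

#pragma D option nspec=16

or, equivalently, when launching the script:

dtrace -x nspec=16 -s specopen.d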
Which example are you using? specopen.d (the script that instruments every fbt probe)? Please post it, or be more precise about which script you're using.

If you're using specopen.d, then you're enabling on the order of 30,000 probes. That's going to add up, even at the very reasonable cost of about 1 microsecond per probe (rough approximation).

In general, the cost of using DTrace in terms of probe effect is directly proportional to two things:

- the number of probes you enable;
- the activity of the code path of the enabled probe(s).

In other words, if you instrument the fork system call and the system is not executing forks, you won't feel a thing. If, on the other hand, the system is doing 1000 forks per second, you'll feel it.

The specopen.d script enables all the fbt probes. Even with the predicate ( / self->spec / ), the probe still has to fire before the predicate gets evaluated.

HTH,
/jim
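To get a feel for how large that probe count is on a given system, listing the fbt provider's probes works (the output includes a one-line header, so the count is off by one):

# dtrace -l -P fbt | wc -l

By comparison, a script restricted to just the fork-related syscall entry and return probes would enable only a handful of probes.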
Counting system calls per process during this interval, dtrace came out on top:

  ioctl     dtrace     10609

I am not sure if that is from the speculative dtrace script or from the script used to count the system calls.

Thanks
I'm sorry, but I am unable to parse this. What is the question here?

Thanks,
/jim
Ah, OK - I think I get it.

tester wrote:
> Counting system calls per process during this interval, dtrace came out on top:
>
>   ioctl     dtrace     10609

Got it. DTrace executed 10,609 system calls during your sampling period, more than any other process. I often filter dtrace out with a predicate:

  / execname != "dtrace" /

> I am not sure if that is from the speculative dtrace script or from the script used to count the system calls.

Both. If you want to know precisely which one, you can use pid as an aggregation key in the count aggregation and correlate the pid to the script you're starting. I'm not sure what you're chasing here, though...

/jim
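For what it's worth, a minimal sketch of that kind of counting script; the choice of aggregation keys (pid, execname, syscall name) is an assumption here, not necessarily what the original counting script used:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Count system calls per process, leaving dtrace itself out. */
syscall:::entry
/ execname != "dtrace" /
{
        @calls[pid, execname, probefunc] = count();
}

/* Print the per-pid, per-syscall totals on exit (Ctrl-C). */
dtrace:::END
{
        printa("%6d  %-16s  %-16s  %@d\n", @calls);
}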
Jim,

Thanks. You are right, I was using specopen.d, but looking for fork errors instead of open errors. I did not know that the probe has to fire before the predicate gets evaluated; the 40% increase in load during tracing now makes sense.

I would like to see the code path during a fork failure (and ultimately the reason for it). Do you have any suggestions for doing that without incurring a significant overhead? We have an application that forks around 1000+ processes in 5-10 seconds, and occasionally the fork fails. I think it is from lack of VM, but I also have a feeling that there are some hardcoded limits in the kernel. There was plenty of RAM available during the fork failure.

Thanks
On 06/02/09 18:30, tester wrote:
> I would like to see the code path during a fork failure (and ultimately the
> reason for it). Do you have any suggestions for doing that without incurring
> a significant overhead?

I'd suggest you look at speculations - commit the speculation on fork failure, discard it on success.

HTH
Michael
--
Michael Schuster        http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'
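As a rough illustration of that suggestion, below is a sketch of specopen.d reworked for the fork path. The fork probe names used (fork*, vfork; the exact syscalls vary by release), the restriction of fbt to the genunix module (a guess aimed at cutting the probe count, at the price of missing calls into other modules), and the nspec value are all assumptions; this is not a tested script.

#!/usr/sbin/dtrace -Fs

#pragma D option nspec=32

syscall::fork*:entry,
syscall::vfork:entry
{
        /* Start a speculation for this fork attempt. */
        self->spec = speculation();
        speculate(self->spec);
        printf("%d %s forking", pid, execname);
}

fbt:genunix::
/self->spec/
{
        /* Speculatively trace the kernel code path taken by the fork. */
        speculate(self->spec);
}

syscall::fork*:return,
syscall::vfork:return
/self->spec && errno != 0/
{
        /* The fork failed: keep the speculated trace. */
        commit(self->spec);
        self->spec = 0;
}

syscall::fork*:return,
syscall::vfork:return
/self->spec && errno == 0/
{
        /* The fork succeeded: throw the speculated trace away. */
        discard(self->spec);
        self->spec = 0;
}

Even restricted to genunix this still enables several thousand fbt probes, so the probe-effect point made earlier in the thread still applies.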
Hi (ummm, Tester?) -

First and foremost, what's the errno on the fork failure? 99% of the time, the errno information is enough to figure out why forks are failing.

Second, make sure you look in /var/adm/messages - if fork is failing because of a system resource issue, you'll often get a syslog message that tells you what's wrong.

You really should not need to generate a kernel function call trace to root-cause the failure of any system call, including fork. It's like using an MRI and studying the resulting film when the problem is a small cut that just needs some disinfectant and a band-aid. You're looking much deeper than you need to.

Thanks,
/jim
Michael,

Thanks. I think that's what the script from wikis.sun.com (specopen.d) does. Did I misinterpret your suggestion?

Thanks
Hi Jim,

The app software doesn't log an errno (bad software, although from a leading vendor; I think they inherited it), but an error string says "not enough space". I tried grepping some of the header files but could not find a match.

/var/adm/messages: that's the first thing I looked at, but no help there either.

I tried to match the error string in src.opensolaris.org; some hits turned up, but I wasn't confident they were a result of this.

Thank you.
Try this:

#!/usr/sbin/dtrace -s

#pragma D option quiet

extern int errno;

syscall::forkall:return,
syscall::vfork:return,
syscall::forksys:return,
syscall::fork1:return
/ arg0 == -1 || arg1 == -1 /
{
        printf("FORK FAILED, errno: %d, arg0: %d, arg1: %d\n", errno, arg0, arg1);
}
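If it also helps to see which process hit the failure, a small variation (same probes; the pid and execname fields are an addition, not part of the script above):

syscall::forkall:return,
syscall::vfork:return,
syscall::forksys:return,
syscall::fork1:return
/ arg0 == -1 || arg1 == -1 /
{
        printf("FORK FAILED in pid %d (%s), errno: %d\n", pid, execname, errno);
}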
Thanks Jim. Will use this during the next testing window.
"not enough space" indicates an errno 28 ENOSPC, which isn''t listed is the fork man page under ERRORS. Are you sure it''s fork(2) that''s failing? It may be errno 12, ENOMEM. So what does a general memory health profile of the system look like? Lots of free memory? Plenty of swap space? How about the process executing the forks - is it a 32-bit or 64-bit process? If 32-bit, is it hitting the limits of it''s address space? /jim tester wrote:> Hi Jim, > > The app software doesn''t poduce a errno in its logs (bad software, although from a leading vendor, I think they inherited it, but a error string says "not enough space" I tried grepping some of the header files but could not find a match. > > /var/adm/messages: that''s the first thing I looked at, but no help there either. > > I tried to match the error string in src.opensolaris.org, some hits turned up but I wasn''t confident they were a result of this. > > Thank you. >
D'oh! Disregard that last question (address space) - my brain was thinking thread-create failures; it's not applicable to fork failures. My bad.

The system memory and swap space health checks still apply, as well as the process count - grab some "sar -v 1 60" samples...

/jim
Jim,

Thanks. Here is a snapshot of "vmstat 5" output during the failure. I am not sure how this will be formatted on the forum.

 kthr     memory              page             disk         faults        cpu
 r b w   swap     free     re  mf    pi po fr de sr rm s0 s1 s2  in    sy     cs    us sy id
 0 0 0   62386728 56443136 338 3941  0  0  0  0  0  0  0  0  0   19267 85670  37553 10  7 83
 0 0 0   56037624 55132248  52 13591 0  0  0  0  0  0  0  0  0    3674 21572   1283 30 46 24
 1 0 0   34962096 53029088  16 11773 0  0  0  0  0  0  0  0  0    4725 16248   1491 13 78  9
 1 0 0   22086880 51409376  15  8730 0  0  0  0  0  0  0  0  0    4976 14733   1741  9 76 16
 2 0 0   12591624 50035952  81  7687 0  0  0  0  0  0  0  0  0    4551 22864   1684  7 68 25
 2 0 0    4891536 48774296  33  6575 0  0  0  0  0  0  0  0  0    3883 25456   2479  6 57 37
 400 0 0  2664312 48843472  12  3134 0  0  0  0  0  0  0  0  0    4154  9433   2123  3 35 62  <== run queue spike
 143 0 0  9577832 49808640  31   662 0  0  0  0  0  0  0  0  0    8319  8140   1282  1 95  4
 0 0 0    9863824 49818808  64  5725 0  0  0  0  0  0  0  0  0   12599 41575  24975  7 25 69
 0 0 0    8854032 49553696  16    86 0  0  0  0  0  0  0  0  0    5499 114000 12349  5 35 60

The run queue is bombarded (r = 400 in the marked row).

Here is the progression of the number of application processes running, sampled every 5 seconds:

 5 8 7 7 7 7 21 51 55 132 622 896 1121 1293 1142

Thank you