thr3ads.net - dtrace discuss - [dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting? [Jan 2009]

If this information is useful, please help other people find it:
Share via:

Qihua Wu

2009-Jan-20 09:38 UTC

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

Our system experienced high sys cpu time, (80%+), is there any tool or
script to check which function caused most of the sys cpu time?


Thanks,
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20090120/64b30501/attachment.html>

Chad Mynhier

2009-Jan-20 11:23 UTC

head link

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

On Tue, Jan 20, 2009 at 4:38 AM, Qihua Wu <dtrace.wu at gmail.com>
wrote:> Our system experienced high sys cpu time, (80%+), is there any tool or
> script to check which function caused most of the sys cpu time?
>
Assuming you can catch the system when it''s experiencing this, you can
do kernel stack sampling to see where the kernel is spending most of
its time:

dtrace -n ''profile-997/arg0 && curthread->t_pri !=
-1/{@c[stack()] count()}END{trunc(@c, 40)}''

You can add other things to the tuple that you''re aggregating on if
you need other information, e.g., ''@c[execname, stack()]'' or
''@c[pid,
stack()]''.

If you don''t care about the stack per se, and if it''s
available to you
(I don''t know off the top of my head which version this went back
into), you could also just aggregate on the kernel function using
''@c[func(arg0)]''.

Chad

Chad Mynhier

2009-Jan-20 14:06 UTC

head link

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

On Tue, Jan 20, 2009 at 6:23 AM, Chad Mynhier <cmynhier at gmail.com>
wrote:>
> If you don''t care about the stack per se, and if it''s
available to you
> (I don''t know off the top of my head which version this went back
> into), you could also just aggregate on the kernel function using
> ''@c[func(arg0)]''.
To add a note to this:  even though you could do this, it''s pretty
unlikely that it''s going to be informative.  You''ll likely end
up
spending a lot of time in some utility function coming from a number
of different code paths.  Here''s some sample output:

 genunix`rm_assize                                              3642
 unix`lock_set                                                  3725
 unix`mutex_exit                                                3922
 genunix`sleepq_wakeone_chan                                    4454
 unix`xc_loop                                                   5991
 unix`disp_getwork                                              6755
 unix`utl0                                                      6963
 unix`fp_restore                                                7001
 FJSV,SPARC64-VI`copyout                                        7204
 unix`mutex_enter                                              12547

It does you no good to know that you spent a lot of time in
mutex_enter(), because you don''t have any information as to why.
Better information is more likely to be found deeper in the stack.

Chad

Qihua Wu

2009-Jan-21 00:56 UTC

head link

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

We have a database was very slow suddenly, when I used truss and I find the
pread syscall from oracle was slow (130+ ms). But the iostat and vxstat(we
used veritas QIO) show nothing abnormal. From vmstat we could see the sys
cpu is very high.

The problem happened after we encountered an error from in.mpathd. It hard
to believe it is in.mpathd that result in the problem. in.mpathd is a single
thread process (prstat -L), so it shouldn''t use so much sys time. But
it is
very likely in.mapthd was the root cause as we encounter the similar issue
on another database server, the database wasn''t reponsive soon after
in.mpathd throw error.
Jan 14 09:12:53 sjcdb475 in.mpathd[413]: [ID 585766 daemon.error] Cannot
meet requested failure detection time of 10000 ms on (inet nxge0) new
failure detection time for group "mnic" is 89894 ms
Jan 14 09:13:53 sjcdb475 in.mpathd[413]: [ID 302819 daemon.error] Improved
failure detection time 44947 ms on (inet nxge3) for group "mnic"
Jan 14 09:13:53 sjcdb475 in.mpathd[413]: [ID 302819 daemon.error] Improved
failure detection time 22473 ms on (inet nxge0) for group "mnic"

On Tue, Jan 20, 2009 at 10:06 PM, Chad Mynhier <cmynhier at gmail.com>
wrote:
> On Tue, Jan 20, 2009 at 6:23 AM, Chad Mynhier <cmynhier at gmail.com>
wrote:
> >
> > If you don''t care about the stack per se, and if
it''s available to you
> > (I don''t know off the top of my head which version this went
back
> > into), you could also just aggregate on the kernel function using
> > ''@c[func(arg0)]''.
>
> To add a note to this:  even though you could do this, it''s pretty
> unlikely that it''s going to be informative.  You''ll
likely end up
> spending a lot of time in some utility function coming from a number
> of different code paths.  Here''s some sample output:
>
>  genunix`rm_assize                                              3642
>  unix`lock_set                                                  3725
>  unix`mutex_exit                                                3922
>  genunix`sleepq_wakeone_chan                                    4454
>  unix`xc_loop                                                   5991
>  unix`disp_getwork                                              6755
>  unix`utl0                                                      6963
>  unix`fp_restore                                                7001
>  FJSV,SPARC64-VI`copyout                                        7204
>  unix`mutex_enter                                              12547
>
> It does you no good to know that you spent a lot of time in
> mutex_enter(), because you don''t have any information as to why.
> Better information is more likely to be found deeper in the stack.
>
> Chad
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20090121/23207ec9/attachment.html>

David Lutz

2009-Jan-22 14:49 UTC

head link

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

Qihua Wu <dtrace.wu at gmail.com> wrote:>
> We have a database was very slow suddenly, when I used truss and I find the
> pread syscall from oracle was slow (130+ ms). But the iostat and vxstat(we
> used veritas QIO) show nothing abnormal. From vmstat we could see the sys
> cpu is very high.
> 
> The problem happened after we encountered an error from in.mpathd. It hard
> to believe it is in.mpathd that result in the problem. in.mpathd is a
single
> thread process (prstat -L), so it shouldn''t use so much sys time.
But it is
> very likely in.mapthd was the root cause as we encounter the similar issue
> on another database server, the database wasn''t reponsive soon
after
> in.mpathd throw error.
I think it is much more likely that the in.mpathd errors are a symptom
of a problem rather than an indication that it is the cause.  in.mpathd
is very time sensitive, so if there is an issue that prevents it from getting
cpu  time, you will likely see errors like the ones you mentioned.

I generally use lockstat to do kernel profiling when I see high %sys
values.  For example, "lockstat -kgIW -D 20 sleep 5" run as root or
with
pfexec will help locate hot functions in the kernel or in a device driver.
lockstat is available in Solaris 2.6 and above, and uses dtrace to
collect data in Solaris 10 and above.   For more info, see Example 4
"Basic Kernel Profiling" and Example 5 "Generated-load
Profiling" in
http://docs.sun.com/app/docs/doc/816-5166/lockstat-1m for a
description of how this works.

The "Caller" column in the lockstat output lists the names of
functions
that appear in kernel call stacks.  You can use the OpenSolaris source
browser at http://src.opensolaris.org/source/ to search for these as
symbols in the source code, to try to determine what subsystem is
hot.  If you have a support contract, you could also search for the
function name in the SunSolve collections at http://sunsolve.sun.com/
to see if there is a known performance issue.

You mentioned that you are running vxfs, so keep in mind that function
names beginning with vx are often within the VxFS/VxVM drivers and
won''t appear in the Solaris source code.

HTH,
David

Qihua Wu

2009-Jan-23 00:54 UTC

head link

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

Thanks for your invaluable information, we are seeking the privilege to run
lockstat and dtrace from SA.

Could you please also tell me where my following understanding of sys time
is wrong?

pread is a syscall, after the user process called pread, the user process
will be put into sleep queue and sleep, suppose the pread will return after
10ms, then this 10ms will be counted as sleep time.

For the real work of the pread, a kernel thread will request the data from
disk to be read in, suppose the seek time and rotational delay is 7 ms
during which the kernel thread is just waiting, then the other 3ms (10ms
-7ms) will be used to transfer the data from disk to memory, and this 3 ms
will be counted as sys time. Then during the 10ms, 10ms will be counted as
sleep time, 3ms will be counted as sys time, and totally 13ms will be
counted within 10ms. Then the total percent time of sys, sleep, user is
larger than 100% (130%) in this 10ms. Although I am sure the sys%+sleep+usr%
should be 100% for any process, but not sure where in my above statement is
wrong.


Thanks,

On Thu, Jan 22, 2009 at 10:49 PM, David Lutz <David.Lutz at sun.com>
wrote:
> Qihua Wu <dtrace.wu at gmail.com> wrote:
> >
> > We have a database was very slow suddenly, when I used truss and I
find
> the
> > pread syscall from oracle was slow (130+ ms). But the iostat and
> vxstat(we
> > used veritas QIO) show nothing abnormal. From vmstat we could see the
sys
> > cpu is very high.
> >
> > The problem happened after we encountered an error from in.mpathd. It
> hard
> > to believe it is in.mpathd that result in the problem. in.mpathd is a
> single
> > thread process (prstat -L), so it shouldn''t use so much sys
time. But it
> is
> > very likely in.mapthd was the root cause as we encounter the similar
> issue
> > on another database server, the database wasn''t reponsive
soon after
> > in.mpathd throw error.
>
> I think it is much more likely that the in.mpathd errors are a symptom
> of a problem rather than an indication that it is the cause.  in.mpathd
> is very time sensitive, so if there is an issue that prevents it from
> getting
> cpu  time, you will likely see errors like the ones you mentioned.
>
> I generally use lockstat to do kernel profiling when I see high %sys
> values.  For example, "lockstat -kgIW -D 20 sleep 5" run as root
or with
> pfexec will help locate hot functions in the kernel or in a device driver.
> lockstat is available in Solaris 2.6 and above, and uses dtrace to
> collect data in Solaris 10 and above.   For more info, see Example 4
> "Basic Kernel Profiling" and Example 5 "Generated-load
Profiling" in
> http://docs.sun.com/app/docs/doc/816-5166/lockstat-1m for a
> description of how this works.
>
> The "Caller" column in the lockstat output lists the names of
functions
> that appear in kernel call stacks.  You can use the OpenSolaris source
> browser at http://src.opensolaris.org/source/ to search for these as
> symbols in the source code, to try to determine what subsystem is
> hot.  If you have a support contract, you could also search for the
> function name in the SunSolve collections at http://sunsolve.sun.com/
> to see if there is a known performance issue.
>
> You mentioned that you are running vxfs, so keep in mind that function
> names beginning with vx are often within the VxFS/VxVM drivers and
> won''t appear in the Solaris source code.
>
> HTH,
> David
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20090123/3d14a4bb/attachment.html>

David Lutz

2009-Jan-23 14:03 UTC

head link

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

Qihua Wu <dtrace.wu at gmail.com> wrote:>
> Thanks for your invaluable information, we are seeking the privilege 
> to run lockstat and dtrace from SA.
> 
> Could you please also tell me where my following understanding of sys 
> time is wrong?
> 
> pread is a syscall, after the user process called pread, the user process
> will be put into sleep queue and sleep, suppose the pread will return 
> after 10ms, then this 10ms will be counted as sleep time.
> 
> For the real work of the pread, a kernel thread will request the data from
> disk to be read in, suppose the seek time and rotational delay is 7 ms
> during which the kernel thread is just waiting, then the other 3ms (10ms
> -7ms) will be used to transfer the data from disk to memory, and this 
> 3 ms will be counted as sys time. Then during the 10ms, 10ms will be 
> counted as sleep time, 3ms will be counted as sys time, and totally 13ms
> will be counted within 10ms. Then the total percent time of sys, sleep,
> user is larger than 100% (130%) in this 10ms. Although I am sure the
> sys%+sleep+usr% should be 100% for any process, but not sure where
> in my above statement is wrong.
> 
I think you may have a common misconception systems calls.  In
general, making a system call does not spawn a new thread and
does not put the caller to sleep.  Two of the primary jobs
of the system call layer are to provide a published interface
to kernel routines and to provide gate keeping to prevent
processes from accessing or modifying data that doesn''t belong
to them.  Once a thread passes the gate keeping checks, it
has its privileges raised and begins executing functions in
the kernel code just like functions in the user program or
any linked in libraries.  It is still the same thread, but it
is now able to access and modify data directly in the kernel''s
address space, and call any function in the kernel code.

In your example, the user thread and the kernel thread are
really the same thread.  The %sys just refers to the time that
the thread spent executing functions in the kernel code.  If
the pread system call ran for 10ms, and the thread was asleep
for 7ms of that time, you would have 3ms of %sys time and
no %usr time.

David
> 
> Thanks,
> 
> On Thu, Jan 22, 2009 at 10:49 PM, David Lutz <David.Lutz at sun.com>
wrote:
> 
> > Qihua Wu <dtrace.wu at gmail.com> wrote:
> > >
> > > We have a database was very slow suddenly, when I used truss and
I
> find
> > the
> > > pread syscall from oracle was slow (130+ ms). But the iostat and
> > vxstat(we
> > > used veritas QIO) show nothing abnormal. From vmstat we could see
> the sys
> > > cpu is very high.
> > >
> > > The problem happened after we encountered an error from
in.mpathd.
> It
> > hard
> > > to believe it is in.mpathd that result in the problem. in.mpathd 
> is a
> > single
> > > thread process (prstat -L), so it shouldn''t use so much
sys time.
> But it
> > is
> > > very likely in.mapthd was the root cause as we encounter the
similar
> > issue
> > > on another database server, the database wasn''t
reponsive soon after
> > > in.mpathd throw error.
> >
> > I think it is much more likely that the in.mpathd errors are a symptom
> > of a problem rather than an indication that it is the cause. 
in.mpathd
> > is very time sensitive, so if there is an issue that prevents it from
> > getting
> > cpu  time, you will likely see errors like the ones you mentioned.
> >
> > I generally use lockstat to do kernel profiling when I see high %sys
> > values.  For example, "lockstat -kgIW -D 20 sleep 5" run as
root or
> with
> > pfexec will help locate hot functions in the kernel or in a device
driver.
> > lockstat is available in Solaris 2.6 and above, and uses dtrace to
> > collect data in Solaris 10 and above.   For more info, see Example 4
> > "Basic Kernel Profiling" and Example 5 "Generated-load
Profiling" in
> > http://docs.sun.com/app/docs/doc/816-5166/lockstat-1m for a
> > description of how this works.
> >
> > The "Caller" column in the lockstat output lists the names
of functions
> > that appear in kernel call stacks.  You can use the OpenSolaris source
> > browser at http://src.opensolaris.org/source/ to search for these as
> > symbols in the source code, to try to determine what subsystem is
> > hot.  If you have a support contract, you could also search for the
> > function name in the SunSolve collections at http://sunsolve.sun.com/
> > to see if there is a known performance issue.
> >
> > You mentioned that you are running vxfs, so keep in mind that function
> > names beginning with vx are often within the VxFS/VxVM drivers and
> > won''t appear in the Solaris source code.
> >
> > HTH,
> > David
> >
> >

Steve Scargall

2009-Jan-24 09:07 UTC

head link

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

> We have a database was very slow suddenly, when I used truss and I find the
pread syscall from oracle
> was slow (130+ ms). But the iostat and vxstat(we used veritas QIO) show
nothing abnormal. From vmstat we could see the sys cpu is very high.
There is a known issue with Oracle''s ASM layer that can give this sort
of behaviour.  If this is the problem I think it is you need to speak with
Oracle Support and get a patch/update from them.  It would help if you could
describe your infrastructure:

Q: What OS do you have (uname -a)
Q: What systems are you using?
Q: What version of Oracle do you have?
Q: Do you use Oracle ASM?
Q: Do you see high service times in the Oracle AWR reports but not at the
physical level (iostat,vxstat, or at the storage)?
Q: Do you see the problem when large numbers of users connect within a small
time period?


Hint: Write some D to look at what happens once pread() is called and time each
subsequent function.  Use the ''flowindent'' argument to DTrace
to make things easier to see.  Refine your D to look at the functions which are
taking a long time (similar time to pread).  Quantized graphs are particularly
useful for the second itteration to show the distribution (in time) for each
function of interest.  If you are using ASM specifically look at the ksfdread(),
ksfdshare(), & skgfqio() functions within the ASM Layer using the PID
provider.  Note you''ll need Solaris 10 update 4 or higher for the PID
provider.  You''ll also want to look at and time the open() system call
using the SYSCALL provider.  If you find open() is the thing taking the time
then it''s the Oracle issue I''m thinking of.

You should really open a support call with Sun & Oracle for this to get some
dedicated resource.
-- 
This message posted from opensolaris.org

dtrace discuss - Jan 2009 - high sys cpu time, any way to use dtrace to do troubleshooting?

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?

[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?