Qihua Wu
2009-Jan-20 09:38 UTC
[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?
Our system experienced high sys cpu time, (80%+), is there any tool or script to check which function caused most of the sys cpu time? Thanks, -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20090120/64b30501/attachment.html>
Chad Mynhier
2009-Jan-20 11:23 UTC
[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?
On Tue, Jan 20, 2009 at 4:38 AM, Qihua Wu <dtrace.wu at gmail.com> wrote:> Our system experienced high sys cpu time, (80%+), is there any tool or > script to check which function caused most of the sys cpu time? >Assuming you can catch the system when it''s experiencing this, you can do kernel stack sampling to see where the kernel is spending most of its time: dtrace -n ''profile-997/arg0 && curthread->t_pri != -1/{@c[stack()] count()}END{trunc(@c, 40)}'' You can add other things to the tuple that you''re aggregating on if you need other information, e.g., ''@c[execname, stack()]'' or ''@c[pid, stack()]''. If you don''t care about the stack per se, and if it''s available to you (I don''t know off the top of my head which version this went back into), you could also just aggregate on the kernel function using ''@c[func(arg0)]''. Chad
Chad Mynhier
2009-Jan-20 14:06 UTC
[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?
On Tue, Jan 20, 2009 at 6:23 AM, Chad Mynhier <cmynhier at gmail.com> wrote:> > If you don''t care about the stack per se, and if it''s available to you > (I don''t know off the top of my head which version this went back > into), you could also just aggregate on the kernel function using > ''@c[func(arg0)]''.To add a note to this: even though you could do this, it''s pretty unlikely that it''s going to be informative. You''ll likely end up spending a lot of time in some utility function coming from a number of different code paths. Here''s some sample output: genunix`rm_assize 3642 unix`lock_set 3725 unix`mutex_exit 3922 genunix`sleepq_wakeone_chan 4454 unix`xc_loop 5991 unix`disp_getwork 6755 unix`utl0 6963 unix`fp_restore 7001 FJSV,SPARC64-VI`copyout 7204 unix`mutex_enter 12547 It does you no good to know that you spent a lot of time in mutex_enter(), because you don''t have any information as to why. Better information is more likely to be found deeper in the stack. Chad
Qihua Wu
2009-Jan-21 00:56 UTC
[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?
We have a database was very slow suddenly, when I used truss and I find the pread syscall from oracle was slow (130+ ms). But the iostat and vxstat(we used veritas QIO) show nothing abnormal. From vmstat we could see the sys cpu is very high. The problem happened after we encountered an error from in.mpathd. It hard to believe it is in.mpathd that result in the problem. in.mpathd is a single thread process (prstat -L), so it shouldn''t use so much sys time. But it is very likely in.mapthd was the root cause as we encounter the similar issue on another database server, the database wasn''t reponsive soon after in.mpathd throw error. Jan 14 09:12:53 sjcdb475 in.mpathd[413]: [ID 585766 daemon.error] Cannot meet requested failure detection time of 10000 ms on (inet nxge0) new failure detection time for group "mnic" is 89894 ms Jan 14 09:13:53 sjcdb475 in.mpathd[413]: [ID 302819 daemon.error] Improved failure detection time 44947 ms on (inet nxge3) for group "mnic" Jan 14 09:13:53 sjcdb475 in.mpathd[413]: [ID 302819 daemon.error] Improved failure detection time 22473 ms on (inet nxge0) for group "mnic" On Tue, Jan 20, 2009 at 10:06 PM, Chad Mynhier <cmynhier at gmail.com> wrote:> On Tue, Jan 20, 2009 at 6:23 AM, Chad Mynhier <cmynhier at gmail.com> wrote: > > > > If you don''t care about the stack per se, and if it''s available to you > > (I don''t know off the top of my head which version this went back > > into), you could also just aggregate on the kernel function using > > ''@c[func(arg0)]''. > > To add a note to this: even though you could do this, it''s pretty > unlikely that it''s going to be informative. You''ll likely end up > spending a lot of time in some utility function coming from a number > of different code paths. Here''s some sample output: > > genunix`rm_assize 3642 > unix`lock_set 3725 > unix`mutex_exit 3922 > genunix`sleepq_wakeone_chan 4454 > unix`xc_loop 5991 > unix`disp_getwork 6755 > unix`utl0 6963 > unix`fp_restore 7001 > FJSV,SPARC64-VI`copyout 7204 > unix`mutex_enter 12547 > > It does you no good to know that you spent a lot of time in > mutex_enter(), because you don''t have any information as to why. > Better information is more likely to be found deeper in the stack. > > Chad >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20090121/23207ec9/attachment.html>
David Lutz
2009-Jan-22 14:49 UTC
[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?
Qihua Wu <dtrace.wu at gmail.com> wrote:> > We have a database was very slow suddenly, when I used truss and I find the > pread syscall from oracle was slow (130+ ms). But the iostat and vxstat(we > used veritas QIO) show nothing abnormal. From vmstat we could see the sys > cpu is very high. > > The problem happened after we encountered an error from in.mpathd. It hard > to believe it is in.mpathd that result in the problem. in.mpathd is a single > thread process (prstat -L), so it shouldn''t use so much sys time. But it is > very likely in.mapthd was the root cause as we encounter the similar issue > on another database server, the database wasn''t reponsive soon after > in.mpathd throw error.I think it is much more likely that the in.mpathd errors are a symptom of a problem rather than an indication that it is the cause. in.mpathd is very time sensitive, so if there is an issue that prevents it from getting cpu time, you will likely see errors like the ones you mentioned. I generally use lockstat to do kernel profiling when I see high %sys values. For example, "lockstat -kgIW -D 20 sleep 5" run as root or with pfexec will help locate hot functions in the kernel or in a device driver. lockstat is available in Solaris 2.6 and above, and uses dtrace to collect data in Solaris 10 and above. For more info, see Example 4 "Basic Kernel Profiling" and Example 5 "Generated-load Profiling" in http://docs.sun.com/app/docs/doc/816-5166/lockstat-1m for a description of how this works. The "Caller" column in the lockstat output lists the names of functions that appear in kernel call stacks. You can use the OpenSolaris source browser at http://src.opensolaris.org/source/ to search for these as symbols in the source code, to try to determine what subsystem is hot. If you have a support contract, you could also search for the function name in the SunSolve collections at http://sunsolve.sun.com/ to see if there is a known performance issue. You mentioned that you are running vxfs, so keep in mind that function names beginning with vx are often within the VxFS/VxVM drivers and won''t appear in the Solaris source code. HTH, David
Qihua Wu
2009-Jan-23 00:54 UTC
[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?
Thanks for your invaluable information, we are seeking the privilege to run lockstat and dtrace from SA. Could you please also tell me where my following understanding of sys time is wrong? pread is a syscall, after the user process called pread, the user process will be put into sleep queue and sleep, suppose the pread will return after 10ms, then this 10ms will be counted as sleep time. For the real work of the pread, a kernel thread will request the data from disk to be read in, suppose the seek time and rotational delay is 7 ms during which the kernel thread is just waiting, then the other 3ms (10ms -7ms) will be used to transfer the data from disk to memory, and this 3 ms will be counted as sys time. Then during the 10ms, 10ms will be counted as sleep time, 3ms will be counted as sys time, and totally 13ms will be counted within 10ms. Then the total percent time of sys, sleep, user is larger than 100% (130%) in this 10ms. Although I am sure the sys%+sleep+usr% should be 100% for any process, but not sure where in my above statement is wrong. Thanks, On Thu, Jan 22, 2009 at 10:49 PM, David Lutz <David.Lutz at sun.com> wrote:> Qihua Wu <dtrace.wu at gmail.com> wrote: > > > > We have a database was very slow suddenly, when I used truss and I find > the > > pread syscall from oracle was slow (130+ ms). But the iostat and > vxstat(we > > used veritas QIO) show nothing abnormal. From vmstat we could see the sys > > cpu is very high. > > > > The problem happened after we encountered an error from in.mpathd. It > hard > > to believe it is in.mpathd that result in the problem. in.mpathd is a > single > > thread process (prstat -L), so it shouldn''t use so much sys time. But it > is > > very likely in.mapthd was the root cause as we encounter the similar > issue > > on another database server, the database wasn''t reponsive soon after > > in.mpathd throw error. > > I think it is much more likely that the in.mpathd errors are a symptom > of a problem rather than an indication that it is the cause. in.mpathd > is very time sensitive, so if there is an issue that prevents it from > getting > cpu time, you will likely see errors like the ones you mentioned. > > I generally use lockstat to do kernel profiling when I see high %sys > values. For example, "lockstat -kgIW -D 20 sleep 5" run as root or with > pfexec will help locate hot functions in the kernel or in a device driver. > lockstat is available in Solaris 2.6 and above, and uses dtrace to > collect data in Solaris 10 and above. For more info, see Example 4 > "Basic Kernel Profiling" and Example 5 "Generated-load Profiling" in > http://docs.sun.com/app/docs/doc/816-5166/lockstat-1m for a > description of how this works. > > The "Caller" column in the lockstat output lists the names of functions > that appear in kernel call stacks. You can use the OpenSolaris source > browser at http://src.opensolaris.org/source/ to search for these as > symbols in the source code, to try to determine what subsystem is > hot. If you have a support contract, you could also search for the > function name in the SunSolve collections at http://sunsolve.sun.com/ > to see if there is a known performance issue. > > You mentioned that you are running vxfs, so keep in mind that function > names beginning with vx are often within the VxFS/VxVM drivers and > won''t appear in the Solaris source code. > > HTH, > David > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.opensolaris.org/pipermail/dtrace-discuss/attachments/20090123/3d14a4bb/attachment.html>
David Lutz
2009-Jan-23 14:03 UTC
[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?
Qihua Wu <dtrace.wu at gmail.com> wrote:> > Thanks for your invaluable information, we are seeking the privilege > to run lockstat and dtrace from SA. > > Could you please also tell me where my following understanding of sys > time is wrong? > > pread is a syscall, after the user process called pread, the user process > will be put into sleep queue and sleep, suppose the pread will return > after 10ms, then this 10ms will be counted as sleep time. > > For the real work of the pread, a kernel thread will request the data from > disk to be read in, suppose the seek time and rotational delay is 7 ms > during which the kernel thread is just waiting, then the other 3ms (10ms > -7ms) will be used to transfer the data from disk to memory, and this > 3 ms will be counted as sys time. Then during the 10ms, 10ms will be > counted as sleep time, 3ms will be counted as sys time, and totally 13ms > will be counted within 10ms. Then the total percent time of sys, sleep, > user is larger than 100% (130%) in this 10ms. Although I am sure the > sys%+sleep+usr% should be 100% for any process, but not sure where > in my above statement is wrong. >I think you may have a common misconception systems calls. In general, making a system call does not spawn a new thread and does not put the caller to sleep. Two of the primary jobs of the system call layer are to provide a published interface to kernel routines and to provide gate keeping to prevent processes from accessing or modifying data that doesn''t belong to them. Once a thread passes the gate keeping checks, it has its privileges raised and begins executing functions in the kernel code just like functions in the user program or any linked in libraries. It is still the same thread, but it is now able to access and modify data directly in the kernel''s address space, and call any function in the kernel code. In your example, the user thread and the kernel thread are really the same thread. The %sys just refers to the time that the thread spent executing functions in the kernel code. If the pread system call ran for 10ms, and the thread was asleep for 7ms of that time, you would have 3ms of %sys time and no %usr time. David> > Thanks, > > On Thu, Jan 22, 2009 at 10:49 PM, David Lutz <David.Lutz at sun.com> wrote: > > > Qihua Wu <dtrace.wu at gmail.com> wrote: > > > > > > We have a database was very slow suddenly, when I used truss and I > find > > the > > > pread syscall from oracle was slow (130+ ms). But the iostat and > > vxstat(we > > > used veritas QIO) show nothing abnormal. From vmstat we could see > the sys > > > cpu is very high. > > > > > > The problem happened after we encountered an error from in.mpathd. > It > > hard > > > to believe it is in.mpathd that result in the problem. in.mpathd > is a > > single > > > thread process (prstat -L), so it shouldn''t use so much sys time. > But it > > is > > > very likely in.mapthd was the root cause as we encounter the similar > > issue > > > on another database server, the database wasn''t reponsive soon after > > > in.mpathd throw error. > > > > I think it is much more likely that the in.mpathd errors are a symptom > > of a problem rather than an indication that it is the cause. in.mpathd > > is very time sensitive, so if there is an issue that prevents it from > > getting > > cpu time, you will likely see errors like the ones you mentioned. > > > > I generally use lockstat to do kernel profiling when I see high %sys > > values. For example, "lockstat -kgIW -D 20 sleep 5" run as root or > with > > pfexec will help locate hot functions in the kernel or in a device driver. > > lockstat is available in Solaris 2.6 and above, and uses dtrace to > > collect data in Solaris 10 and above. For more info, see Example 4 > > "Basic Kernel Profiling" and Example 5 "Generated-load Profiling" in > > http://docs.sun.com/app/docs/doc/816-5166/lockstat-1m for a > > description of how this works. > > > > The "Caller" column in the lockstat output lists the names of functions > > that appear in kernel call stacks. You can use the OpenSolaris source > > browser at http://src.opensolaris.org/source/ to search for these as > > symbols in the source code, to try to determine what subsystem is > > hot. If you have a support contract, you could also search for the > > function name in the SunSolve collections at http://sunsolve.sun.com/ > > to see if there is a known performance issue. > > > > You mentioned that you are running vxfs, so keep in mind that function > > names beginning with vx are often within the VxFS/VxVM drivers and > > won''t appear in the Solaris source code. > > > > HTH, > > David > > > >
Steve Scargall
2009-Jan-24 09:07 UTC
[dtrace-discuss] high sys cpu time, any way to use dtrace to do troubleshooting?
> We have a database was very slow suddenly, when I used truss and I find the pread syscall from oracle > was slow (130+ ms). But the iostat and vxstat(we used veritas QIO) show nothing abnormal. From vmstat we could see the sys cpu is very high.There is a known issue with Oracle''s ASM layer that can give this sort of behaviour. If this is the problem I think it is you need to speak with Oracle Support and get a patch/update from them. It would help if you could describe your infrastructure: Q: What OS do you have (uname -a) Q: What systems are you using? Q: What version of Oracle do you have? Q: Do you use Oracle ASM? Q: Do you see high service times in the Oracle AWR reports but not at the physical level (iostat,vxstat, or at the storage)? Q: Do you see the problem when large numbers of users connect within a small time period? Hint: Write some D to look at what happens once pread() is called and time each subsequent function. Use the ''flowindent'' argument to DTrace to make things easier to see. Refine your D to look at the functions which are taking a long time (similar time to pread). Quantized graphs are particularly useful for the second itteration to show the distribution (in time) for each function of interest. If you are using ASM specifically look at the ksfdread(), ksfdshare(), & skgfqio() functions within the ASM Layer using the PID provider. Note you''ll need Solaris 10 update 4 or higher for the PID provider. You''ll also want to look at and time the open() system call using the SYSCALL provider. If you find open() is the thing taking the time then it''s the Oracle issue I''m thinking of. You should really open a support call with Sun & Oracle for this to get some dedicated resource. -- This message posted from opensolaris.org