Foster, Gary
2006-Jul-25 23:49 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
Anyone done any oracle benchmarking using Dtrace? I''ve been trying to delve into figuring out why our oracle processes are consuming such utterly huge amounts of system resources (other than the obvious "because it''s Oracle..." answer) and have been mired down in the ridiculously huge amount of probes that exist in the oracle namespace. Any help (RTFM''s are welcome) or pointers on where to go to start whacking at it with a machete would be gratefully appreciated. I''d ideally like to start out just determining and aggregatiung which internal Oracle functions are being hit heaviest (cpu and wall time) but when I try to adapt the same approach I use for monitoring heavy syscalls I get mired down due to the roughly 30,000 probes that I''m somehow enabling :) Anyone who has any ideas and can get me ontrack might find a couple bottles of award-winning homebrew beer winging their way towards them in the post in the near future! -- Gary F. Sent using the Microsoft Entourage 2004 for Mac Test Drive.
James Dickens
2006-Jul-26 00:38 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
On 7/25/06, Foster, Gary <gfoster at ebay.com> wrote:> Anyone done any oracle benchmarking using Dtrace? I''ve been trying to delve > into figuring out why our oracle processes are consuming such utterly huge > amounts of system resources (other than the obvious "because it''s Oracle..." > answer) and have been mired down in the ridiculously huge amount of probes > that exist in the oracle namespace. > > Any help (RTFM''s are welcome) or pointers on where to go to start whacking > at it with a machete would be gratefully appreciated. I''d ideally like to > start out just determining and aggregatiung which internal Oracle functions > are being hit heaviest (cpu and wall time) but when I try to adapt the same > approach I use for monitoring heavy syscalls I get mired down due to the > roughly 30,000 probes that I''m somehow enabling :) >you may want to start out simple, with just putting probes syscalls, first for count, and then time them to see where oracle is spending the most time, if you find any major problems you can print out user stacks, and/or see how the syscalls are being called. Brendan G has an example where he use dtrace to find out where smc was using its time. http://users.tpg.com.au/adsln4yb/DTrace/smc.html once you find offenders you can figure a way to make oracle behave better in those cases. probing all of oracle''s function alone probabably isn''t going to do much good since you don''t have the source code and you will be just poking around in the dark. James Dickens uadmin.blogspot.com> Anyone who has any ideas and can get me ontrack might find a couple bottles > of award-winning homebrew beer winging their way towards them in the post in > the near future! > > -- Gary F. > > Sent using the Microsoft Entourage 2004 for Mac Test Drive. > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >
On Tue, 25 Jul 2006, Foster, Gary wrote:> Anyone done any oracle benchmarking using Dtrace? I''ve been trying to delve > into figuring out why our oracle processes are consuming such utterly huge > amounts of system resources (other than the obvious "because it''s Oracle..." > answer) and have been mired down in the ridiculously huge amount of probes > that exist in the oracle namespace. > > Any help (RTFM''s are welcome) or pointers on where to go to start whacking > at it with a machete would be gratefully appreciated. I''d ideally like to > start out just determining and aggregatiung which internal Oracle functions > are being hit heaviest (cpu and wall time) but when I try to adapt the same > approach I use for monitoring heavy syscalls I get mired down due to the > roughly 30,000 probes that I''m somehow enabling :) > > Anyone who has any ideas and can get me ontrack might find a couple bottles > of award-winning homebrew beer winging their way towards them in the post in > the near future!Hi Gary, As James mentioned, the DTraceToolkit is super useful for drilling down on what the system is doing. If you are interested in seeing how that translates to Oracle, you will probably want to start with statspack (in Oracle 10G, they renamed statpack to Automated Workload Repository). Statspack / AWR will give you metrics on everything from latch contention to index usage to SQL statement execution time. Depending on the statspack / AWR report output, you will probably need to consult one or more dynamic performance views to find the source of the problem. These are documented in the Oracle performance and tuning manual. Hope this helps, - Ryan -- UNIX Administrator http://prefetch.net
przemolicc at poczta.fm
2006-Jul-27 07:02 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
On Tue, Jul 25, 2006 at 04:49:38PM -0700, Foster, Gary wrote:> Anyone done any oracle benchmarking using Dtrace? I''ve been trying to delve > into figuring out why our oracle processes are consuming such utterly huge > amounts of system resources (other than the obvious "because it''s Oracle..." > answer) and have been mired down in the ridiculously huge amount of probes > that exist in the oracle namespace. > > Any help (RTFM''s are welcome) or pointers on where to go to start whacking > at it with a machete would be gratefully appreciated. I''d ideally like to > start out just determining and aggregatiung which internal Oracle functions > are being hit heaviest (cpu and wall time) but when I try to adapt the same > approach I use for monitoring heavy syscalls I get mired down due to the > roughly 30,000 probes that I''m somehow enabling :) > > Anyone who has any ideas and can get me ontrack might find a couple bottles > of award-winning homebrew beer winging their way towards them in the post in > the near future!Gary, I can see two steps: at the very begin just use all *stat: vmstat, mpstat, prstat, iostat(/vxstat ?), ... Try to find the bottleneck area: storage, cpu, memory, net (you mentioned "resources" but which one) ? The tools are really useful if you want to have general view of how is you server working. Then, based on the observation, use dtrace. What are your *stat tools are saying ? przemol
Tim Cook
2006-Jul-27 10:17 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
Gary, I''ll give you 2 other suggestions: - StatsPack was designed by Oracle to extract & report on Oracle''s own instrumentation and provide plenty of useful information (as long as you know where to look). The Top 5 wait events often reveals problems that are not obvious from looking at what Solaris can tell you. - You may want to try plockstat on a DB shadow or two, or maybe even one or more DB writers or log writers. I have tried this myself but did not get anything interesting. I would be interested if anyone has... Regards, Tim przemolicc at poczta.fm wrote:> On Tue, Jul 25, 2006 at 04:49:38PM -0700, Foster, Gary wrote: > >> Anyone done any oracle benchmarking using Dtrace? I''ve been trying to delve >> into figuring out why our oracle processes are consuming such utterly huge >> amounts of system resources (other than the obvious "because it''s Oracle..." >> answer) and have been mired down in the ridiculously huge amount of probes >> that exist in the oracle namespace. >> >> Any help (RTFM''s are welcome) or pointers on where to go to start whacking >> at it with a machete would be gratefully appreciated. I''d ideally like to >> start out just determining and aggregatiung which internal Oracle functions >> are being hit heaviest (cpu and wall time) but when I try to adapt the same >> approach I use for monitoring heavy syscalls I get mired down due to the >> roughly 30,000 probes that I''m somehow enabling :) >> >> Anyone who has any ideas and can get me ontrack might find a couple bottles >> of award-winning homebrew beer winging their way towards them in the post in >> the near future! >> > > Gary, > > I can see two steps: > at the very begin just use all *stat: vmstat, mpstat, prstat, iostat(/vxstat ?), ... > Try to find the bottleneck area: storage, cpu, memory, net (you mentioned "resources" but > which one) ? The tools are really useful if you want to have general view of how is > you server working. Then, based on the observation, use dtrace. > What are your *stat tools are saying ? > > przemol > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >-- Tim Cook Performance, Availability and Architecture Engineering <> Sun Microsystems
Foster, Gary
2006-Jul-27 17:39 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
On 7/27/06 3:17 AM, "Tim Cook" <Tim.Cook at Sun.COM> wrote:> Gary, > > I''ll give you 2 other suggestions: > > - StatsPack was designed by Oracle to extract & report on Oracle''s own > instrumentation and provide plenty of useful information (as long as you > know where to look). The Top 5 wait events often reveals problems that > are not obvious from looking at what Solaris can tell you. > > - You may want to try plockstat on a DB shadow or two, or maybe even one > or more DB writers or log writers. I have tried this myself but did not > get anything interesting. I would be interested if anyone has...Thanks for all the feedback and suggestions guys, both on and off list. I guess I should''ve known to be more forthcoming with information :) Ok, here''s the deal. I''ve been handed an oracle server that is dying hard when it''s loaded heavily. I can''t get any Dtrace instrumentation to work when the machine is under load because the load is so heavy the machine becomes completely unresponsive. The problems have so far been partially tracked (by another troubleshooting team) to the Oracle dispatcher which ends up pegging the CPU high. Load average shoots through the roof and the machine becomes completely unresponsive. "We" suspect that it''s a problem somewhere deep in the networking side of things. I''ve managed to get some iostat info and don''t see any sort of disk bottlenecks (we''re using locally attached storage), and about the only real dtrace information I''ve been able to get is syscall usage which matches pretty closely what I''ve seen by searching this list archives and noting what other people have posted about their oracle processes... Nothing suspicious. I don''t know why we suspect the networking code, I''ve just been handed it as a fait accompli and haven''t been given data to back it up so I remain skeptical but am doing as I am told heh. I''ve been given basically "Gary, it''s broken, we think it''s the dispatcher, we think it has something to do with networking, go figure it out" and that''s about the sum total of what anyone has told me. I''m one of the few Dtrace users here at the moment and I''m still at the "flailing about" stage with the tools at best. I leverage the Dtrace toolkit heavily when I need to and modify it occasionally for special cases. I''ve been asked to use Dtrace to try and see if I can hack down some of the brush and figure out where we need to focus in on. Initially this was "Hey Gary, find out what Oracle function calls are being hammered" but I managed to balk hard enough that the request has turned into something a little more reasonable or possible for me to accomplish. At this point in time I think we''re going to dial way back and just try to run a light load generator on the oracle server and I''ll try to use Dtrace to dig down into the networking side of things to see if I can figure out if something in there is hoovering up these huge amounts of CPU. -- Gary F. Sent using the Microsoft Entourage 2004 for Mac Test Drive.
Alexander Kolbasov
2006-Jul-27 23:07 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
Gary,> Ok, here''s the deal. I''ve been handed an oracle server that is dying hard > when it''s loaded heavily. I can''t get any Dtrace instrumentation to work > when the machine is under load because the load is so heavy the machine > becomes completely unresponsive. The problems have so far been partially > tracked (by another troubleshooting team) to the Oracle dispatcher which > ends up pegging the CPU high. Load average shoots through the roof and the > machine becomes completely unresponsive.First of all it would be helpful if you describe the kind of system you are using. What is the load distribution between system time and user time? Are there any excessive interrupts or context switches? Does anything stand out from various standard monitoring tools? - Alexander Kolbasov
Foster, Gary
2006-Jul-28 00:36 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
On 7/27/06 4:07 PM, "Alexander Kolbasov" <akolb at eng.sun.com> wrote:> First of all it would be helpful if you describe the kind of system you are > using.Fair enough... It''s a Niagara 8 core (4 concurrent threads per core) with 32 gig of system RAM. It presents as a 32 CPU system. It''s a bit hairy to work on. -- Gary F. Sent using the Microsoft Entourage 2004 for Mac Test Drive.
Wee Yeh Tan
2006-Jul-28 07:52 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
On 7/28/06, Foster, Gary <gfoster at ebay.com> wrote:> I''m one of the few Dtrace users here at the moment and I''m still at the > "flailing about" stage with the tools at best. I leverage the Dtrace > toolkit heavily when I need to and modify it occasionally for special cases. > I''ve been asked to use Dtrace to try and see if I can hack down some of the > brush and figure out where we need to focus in on. Initially this was "Hey > Gary, find out what Oracle function calls are being hammered" but I managed > to balk hard enough that the request has turned into something a little more > reasonable or possible for me to accomplish.Gary, Try this: dtrace -n ''profile-1001/uid==<oracle_user> && arg1/{@[ustack(1), pid] = count()}'' -n ''tick-10s{trunc(@, 20); printa(@)}'' This takes samples of each CPU at 1001Hz. Hopefully some function calls jump out and you will have something to narrow down on. Otherwise, you can play around with how much you trunc() or the frequency or the reporting interval. -- Just me, Wire ...
Stefan Parvu
2006-Jul-28 08:14 UTC
[dtrace-discuss] Re: Oracle performance monitoring using Dtrace
Hey, Are you trying to observe and undertsand how Oracle is functioning using DTrace ? Or are you more interested to identify certain things your DB instance is doing ? It is a big topic and might be that you have to restrict the scope of your search. I''m currently writting some material which will be published under http://www.nbl.fi/~nbl97/solaris/dtrace/index.html which will address the topic you mentioned, under "Database Supplement". The main idea is to show how can you use DTT (DTraceToolkit) and identify for instance what type of IO Oracle is doing. Or discover if you have 5 DB instances under same machine which one is doing the most IO or similar things... If you are more interested in finding out why Oracle is spending a lot of CPU time then maybe you are better with Oracle''s performance tools: the Statspack. You can start by defining what are you after: - identifying a certain IO pattern: random, sequential - observing Oracle''s components activity: the listener, pmon, smon etc - drill down and analyse one DB instance: network activity, IO, cpu usage DTT in future will have as well some ready scripts which can be used to observe databases. These things take some time to develop and test. Cheers, Stefan This message posted from opensolaris.org
James Foronda
2006-Aug-17 17:16 UTC
[dtrace-discuss] Oracle performance monitoring using Dtrace
Gary, Please see comments inline... Foster, Gary wrote:> On 7/27/06 3:17 AM, "Tim Cook" <Tim.Cook at Sun.COM> wrote: > > >> Gary, >> >> I''ll give you 2 other suggestions: >> >> - StatsPack was designed by Oracle to extract & report on Oracle''s own >> instrumentation and provide plenty of useful information (as long as you >> know where to look). The Top 5 wait events often reveals problems that >> are not obvious from looking at what Solaris can tell you. >>I know that you are asking about DTrace but the best method that I have used to diagnose Oracle performance problems is to follow method-R as described by Cary Millsap and Jeff Holt of Hotsos in their book "Optimizing Oracle Performance". I''ve done several Oracle performance improvement projects and I''ve had 100% success rate (yes, 100%) using this method. I''ve also encountered situations where the problem was not Oracle (it was non-database portion of the app), and as long as you know how to interpret the data, method-R will tell you when Oracle is not the problem. The only problem with method-R is that you need a tool to summarize the Oracle trace file that is generated. Hotsos sells their own profiler that does this very well. There used to be a free profiler called orasrp and I used it but it had some problems processing some of the files I gave it. I also noticed that orasrp have different numbers (compared with hotsos profiler). Some differences are so big such that they can not be attributed to rounding errors. If your problem can be reproduced at will, then I think that method-R can pinpoint the root cause. If you have the budget, I highly recommend the Hotsos profiler (I''ve worked with them on one project but I''m not affiliated with them in any way :) ).>> - You may want to try plockstat on a DB shadow or two, or maybe even one >> or more DB writers or log writers. I have tried this myself but did not >> get anything interesting. I would be interested if anyone has... >> > > Thanks for all the feedback and suggestions guys, both on and off list. I > guess I should''ve known to be more forthcoming with information :) > > Ok, here''s the deal. I''ve been handed an oracle server that is dying hard > when it''s loaded heavily. I can''t get any Dtrace instrumentation to work > when the machine is under load because the load is so heavy the machine > becomes completely unresponsive. The problems have so far been partially > tracked (by another troubleshooting team) to the Oracle dispatcher which > ends up pegging the CPU high. Load average shoots through the roof and the > machine becomes completely unresponsive. > > "We" suspect that it''s a problem somewhere deep in the networking side of > things. I''ve managed to get some iostat info and don''t see any sort of disk > bottlenecks (we''re using locally attached storage), and about the only real > dtrace information I''ve been able to get is syscall usage which matches > pretty closely what I''ve seen by searching this list archives and noting > what other people have posted about their oracle processes... Nothing > suspicious. I don''t know why we suspect the networking code, I''ve just been > handed it as a fait accompli and haven''t been given data to back it up so I > remain skeptical but am doing as I am told heh. I''ve been given basically > "Gary, it''s broken, we think it''s the dispatcher, we think it has something > to do with networking, go figure it out" and that''s about the sum total of > what anyone has told me. > > I''m one of the few Dtrace users here at the moment and I''m still at the > "flailing about" stage with the tools at best. I leverage the Dtrace > toolkit heavily when I need to and modify it occasionally for special cases. > I''ve been asked to use Dtrace to try and see if I can hack down some of the > brush and figure out where we need to focus in on. Initially this was "Hey > Gary, find out what Oracle function calls are being hammered" but I managed > to balk hard enough that the request has turned into something a little more > reasonable or possible for me to accomplish. > > At this point in time I think we''re going to dial way back and just try to > run a light load generator on the oracle server and I''ll try to use Dtrace > to dig down into the networking side of things to see if I can figure out if > something in there is hoovering up these huge amounts of CPU. > > -- Gary F. > Sent using the Microsoft Entourage 2004 for Mac Test Drive. > _______________________________________________ > dtrace-discuss mailing list > dtrace-discuss at opensolaris.org >