thr3ads.net - dtrace discuss - [dtrace-discuss] Oracle performance monitoring using Dtrace [Jul 2006]

If this information is useful, please help other people find it:
Share via:

Foster, Gary

2006-Jul-25 23:49 UTC

[dtrace-discuss] Oracle performance monitoring using Dtrace

Anyone done any oracle benchmarking using Dtrace?  I''ve been trying to
delve
into figuring out why our oracle processes are consuming such utterly huge
amounts of system resources (other than the obvious "because it''s
Oracle..."
answer) and have been mired down in the ridiculously huge amount of probes
that exist in the oracle namespace.

Any help (RTFM''s are welcome) or pointers on where to go to start
whacking
at it with a machete would be gratefully appreciated.  I''d ideally like
to
start out just determining and aggregatiung which internal Oracle functions
are being hit heaviest (cpu and wall time) but when I try to adapt the same
approach I use for monitoring heavy syscalls I get mired down due to the
roughly 30,000 probes that I''m somehow enabling :)

Anyone who has any ideas and can get me ontrack might find a couple bottles
of award-winning homebrew beer winging their way towards them in the post in
the near future!

-- Gary F.

Sent using the Microsoft Entourage 2004 for Mac Test Drive.

James Dickens

2006-Jul-26 00:38 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

On 7/25/06, Foster, Gary <gfoster at ebay.com>
wrote:> Anyone done any oracle benchmarking using Dtrace?  I''ve been
trying to delve
> into figuring out why our oracle processes are consuming such utterly huge
> amounts of system resources (other than the obvious "because
it''s Oracle..."
> answer) and have been mired down in the ridiculously huge amount of probes
> that exist in the oracle namespace.
>
> Any help (RTFM''s are welcome) or pointers on where to go to start
whacking
> at it with a machete would be gratefully appreciated.  I''d ideally
like to
> start out just determining and aggregatiung which internal Oracle functions
> are being hit heaviest (cpu and wall time) but when I try to adapt the same
> approach I use for monitoring heavy syscalls I get mired down due to the
> roughly 30,000 probes that I''m somehow enabling :)
>you may want to start out simple, with just putting probes syscalls,
first for count, and then time them to see where oracle is spending
the most time, if you find any major problems you can print out  user
stacks, and/or see how the syscalls are being called. Brendan G has an
example where he use dtrace to find out where smc was using its time.
http://users.tpg.com.au/adsln4yb/DTrace/smc.html once you find
offenders you can figure a way to make  oracle behave better in those
cases.


probing all of oracle''s function alone probabably isn''t going
to do
much good since you don''t have the source code and you will be just
poking around in the dark.

James Dickens
uadmin.blogspot.com

> Anyone who has any ideas and can get me ontrack might find a couple bottles
> of award-winning homebrew beer winging their way towards them in the post
in
> the near future!
>
> -- Gary F.
>
> Sent using the Microsoft Entourage 2004 for Mac Test Drive.
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
>

Matty

2006-Jul-26 02:13 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

On Tue, 25 Jul 2006, Foster, Gary wrote:
> Anyone done any oracle benchmarking using Dtrace?  I''ve been
trying to delve
> into figuring out why our oracle processes are consuming such utterly huge
> amounts of system resources (other than the obvious "because
it''s Oracle..."
> answer) and have been mired down in the ridiculously huge amount of probes
> that exist in the oracle namespace.
>
> Any help (RTFM''s are welcome) or pointers on where to go to start
whacking
> at it with a machete would be gratefully appreciated.  I''d ideally
like to
> start out just determining and aggregatiung which internal Oracle functions
> are being hit heaviest (cpu and wall time) but when I try to adapt the same
> approach I use for monitoring heavy syscalls I get mired down due to the
> roughly 30,000 probes that I''m somehow enabling :)
>
> Anyone who has any ideas and can get me ontrack might find a couple bottles
> of award-winning homebrew beer winging their way towards them in the post
in
> the near future!
Hi Gary,

As James mentioned, the DTraceToolkit is super useful for drilling down on 
what the system is doing. If you are interested in seeing how that 
translates to Oracle, you will probably want to start with statspack (in 
Oracle 10G, they renamed statpack to Automated Workload Repository). 
Statspack / AWR will give you metrics on everything from latch contention 
to index usage to SQL statement execution time. Depending on the 
statspack / AWR report output, you will probably need to consult one or 
more dynamic performance views to find the source of the problem. These 
are documented in the Oracle performance and tuning manual.

Hope this helps,
- Ryan
--
UNIX Administrator
http://prefetch.net

przemolicc at poczta.fm

2006-Jul-27 07:02 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

On Tue, Jul 25, 2006 at 04:49:38PM -0700, Foster, Gary
wrote:> Anyone done any oracle benchmarking using Dtrace?  I''ve been
trying to delve
> into figuring out why our oracle processes are consuming such utterly huge
> amounts of system resources (other than the obvious "because
it''s Oracle..."
> answer) and have been mired down in the ridiculously huge amount of probes
> that exist in the oracle namespace.
> 
> Any help (RTFM''s are welcome) or pointers on where to go to start
whacking
> at it with a machete would be gratefully appreciated.  I''d ideally
like to
> start out just determining and aggregatiung which internal Oracle functions
> are being hit heaviest (cpu and wall time) but when I try to adapt the same
> approach I use for monitoring heavy syscalls I get mired down due to the
> roughly 30,000 probes that I''m somehow enabling :)
> 
> Anyone who has any ideas and can get me ontrack might find a couple bottles
> of award-winning homebrew beer winging their way towards them in the post
in
> the near future!
Gary,

I can see two steps:
at the very begin just use all *stat: vmstat, mpstat, prstat, iostat(/vxstat ?),
...
Try to find the bottleneck area: storage, cpu, memory, net (you mentioned
"resources" but
which one) ? The tools are really useful if you want to have general view of how
is
you server working.  Then, based on the observation, use dtrace.
What are your *stat tools are saying ?

przemol

Tim Cook

2006-Jul-27 10:17 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

Gary,

I''ll give you 2 other suggestions:

- StatsPack was designed by Oracle to extract & report on Oracle''s
own
instrumentation and provide plenty of useful information (as long as you 
know where to look).  The Top 5 wait events often reveals problems that 
are not obvious from looking at what Solaris can tell you.

- You may want to try plockstat on a DB shadow or two, or maybe even one 
or more DB writers or log writers.  I have tried this myself but did not 
get anything interesting.  I would be interested if anyone has...

Regards,
Tim

przemolicc at poczta.fm wrote:> On Tue, Jul 25, 2006 at 04:49:38PM -0700, Foster, Gary wrote:
>   
>> Anyone done any oracle benchmarking using Dtrace?  I''ve been
trying to delve
>> into figuring out why our oracle processes are consuming such utterly
huge
>> amounts of system resources (other than the obvious "because
it''s Oracle..."
>> answer) and have been mired down in the ridiculously huge amount of
probes
>> that exist in the oracle namespace.
>>
>> Any help (RTFM''s are welcome) or pointers on where to go to
start whacking
>> at it with a machete would be gratefully appreciated.  I''d
ideally like to
>> start out just determining and aggregatiung which internal Oracle
functions
>> are being hit heaviest (cpu and wall time) but when I try to adapt the
same
>> approach I use for monitoring heavy syscalls I get mired down due to
the
>> roughly 30,000 probes that I''m somehow enabling :)
>>
>> Anyone who has any ideas and can get me ontrack might find a couple
bottles
>> of award-winning homebrew beer winging their way towards them in the
post in
>> the near future!
>>     
>
> Gary,
>
> I can see two steps:
> at the very begin just use all *stat: vmstat, mpstat, prstat,
iostat(/vxstat ?), ...
> Try to find the bottleneck area: storage, cpu, memory, net (you mentioned
"resources" but
> which one) ? The tools are really useful if you want to have general view
of how is
> you server working.  Then, based on the observation, use dtrace.
> What are your *stat tools are saying ?
>
> przemol
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
>   
-- 
Tim Cook
Performance, Availability and Architecture Engineering
<> Sun Microsystems

Foster, Gary

2006-Jul-27 17:39 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

On 7/27/06 3:17 AM, "Tim Cook" <Tim.Cook at Sun.COM> wrote:
> Gary,
> 
> I''ll give you 2 other suggestions:
> 
> - StatsPack was designed by Oracle to extract & report on
Oracle''s own
> instrumentation and provide plenty of useful information (as long as you
> know where to look).  The Top 5 wait events often reveals problems that
> are not obvious from looking at what Solaris can tell you.
> 
> - You may want to try plockstat on a DB shadow or two, or maybe even one
> or more DB writers or log writers.  I have tried this myself but did not
> get anything interesting.  I would be interested if anyone has...
Thanks for all the feedback and suggestions guys, both on and off list.  I
guess I should''ve known to be more forthcoming with information :)

Ok, here''s the deal.  I''ve been handed an oracle server that
is dying hard
when it''s loaded heavily.  I can''t get any Dtrace
instrumentation to work
when the machine is under load because the load is so heavy the machine
becomes completely unresponsive.  The problems have so far been partially
tracked (by another troubleshooting team) to the Oracle dispatcher which
ends up pegging the CPU high.  Load average shoots through the roof and the
machine becomes completely unresponsive.

"We" suspect that it''s a problem somewhere deep in the
networking side of
things.  I''ve managed to get some iostat info and don''t see
any sort of disk
bottlenecks (we''re using locally attached storage), and about the only
real
dtrace information I''ve been able to get is syscall usage which matches
pretty closely what I''ve seen by searching this list archives and
noting
what other people have posted about their oracle processes... Nothing
suspicious.  I don''t know why we suspect the networking code,
I''ve just been
handed it as a fait accompli and haven''t been given data to back it up
so I
remain skeptical but am doing as I am told heh.  I''ve been given
basically
"Gary, it''s broken, we think it''s the dispatcher, we
think it has something
to do with networking, go figure it out" and that''s about the sum
total of
what anyone has told me.

I''m one of the few Dtrace users here at the moment and I''m
still at the
"flailing about" stage with the tools at best.  I leverage the Dtrace
toolkit heavily when I need to and modify it occasionally for special cases.
I''ve been asked to use Dtrace to try and see if I can hack down some of
the
brush and figure out where we need to focus in on.  Initially this was "Hey
Gary, find out what Oracle function calls are being hammered" but I managed
to balk hard enough that the request has turned into something a little more
reasonable or possible for me to accomplish.

At this point in time I think we''re going to dial way back and just try
to
run a light load generator on the oracle server and I''ll try to use
Dtrace
to dig down into the networking side of things to see if I can figure out if
something in there is hoovering up these huge amounts of CPU.

-- Gary F.
Sent using the Microsoft Entourage 2004 for Mac Test Drive.

Alexander Kolbasov

2006-Jul-27 23:07 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

Gary,
> Ok, here''s the deal.  I''ve been handed an oracle server
that is dying hard
> when it''s loaded heavily.  I can''t get any Dtrace
instrumentation to work
> when the machine is under load because the load is so heavy the machine
> becomes completely unresponsive.  The problems have so far been partially
> tracked (by another troubleshooting team) to the Oracle dispatcher which
> ends up pegging the CPU high.  Load average shoots through the roof and the
> machine becomes completely unresponsive.
First of all it would be helpful if you describe the kind of system you are
using. What is the load distribution between system time and user time? Are 
there any excessive interrupts or context switches? Does anything stand out 
from various standard monitoring tools?

- Alexander Kolbasov

Foster, Gary

2006-Jul-28 00:36 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

On 7/27/06 4:07 PM, "Alexander Kolbasov" <akolb at eng.sun.com>
wrote:
> First of all it would be helpful if you describe the kind of system you are
> using.
Fair enough... It''s a Niagara 8 core (4 concurrent threads per core)
with 32
gig of system RAM.  It presents as a 32 CPU system.  It''s a bit hairy
to
work on.

-- Gary F.
Sent using the Microsoft Entourage 2004 for Mac Test Drive.

Wee Yeh Tan

2006-Jul-28 07:52 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

On 7/28/06, Foster, Gary <gfoster at ebay.com>
wrote:> I''m one of the few Dtrace users here at the moment and
I''m still at the
> "flailing about" stage with the tools at best.  I leverage the
Dtrace
> toolkit heavily when I need to and modify it occasionally for special
cases.
> I''ve been asked to use Dtrace to try and see if I can hack down
some of the
> brush and figure out where we need to focus in on.  Initially this was
"Hey
> Gary, find out what Oracle function calls are being hammered" but I
managed
> to balk hard enough that the request has turned into something a little
more
> reasonable or possible for me to accomplish.
Gary,

Try this:
  dtrace -n ''profile-1001/uid==<oracle_user> &&
arg1/{@[ustack(1),
pid] = count()}'' -n ''tick-10s{trunc(@, 20);
printa(@)}''

This takes samples of each CPU at 1001Hz.  Hopefully some function
calls jump out and you will have something to narrow down on.

Otherwise, you can play around with how much you trunc() or the
frequency or the reporting interval.


-- 
Just me,
Wire ...

Stefan Parvu

2006-Jul-28 08:14 UTC

head link

[dtrace-discuss] Re: Oracle performance monitoring using Dtrace

Hey,

Are you trying to observe and undertsand how Oracle is functioning using DTrace
? Or are you more interested to identify certain things your DB instance is
doing ?

It is a big topic and might be that you have to restrict the scope of your
search. I''m currently writting some material which will be published
under http://www.nbl.fi/~nbl97/solaris/dtrace/index.html which will address the
topic you mentioned, under "Database Supplement".

The main idea is to show how can you use DTT (DTraceToolkit) and identify for
instance what type of IO Oracle is doing. Or discover if you have 5 DB instances
under same machine which one is doing the most IO or similar things...

If you are more interested in finding out why Oracle is spending a lot of CPU
time then maybe you are better with Oracle''s performance tools: the
Statspack.

You can start by defining what are you after:
 - identifying a certain IO pattern: random, sequential
 - observing Oracle''s components activity: the listener, pmon, smon etc
 - drill down and analyse one DB instance: network activity, IO, cpu usage

DTT in future will have as well some ready scripts which can be used to observe
databases. These things take some time to develop and test.

Cheers,
Stefan
 
 
This message posted from opensolaris.org

James Foronda

2006-Aug-17 17:16 UTC

head link

[dtrace-discuss] Oracle performance monitoring using Dtrace

Gary,

Please see comments inline...

Foster, Gary wrote:> On 7/27/06 3:17 AM, "Tim Cook" <Tim.Cook at Sun.COM> wrote:
>
>   
>> Gary,
>>
>> I''ll give you 2 other suggestions:
>>
>> - StatsPack was designed by Oracle to extract & report on
Oracle''s own
>> instrumentation and provide plenty of useful information (as long as
you
>> know where to look).  The Top 5 wait events often reveals problems that
>> are not obvious from looking at what Solaris can tell you.
>>     
I know that you are asking about DTrace but the best method that I have 
used to diagnose Oracle performance problems is to follow method-R as 
described by Cary Millsap and Jeff Holt of Hotsos in their book 
"Optimizing Oracle Performance".  I''ve done several Oracle
performance
improvement projects and I''ve had 100% success rate (yes, 100%) using 
this method.  I''ve also encountered situations where the problem was
not
Oracle (it was non-database portion of the app), and as long as you know 
how to interpret the data, method-R will  tell you when Oracle is not 
the problem.

The only problem with method-R is that you need a tool to summarize the 
Oracle trace file that is generated.  Hotsos sells their own profiler 
that does this very well.  There used to be a free profiler called 
orasrp and I used it but it had some problems processing some of the 
files I gave it.  I also noticed that orasrp have different numbers 
(compared with hotsos profiler).  Some differences are so big such that 
they can not be attributed to rounding errors.

If your problem can be reproduced at will, then I think that method-R 
can pinpoint the root cause.  If you have the budget, I highly recommend 
the Hotsos profiler (I''ve worked with them on one project but
I''m not
affiliated with them in any way :) ).
>> - You may want to try plockstat on a DB shadow or two, or maybe even
one
>> or more DB writers or log writers.  I have tried this myself but did
not
>> get anything interesting.  I would be interested if anyone has...
>>     
>
> Thanks for all the feedback and suggestions guys, both on and off list.  I
> guess I should''ve known to be more forthcoming with information :)
>
> Ok, here''s the deal.  I''ve been handed an oracle server
that is dying hard
> when it''s loaded heavily.  I can''t get any Dtrace
instrumentation to work
> when the machine is under load because the load is so heavy the machine
> becomes completely unresponsive.  The problems have so far been partially
> tracked (by another troubleshooting team) to the Oracle dispatcher which
> ends up pegging the CPU high.  Load average shoots through the roof and the
> machine becomes completely unresponsive.
>
> "We" suspect that it''s a problem somewhere deep in the
networking side of
> things.  I''ve managed to get some iostat info and don''t
see any sort of disk
> bottlenecks (we''re using locally attached storage), and about the
only real
> dtrace information I''ve been able to get is syscall usage which
matches
> pretty closely what I''ve seen by searching this list archives and
noting
> what other people have posted about their oracle processes... Nothing
> suspicious.  I don''t know why we suspect the networking code,
I''ve just been
> handed it as a fait accompli and haven''t been given data to back
it up so I
> remain skeptical but am doing as I am told heh.  I''ve been given
basically
> "Gary, it''s broken, we think it''s the dispatcher, we
think it has something
> to do with networking, go figure it out" and that''s about the
sum total of
> what anyone has told me.
>
> I''m one of the few Dtrace users here at the moment and
I''m still at the
> "flailing about" stage with the tools at best.  I leverage the
Dtrace
> toolkit heavily when I need to and modify it occasionally for special
cases.
> I''ve been asked to use Dtrace to try and see if I can hack down
some of the
> brush and figure out where we need to focus in on.  Initially this was
"Hey
> Gary, find out what Oracle function calls are being hammered" but I
managed
> to balk hard enough that the request has turned into something a little
more
> reasonable or possible for me to accomplish.
>
> At this point in time I think we''re going to dial way back and
just try to
> run a light load generator on the oracle server and I''ll try to
use Dtrace
> to dig down into the networking side of things to see if I can figure out
if
> something in there is hoovering up these huge amounts of CPU.
>
> -- Gary F.
> Sent using the Microsoft Entourage 2004 for Mac Test Drive.
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
>

dtrace discuss - Jul 2006 - Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace

[dtrace-discuss] Re: Oracle performance monitoring using Dtrace

[dtrace-discuss] Oracle performance monitoring using Dtrace