thr3ads.net - dtrace discuss - [dtrace-discuss] Looking for help on 2 items... [Dec 2009]

Michael Brian - IL

2009-Dec-17 23:51 UTC

[dtrace-discuss] Looking for help on 2 items...

I am pretty new to DTrace but use the DTrace Toolkit when trying to
troubleshoot I/O issues on Oracle.

I am looking for help on how to do the following:
I am trying to answer whether adding more HBA cards/ports
would be effective.

To do this, I need to know the I/Os per second
as well as the total bandwidth per second.

Has anyone done this before?

Does anyone have any other ideas on how to attack this problem?

I have been tuning Oracle for quite some time now, and I am continually
asked to prove what I tend to know naturally: that the classic 1 HBA,
2-port card isn't cutting it.

I also have similar discussions on whether I am saturating the bus on a
particular box.

Brian
 
----------------------
Brian P Michael
Technical Management Consultant
Rolta TUSC, Inc.
michaelb at tusc.com
630-960-2909 x1181
http://www.tusc.com
 
The information contained in this transmission is privileged and
confidential information intended for the use of the individual or
entity named above.  If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution
or copying of this communication is strictly prohibited.  If you have
received this transmission in error, do not read it.  Please immediately
reply to the sender that you have received this communication in error
and then delete it.  Thank you.

Jim Mauro

2009-Dec-18 00:06 UTC

"iostat -Cx 1" is your friend.

The -C flag will provide a rollup per controller
(c1, c2, etc) so you can determine the IO rate
on a per-controller basis (IOPS and bandwidth).

I'd start there. DTrace rocks, but you should be able
to answer this question with iostat.

/jim



Mike Gerdts

2009-Dec-18 00:26 UTC

On Thu, Dec 17, 2009 at 5:51 PM, Michael Brian - IL <MICHAELB at tusc.com>
wrote:
> I am pretty new to Dtrace but use the Dtrace Toolkit when trying to
> troubleshoot I/O issues
> On Oracle.
>
> I am looking for help on how to do the following:
> I am trying to answer whether adding more HBA Cards/ports
> would be effective.
>
> To do this, I need to know the i/o's per second
> As well as total bandwidth per second.
>
> Has anyone done this before?

Sure - and dtrace isn't needed:

# iostat -xCn 1 | nawk '$0 ~ /device|c.$/'

> Does anyone have any other ideas on how to attack this problem?

Your DBAs should be able to give you more detailed data, such as
how individual tables and files are performing.  Ask for an AWR report
(e.g. http://users.telenet.be/oraguy.be/awr2.htm).  It could be that
there is just a hot LUN due to multiple hot tables being on the same
file system.  You can see the relative performance of the disks with
iostat (get rid of the nawk at the end).
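
For anyone who wants to post-process the controller roll-up rather than
read it by eye, here is a minimal Python sketch. It assumes the usual
Solaris "iostat -xn" column order (r/s w/s kr/s kw/s wait actv wsvc_t
asvc_t %w %b device); the function name and the sample numbers are made
up for illustration. Note that the c.$ pattern in the one-liner above only
matches controllers with a single character after the "c"; the sketch
uses ^c\d+$ so that a controller such as c10 would also be picked up.

```python
import re

# Assumed column order of `iostat -xn` on Solaris:
#   r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
CONTROLLER = re.compile(r"^c\d+$")  # controller roll-up rows: c1, c2, c10


def controller_rates(iostat_text):
    """Return {controller: (iops, kb_per_sec)} for each controller row.

    If a controller shows up in several intervals, the last one wins.
    """
    rates = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 11 or not CONTROLLER.match(fields[-1]):
            continue  # banner, header, per-disk rows, blank lines
        reads, writes, kb_read, kb_written = (float(f) for f in fields[:4])
        rates[fields[-1]] = (reads + writes, kb_read + kb_written)
    return rates


# Made-up sample output, for illustration only.
SAMPLE = """\
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  120.0   80.0 9600.0 6400.0  0.0  1.5    0.0    7.5   0  60 c1
   60.0   40.0 4800.0 3200.0  0.0  0.8    0.0    8.0   0  35 c1t0d0
   60.0   40.0 4800.0 3200.0  0.0  0.7    0.0    7.0   0  30 c1t1d0
   10.0    5.0   80.0   40.0  0.0  0.1    0.0    6.0   0   4 c2
"""

if __name__ == "__main__":
    for name, (iops, kbps) in sorted(controller_rates(SAMPLE).items()):
        print(f"{name}: {iops:.0f} IOPS, {kbps / 1024:.1f} MB/s")
```

Keep in mind that the first report iostat prints covers the time since
boot, so for a live measurement you would normally look at the later
intervals only.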

Adding more paths to storage can sometimes confuse the storage array,
causing it to be less efficient, thereby making your problem worse.
Depending on I/O patterns, striping, array type, etc., you could end
up making it so that the array no longer recognizes sequential reads
(thereby missing out on readahead) or you could end up with more
copies of the most active data in the array's cache while slightly
less active data is evicted.  I've generally found that if you are at
the point of thinking you need more paths you are best off making any
LUN available only to two HBAs.

If you are using Veritas file systems, you should be using ODM.  If
you think you are using ODM, verify that you really are by running
odmstat on some of the database files to be sure you don't see just a
bunch of zeros.  If you are using UFS or NFS, be sure that you are
using directio.  If you are using ASM or raw disks you should be good
on this front.  Typically, problems in this area will result in very
high %sys (vmstat) while not a lot of real work is getting done.

> I have been tuning Oracle for quite some time now, and I am continually
> Asked to prove what I tend to know naturally, that the classic 1 HBA, 2
> port card
> Isn't cutting it.
>
> I also have similar discussions on whether I am saturating the BUS on a
> particular box.

What kind of bus?  What speed are the HBAs?  If you are on an x8 PCIe
slot connected to a dual-port 2 Gb HBA, you are going to max out the HBA
while using no more than 25% of the PCIe bandwidth.  On the other
hand, a dual 4 Gb HBA in a PCI or PCI-X slot could certainly be
problematic.  You may want to take a look at busstat if you feel like
you are likely overwhelming a bus.
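
The arithmetic behind that comparison can be sketched in a few lines of
Python. The figures are nominal one-direction payload rates, and the
sketch assumes a first-generation PCIe slot (the thread does not say which
generation is involved); the constant and function names are made up for
illustration. Real throughput will be lower because of protocol overhead,
so treat the results as upper bounds.

```python
# Nominal one-direction payload rates in MB/s (after 8b/10b encoding
# where it applies). These are ceilings, not measured throughput.
FC_PORT_MB_S = {2: 200, 4: 400}   # Fibre Channel, per port, keyed by Gb rating
PCIE_GEN1_LANE_MB_S = 250         # PCIe 1.x, per lane (assumed generation)
PCI_32BIT_33MHZ_MB_S = 133        # classic PCI, shared by the whole bus
PCIX_64BIT_133MHZ_MB_S = 1066     # PCI-X, shared by the whole bus


def hba_demand(ports, gbit):
    """Peak one-direction demand (MB/s) of an HBA with all ports at line rate."""
    return ports * FC_PORT_MB_S[gbit]


def bus_utilization(hba_mb_s, bus_mb_s):
    """Fraction of the bus needed to carry the HBA at line rate."""
    return hba_mb_s / bus_mb_s


if __name__ == "__main__":
    cases = [
        ("dual 2 Gb HBA in x8 PCIe 1.x", hba_demand(2, 2), 8 * PCIE_GEN1_LANE_MB_S),
        ("dual 4 Gb HBA in PCI-X 64-bit/133 MHz", hba_demand(2, 4), PCIX_64BIT_133MHZ_MB_S),
        ("dual 4 Gb HBA in PCI 32-bit/33 MHz", hba_demand(2, 4), PCI_32BIT_33MHZ_MB_S),
    ]
    for label, hba, bus in cases:
        print(f"{label}: {bus_utilization(hba, bus):.0%} of the bus")
```

Under these assumptions the dual 2 Gb HBA needs about a fifth of an x8
PCIe 1.x slot, while a dual 4 Gb HBA would need most of a fast PCI-X bus
and several times what a classic PCI bus can deliver.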

Also, if prstat -mL is showing that you have queries where USR + SYS
add up to 100 for extended periods of time, those queries may be
limited by CPU speed.  If you see LAT (time waiting to get on a CPU)
at more than a few percent, you could probably benefit from having
more CPUs or faster CPUs.

Hmmm... I forgot to mention dtrace...

One thing I have found with the DTrace Toolkit scripts on multi-terabyte
OLTP databases is that the default values in dtrace and/or
the scripts are not large enough to store the amount of data required.
As such, there are more drops than data points, severely limiting the
usefulness of the scripts unless you tune them.

I love dtrace, but to date I haven't found it to be more useful for
database analysis than a lot of the tools that have existed in Solaris
for a decade or so.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/

Michael Brian - IL

2009-Dec-18 00:38 UTC

Mike,

Thanks for the reply.

Well, the problem I am actually dealing with cannot be fought with AWR,
since I am dealing with problems on an Oracle standby server (a Sun
T2000) with 6 standbys running on it.
The problem I am faced with is that 4 of the 6 standbys apply logs at
reasonable rates, while on the 5th or 6th standby the apply goes from a
quick 3-minute apply to upwards of 60 minutes per log.
Now, it is possible that the transactions in the redo log are so
repetitive (deletes to the same blocks, etc.) that the apply rate is
shriekingly bad.

We are using UFS with the forcedirectio mount option, probably
not enough RAM to do what we want, and a T2000 with a very weird
multi-threading and virtual CPU model.

So the question I posed really wasn't as simple as it sounded.
I am taking a multi-front approach:
1) calculating total box I/Os per second;
2) calculating total bandwidth per second (including log file
   transfers, etc.);
3) drilling down to the largest consumers (mainly the ora_p00 parallel
   recovery processes and the MRP processes);
4) calculating their overall combined throughput on the entire box
   (across all 6 standbys);
5) tracking processor switches between the processes (since I fear that
   the T2000 is actually starving certain processes while keeping
   certain processes on a given CPU; lots of reading about the T2000
   shows this is definitely possible).

And others.

This problem isn't appearing as simple as I would like, and I guess I am
trying to use DTrace to solve all my problems (yep, looking for that
silver bullet).

Thanks for the help.

Brian


