I am pretty new to DTrace but use the DTrace Toolkit when trying to troubleshoot I/O issues on Oracle.

I am looking for help on how to do the following: I am trying to answer whether adding more HBA cards/ports would be effective. To do this, I need to know the I/Os per second as well as the total bandwidth per second.

Has anyone done this before? Does anyone have any other ideas on how to attack this problem?

I have been tuning Oracle for quite some time now, and I am continually asked to prove what I tend to know naturally: that the classic 1 HBA, 2-port card isn't cutting it. I also have similar discussions on whether I am saturating the bus on a particular box.

Brian

----------------------
Brian P Michael
Technical Management Consultant
Rolta TUSC, Inc.
michaelb at tusc.com
630-960-2909 x1181
http://www.tusc.com

The information contained in this transmission is privileged and confidential information intended for the use of the individual or entity named above. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this transmission in error, do not read it. Please immediately reply to the sender that you have received this communication in error and then delete it. Thank you.

"iostat -Cx 1" is your friend. The -C flag provides a rollup per controller (c1, c2, etc.), so you can determine the I/O rate (IOPS and bandwidth) on a per-controller basis. I'd start there.

DTrace rocks, but you should be able to answer this question with iostat.

/jim
On Thu, Dec 17, 2009 at 5:51 PM, Michael Brian - IL <MICHAELB at tusc.com> wrote:
> I am trying to answer whether adding more HBA Cards/ports
> would be effective.
>
> To do this, I need to know the i/o's per second
> As well as total bandwidth per second.
>
> Has anyone done this before?

Sure - and dtrace isn't needed:

    # iostat -xCn 1 | nawk '$0 ~ /device|c.$/'

> Does anyone have any other ideas on how to attack this problem?

Your DBAs should be able to give you more detailed data, such as how individual tables and files are performing. Ask for an AWR report (e.g. http://users.telenet.be/oraguy.be/awr2.htm). It could be that there is just a hot LUN due to multiple hot tables being on the same file system. You can see the relative performance of the disks with iostat (get rid of the nawk at the end).

Adding more paths to storage can sometimes confuse the storage array, causing it to be less efficient and thereby making your problem worse. Depending on I/O patterns, striping, array type, etc., you could end up making it so that the array no longer recognizes sequential reads (thereby missing out on readahead), or you could end up with more copies of the most active data in the array's cache while slightly less active data is evicted. I've generally found that if you are at the point of thinking you need more paths, you are best off making any LUN available to only two HBAs.

If you are using Veritas file systems, you should be using ODM. If you think you are using ODM, verify that you really are by running odmstat on some of the database files to be sure you don't see just a bunch of zeros. If you are using UFS or NFS, be sure that you are using directio. If you are using ASM or raw disks, you should be good on this front.
Typically, problems in this area will result in very high %sys (vmstat) while not a lot of real work is getting done.

> I have been tuning Oracle for quite some time now, and I am continually
> Asked to prove what I tend to know naturally, that the classic 1 HBA, 2
> port card
> Isn't cutting it.
>
> I also have similar discussions on whether I am saturating the BUS on a
> particular box.

What kind of bus? What speed are the HBAs? If you are on a x8 PCIe slot connected to a dual-port 2 Gb HBA, you are going to max out the HBA while using no more than 25% of the PCIe bandwidth. On the other hand, a dual 4 Gb HBA in a PCI or PCI-X slot could certainly be problematic. You may want to take a look at busstat if you feel like you are overwhelming a bus.

Also, if prstat -mL is showing that you have queries where USR + SYS add up to 100 for extended periods of time, you may have problems with queries not having fast enough CPUs. If you see LAT (time waiting to get on a CPU) at more than a few percent, you could probably benefit from having more CPUs or faster CPUs.

Hmmm... I forgot to mention dtrace...

One thing I have found with the DTrace Toolkit scripts on multi-terabyte OLTP databases is that the default values in dtrace and/or the scripts are not large enough to store the amount of data required. As a result, there are more drops than data points, which severely limits the usefulness of the scripts unless you tune them. I love dtrace, but to date I haven't found it to be more useful for database analysis than a lot of the tools that have existed in Solaris for a decade or so.

--
Mike Gerdts
http://mgerdts.blogspot.com/
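The per-controller numbers asked for in this thread (I/Os per second and bandwidth) can be pulled out of the iostat output with a small filter. The following is only a sketch: it assumes the `iostat -xCn` column layout, where the device name is the last column and the first four columns are r/s, w/s, kr/s and kw/s. The function name `controller_rates` and the `AWK` variable are made up for this example.

```shell
#!/bin/sh
# Sketch: turn the controller rollup rows of `iostat -xCn` into IOPS and MB/s.
# Assumes the -xn column order, with the device name in the last column:
#   r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
# AWK is a hypothetical override for this example; on Solaris set AWK=nawk.
AWK=${AWK:-awk}

controller_rates() {
    "$AWK" '
        # Keep only rollup rows such as c1 or c12 (no tNdN target suffix).
        $NF ~ /^c[0-9]+$/ {
            iops = $1 + $2        # reads/s + writes/s
            kbps = $3 + $4        # kB read/s + kB written/s
            printf "%s iops=%.1f MB/s=%.2f\n", $NF, iops, kbps / 1024
        }
    '
}
```

It would be used as `iostat -xCn 5 | controller_rates`. The first report iostat prints is an average since boot, so it is usually ignored when judging current load. Unlike the `/device|c.$/` pattern above, this filter also matches controllers numbered c10 and higher.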
Mike,

Thanks for the reply.

Well, the problem I am actually dealing with cannot be fought with AWR, since I am dealing with problems on an Oracle standby server (a Sun T2000) with 6 standbys running on it.

The problem I am faced with is that 4 of the 6 standbys apply logs at reasonable rates, while the 5th or 6th standby's apply goes from a quick 3-minute apply to upwards of 60 minutes per log. Now, it is possible that the transactions in the redo log are so repetitive (deletes to the same blocks, etc.) that the apply rate is shriekingly bad.

We are using UFS with the forcedirectio mount option, probably not enough RAM to do what we want, and a T2000 with a very weird multi-threading and virtual CPU model.

So, the question I posed, I guess, really wasn't as simple as it sounded. I am taking a multi-front approach:

1) Calculating total box I/Os per second.
2) Total bandwidth per second (including log file transfers, etc.).
3) Drilling down to the largest consumers (mainly the ora_p00 parallel recovery processes and the MRP processes).
4) Calculating their overall combined throughput on the entire box (across all 6 standbys).
5) Tracking processor switches between the processes, since I fear that the T2000 is actually starving certain processes while keeping certain processes on a given CPU (lots of reading about the T2000 shows this is definitely possible).

And others.

This problem isn't appearing as simple as I would like, and I guess I am trying to use DTrace to solve all my problems (yep, looking for that silver bullet).

Thanks for the help.

Brian
-----Original Message-----
From: Mike Gerdts [mailto:mgerdts at gmail.com]
Sent: Thursday, December 17, 2009 6:27 PM
To: Michael Brian - IL
Cc: dtrace-discuss at opensolaris.org
Subject: Re: [dtrace-discuss] Looking for help on 2 items...
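For the DTrace side of the question, per-device I/Os per second and bytes per second can be counted with the io provider, and the drops mentioned in this thread are normally addressed by raising buffer options such as aggsize, bufsize or dynvarsize. The following is a minimal sketch, not a tuned script: the function name `io_rates` is made up for this example, and the 16m aggregation buffer is an illustrative guess rather than a recommendation.

```shell
#!/bin/sh
# Sketch: per-device I/O count and byte count, printed once per second,
# using the DTrace io provider. Needs root (or DTrace privileges).
# The aggsize value is an illustrative assumption; raise it if dtrace
# reports aggregation drops on a busy system.
io_rates() {
    dtrace -q -x aggsize=16m -n '
    io:::start
    {
        /* args[0] is the buffer (bufinfo_t), args[1] the device (devinfo_t) */
        @ios[args[1]->dev_statname]   = count();
        @bytes[args[1]->dev_statname] = sum(args[0]->b_bcount);
    }

    tick-1sec
    {
        printf("%-16s %10s %14s\n", "DEVICE", "IO/s", "BYTES/s");
        printa("%-16s %@10d %@14d\n", @ios, @bytes);
        /* reset so the next interval starts from zero */
        trunc(@ios);
        trunc(@bytes);
    }'
}
```

Calling `io_rates` prints one table per second until interrupted with Ctrl-C. Keying the aggregations on `execname` or `pid` instead of the device name would be one way to approach the per-process drill-down, though I/O issued asynchronously may not be attributed to the process that caused it.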