Hi, folks,

I'm attempting to run an e-mail server on Xen. The e-mail system is Novell
GroupWise, and it serves about 250 users. The disk volume for the e-mail is
on my SAN, and I've attached the FC LUN to my Xen host, then used the
"phy:/dev..." method to forward the disk through to the domU. I'm running
into an issue with high I/O wait on the box (~250%) and large load averages
(20-40 for the 1/5/15 minute averages). I was wondering if anyone has ideas
on tuning the domU to handle this - is there a better way to forward the
disk device through, should I try using an iSCSI software initiator in the
domU, or is it just a bad idea to put an I/O load like this in a domU?
Unfortunately, mapping the entire FC card through to the domU isn't really
an option - the FC card accesses other SAN volumes for the Xen host, so it
needs to be present in dom0.

I'm running Xen 3.2.0 on SLES 10 SP2, on a Dell PowerEdge R610 server. The
FC HBA is a QLE2462, dual-channel 4Gb FC card. Any help, hints, etc., are
greatly appreciated!

-Nick
250 users normally is no big deal for an e-mail server, even a virtualized
one, though I don't know how GroupWise behaves.

I suggest you change your domU I/O scheduler to minimize the dom0-domU I/O
latency impact (BLAH is your domU block device):

$ echo deadline > /sys/block/BLAH/queue/scheduler

and play with the settings inside /sys/block/BLAH/queue/.

As for dom0, I don't know your storage and RAID setup, so it might (or
might not) be a good idea to try to reduce the latency between dom0 and the
storage (BLAH is each of your FC device paths - sda, sdb ... sdaa, sdab,
etc.):

$ echo noop > /sys/block/BLAH/queue/scheduler

On Wednesday 26 August 2009 13:01:13 Nick Couchman wrote:
> I'm attempting to run an e-mail server on Xen. The e-mail system is
> Novell GroupWise, and it serves about 250 users. [...] I was wondering if
> anyone has ideas on tuning the domU to handle this [...]

-- 
Daniel Mealha Cabrita
Divisao de Suporte Tecnico
AINFO / Reitoria / UTFPR
http://www.utfpr.edu.br
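(For reference, a minimal sketch of the knobs involved, assuming a domU
disk named xvdb and a 2.6-era kernel; the exact files and defaults vary by
kernel build:)

# see which schedulers are available and which one is active
$ cat /sys/block/xvdb/queue/scheduler
noop anticipatory deadline [cfq]

# switch to deadline; the deadline-specific tunables then appear
$ echo deadline > /sys/block/xvdb/queue/scheduler
$ ls /sys/block/xvdb/queue/iosched
fifo_batch  front_merges  read_expire  write_expire  writes_starved

# e.g. tighten the read deadline (milliseconds); the value here is an
# illustration, not a recommendation
$ echo 250 > /sys/block/xvdb/queue/iosched/read_expire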
> I'm attempting to run an e-mail server on Xen. The e-mail system is
> Novell GroupWise, and it serves about 250 users. [...] is it just a bad
> idea to put an I/O load like this in a domU?

If this turns out to be a global issue, I'd certainly like to hear about
it. I recently load-tested a postfix+cyrus domU with 6 SATA-backed spools
and 6 FC-backed meta partitions for about 300,000 IMAP accounts and
consistently delivered around 100 messages/sec to them. That load was
obviously all I/O-bound, but at what I'd consider to be an acceptable
delivery rate (delivery seems to be the most performance-challenging
operation, at least with Cyrus). I did see similar load averages, though.
This was with a RHEL 5 domU, a CentOS 5 dom0, and phy: mappings.

John

-- 
John Madden
Sr UNIX Systems Engineer
Ivy Tech Community College of Indiana
jmadden@ivytech.edu
On Wed, Aug 26, 2009 at 10:01:13AM -0600, Nick Couchman wrote:
> Hi, folks,
> The disk volume for the e-mail is on my SAN, and I've attached the FC LUN
> to my Xen host

Does the SAN LUN perform OK from dom0 without any domUs running?

> I'm running into an issue with high I/O wait on the box (~250%) and large
> load averages (20-40 for the 1/5/15 minute average).

Do you have iowait on dom0, or only in domU? Try running "iostat 1" in both
dom0 and domU.

Also, have you dedicated a CPU core only for dom0?

> I'm running Xen 3.2.0 on SLES 10 SP2, on a Dell PowerEdge R610 server.
> The FC HBA is a QLE2462, dual-channel 4Gb FC card. Any help, hints, etc.,
> are greatly appreciated!

You could also try updating to SLES11.

-- Pasi
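(For reference, a sketch of one way to dedicate a core to dom0 with the Xen
3.x tools; this assumes a 4-core box and is illustrative only:)

# pin dom0 (domain id 0), vcpu 0, onto physical cpu 0
$ xm vcpu-pin 0 0 0

# keep guests off cpu 0 by putting this in each domU config file:
#   cpus = "1-3"

# verify the resulting placement
$ xm vcpu-list

# alternatively, the hypervisor boot line can cap dom0's vcpus, e.g.
# dom0_max_vcpus=1 on the xen.gz entry in the bootloader config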
Pasi,

1) Yes, it seems to, although I've not run my e-mail server inside my dom0
yet - that's not really possible.
2) Just on domU - dom0 seems fine.
3) No, I haven't played around much with pinning dom0 or any of the domUs
to certain CPU cores.
4) Updating to SLES11 is in the future; however, doing this requires that I
do it to all my production Xen nodes concurrently, since the OCFS2
filesystem doesn't really play nice with other versions. I can either have
it mounted on the SLES10 box(es) or the SLES11 box(es), but not both at the
same time.

Thanks!
-Nick

>>> On 2009/08/26 at 11:32, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> Does the SAN LUN perform OK from dom0 without any domUs running?
> [...]
> You could also try updating to SLES11.
John,

What filesystem did you use for this test in the domU for the e-mail
storage? I'm currently running XFS on the volume where the GroupWise data
sits, and I'm wondering if the filesystem isn't tuned properly. Could you
give me a run-down of what filesystem you used, and what parameters you
used for creating the filesystem (block size, inode size, etc.)?

Thanks!
-Nick

>>> On 2009/08/26 at 11:32, John Madden <jmadden@ivytech.edu> wrote:
> I recently load-tested a postfix+cyrus domU with 6 SATA-backed spools and
> 6 FC-backed meta partitions for about 300,000 IMAP accounts and
> consistently delivered around 100 messages/sec to them. [...]
On Wed, 2009-08-26 at 11:41 -0600, Nick Couchman wrote:
> What filesystem did you use for this test in the domU for the e-mail
> storage? I'm currently running XFS on the volume where the GroupWise
> data sits, and I'm wondering if the filesystem isn't tuned properly.
> Could you give me a run-down of what filesystem you used, and what
> parameters you used for creating the filesystem (block size, inode
> size, etc.)?

ext3, always. xfs et al may be better depending on the filesystem use, but
I've found ext3 to always be reliable and performant enough.

`mke2fs -j -O dir_index -T news /dev/vg/lv`

John
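(A sketch of what that buys you, plus the mount side; the device and mount
point are examples:)

# -j adds the ext3 journal, -O dir_index enables hashed directory lookups
# (a big win for large mail directories), and -T news selects a dense inode
# ratio - roughly one inode per 4 kB block - suited to lots of small files
$ mke2fs -j -O dir_index -T news /dev/vg/lv

# mount with noatime so every message read doesn't cost a metadata write
$ mount -o noatime /dev/vg/lv /mnt/mail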
I take that back... iostat in dom0 shows similar results to iostat in domU.
As suggested by another person, I've changed the dom0 elevator to noop and
the domU to deadline. I'm playing with tuning some of the parameters for
the deadline scheduler now.

-Nick
On Wed, Aug 26, 2009 at 12:57 PM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> I've changed the dom0 elevator to noop and the domU to deadline.

Shouldn't that be the other way around?

-- 
Javier
Doesn't really seem to make a difference which way I do it... I still see
pretty intense disk I/O. Here is some sample output from iostat in the
domU:

Device:  rrqm/s  wrqm/s      r/s    w/s    rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdb      12.20    0.00  1217.40  26.20  9197.60  530.80     15.65     29.66  23.47   0.80 100.00
xvdb      18.40    0.00  1121.20  19.60  8737.60  691.50     16.53     32.97  29.13   0.88 100.00
xvdb      27.80    0.00  1241.40  29.20  8158.40  377.90     13.44     42.59  33.73   0.79 100.00
xvdb      31.60    0.00  1256.60  35.00  9426.40  424.00     15.25     42.06  32.44   0.77 100.00
xvdb      57.68    0.00  1250.50  17.76  8588.42  352.99     14.10     51.36  40.60   0.79  99.80

The avgqu-sz is anywhere from 11 to 75, and the await is anywhere from 20
to 50. %util is always around 100.

-Nick

>>> On 2009/08/26 at 12:00, Javier Guerra <javier@guerrag.com> wrote:
> Shouldn't that be the other way around?
Hmmm... I may have to test with ext3 and the -T news parameter to see how
that works.

-Nick

>>> On 2009/08/26 at 11:54, John Madden <jmadden@ivytech.edu> wrote:
> ext3, always. [...]
> `mke2fs -j -O dir_index -T news /dev/vg/lv`
On Wednesday 26 August 2009 15:00:39 Javier Guerra wrote:
> On Wed, Aug 26, 2009 at 12:57 PM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> > I've changed the dom0 elevator to noop and the domU to deadline.
>
> shouldn't that be the other way around?

I see what you mean, but it's really what I meant (as described in another
message). The noop@dom0 is to lower the (already high) I/O latency and
leave all optimizations to the storage, at least in theory. In _my_ SAN,
though, it's still worth using a local I/O scheduler (although noop is OK).
If you don't have at least a decent hardware RAID controller with write
cache enabled, then it's a no-no.

About deadline@domU, that's perhaps even more peculiar. Apparently, a domU
pestering dom0 too often with I/O requests degrades performance, at least
in Xen 3.0.x.

-- 
Daniel
On Wed, Aug 26, 2009 at 12:07:55PM -0600, Nick Couchman wrote:
> Doesn't really seem to make a difference which way I do it... I still see
> pretty intense disk I/O.
> [...]
> The avgqu-sz is anywhere from 11 to 75, and the await is anywhere from 20
> to 50. %util is always around 100.

Well.. it seems your SAN LUN is the problem. Have you checked the load from
the FC storage array?

Or else the problem is in your FC HBA. Have you verified the FC link is at
full speed?

Are the FC switches OK?

Do you have an up-to-date HBA driver in dom0? Are the HBA/switch/storage
firmwares up-to-date?

-- Pasi
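(For reference, a sketch of checking the link from a Linux dom0 via the FC
transport class in sysfs; the host number is an example - there will be one
entry per HBA port, and the exact output strings vary by kernel:)

$ cat /sys/class/fc_host/host1/speed
4 Gbit
$ cat /sys/class/fc_host/host1/port_state
Online

# the qla2xxx driver also exposes its driver and firmware versions
$ cat /sys/class/scsi_host/host1/driver_version
$ cat /sys/class/scsi_host/host1/fw_version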
On Wed, Aug 26, 2009 at 11:01 PM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> I'm attempting to run an e-mail server on Xen. The e-mail system is
> Novell GroupWise, and it serves about 250 users. [...] I'm running into
> an issue with high I/O wait on the box (~250%) and large load averages
> (20-40 for the 1/5/15 minute average).

Just to be clear: can a native system handle your load? Try iostat on both
dom0 and domU. My guess is that you're I/O bound, and even moving it to a
native physical server won't help, since the bottleneck is in the disk.

> I was wondering if anyone has ideas on tuning the domU to handle this -
> is there a better way to forward the disk device through, should I try
> using an iSCSI software initiator in the domU,

Some past threads on this list suggest otherwise. iSCSI in domU gives worse
performance compared to (for example) iSCSI in dom0 and passing the disk
through using phy:/.

> or is it just a bad idea to put an I/O load like this in a domU?

If it works on a native system it should work in a domU.

-- 
Fajar
So here are some details on the SAN LUN... the SAN is a Compellent SAN
attached to my FC switch (McData Sphereon 4700, now the Brocade M4700) with
4 x 2Gb FC connections. The dom0 uses the QLE2462 adapter, with a single
4Gb connection hooked up. I did find that there is a later driver available
- I'll try to switch to that when I get a chance. One interesting thing I
found is that the adapter appears to be in a 4x PCIe slot, which means the
max bandwidth for the card is 2.5Gbps. I'm not sure if this is a QLogic
issue or if I need to move the card to a different slot in my Dell
PowerEdge R610 chassis, but it looks like I'm being limited to 2/3 or so of
the speed of the FC connection by my PCIe bus. It's using a 4Gbps
point-to-point connection, with a frame size of 2048. Any hints on whether
any of that needs tuning would be great.

I'm not really sure that bandwidth is the issue - perhaps latency more than
that. I don't think the amount of data is what's causing the problem;
rather, it's the number of transactions that the e-mail system is trying to
do on the volume. The file sizes are actually pretty small - 1 to 4 kB on
average - so I think it's the large number of these files it has to read
rather than streaming a large amount of data. Both the SAN and the iostat
output on dom0 and domU indicate somewhere between 5000 and 20000 kB/s read
rates - that's somewhere around 40Mb/s to 160Mb/s, which is well within the
capability of the FC connection. The SAN is indicating between 500 and 1500
I/O requests per second, which I assume is what's causing the problem.

Again, any tips on what to look at next would be greatly appreciated!
Thanks for all the advice so far!

-Nick
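(As an aside, a sketch of one way to confirm the negotiated PCIe link from
dom0; the bus address is an example - find the real one with "lspci | grep
-i qlogic". LnkCap is what the card supports, LnkSta is what was actually
negotiated:)

$ lspci -vv -s 06:00.0 | grep -E 'LnkCap|LnkSta'
        LnkCap: Port #0, Speed 2.5GT/s, Width x4 ...
        LnkSta: Speed 2.5GT/s, Width x4 ...

# if the Width in LnkSta is lower than in LnkCap, the card is running in a
# degraded link and moving it to another slot may help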
I think my previous native system, from which I migrated this VM, was
handling the load *better* than this one is, but unfortunately it's not a
one-to-one comparison. The previous system was a PowerEdge 2650 with 2 x
CPUs and 4GB of RAM, and a single 4Gb FC connection to the SAN. That system
was seeing some I/O issues, but was also seeing some pretty severe CPU
load. Now I've eliminated the CPU bottleneck by moving it into a VM that
has 4 x CPUs and 4GB of RAM, but I seem to be up against the I/O bottleneck
now.

Due to the complexity of the software installation, switching the load over
to the dom0 on the box really isn't an option for testing. But it seems
that you're probably right, since iostat in both dom0 and domU show very
similar statistics.

My real question is, what can I do to alleviate it? Is it really a SAN
issue? Will tuning the filesystem (even if that means recreating the
filesystem) help reduce the number of I/O operations per second? I guess I
have a few things to investigate, and I may file a support case with the
SAN vendor and request some assistance from them.

Thanks!
-Nick

>>> On 2009/08/27 at 03:36, "Fajar A. Nugraha" <fajar@fajar.net> wrote:
> Just to be clear: can a native system handle your load? Try iostat on
> both dom0 and domU. My guess is that you're I/O bound, and even moving it
> to a native physical server won't help, since the bottleneck is in the
> disk. [...]
> I'm not really sure that bandwidth is the issue - perhaps latency more
> than that. [...] The SAN is indicating between 500 and 1500 I/O requests
> per second, which I assume is what's causing the problem.

What does the backend inside the SAN look like? Look into the amount of
cache, number of spindles, RAID level used, what else is using those
spindles, etc. 500-1500 iops isn't a lot for a "SAN" in general, but given
that your FC disks are going to get around 200 worst-case iops each, you'd
still need quite a few of them to push 1500 continuously (with your cache
picking up some of the spikes). And that depends on workload (read/write,
random or not, block size) and RAID type.

In case you haven't already, I'd look into the usual filesystem performance
guides and do things like turning off atime and that lot. My feeling on
this is that you're going to need to drive down those iops numbers. What
were your results from trying something other than xfs?

John
On Thu, Aug 27, 2009 at 8:54 PM, Nick Couchman <Nick.Couchman@seakr.com> wrote:
> But, it seems that you're probably right, since iostat in both dom0 and
> domU show very similar statistics.
>
> My real question is, what can I do to alleviate it? Is it really a SAN
> issue?

If dom0 iostat says near 100% usage, then yes, most probably it's a storage
issue.

> Will tuning the filesystem (even if that means recreating the filesystem)
> help reduce the number of I/O operations per second? I guess I have a few
> things to investigate, and I may file a support case with the SAN vendor
> and request some assistance from them.

How many disks do you have in your SAN? How many disks are in use
exclusively by this system? A typical SATA disk handles < 100 random IOPS,
so that might be the issue, and increasing the number of disks (and
configuring them to be used evenly) seems to be the solution.

As to how to reduce the number of IOPS, well, I'm not really sure there's a
way to do it that doesn't involve changing your application. Some things to
try:
- If the load is bursty, then usually adding more write-back memory cache
  in the SAN/SCSI controller helps.
- If it's mostly temporary files, then using something like ext4, which has
  delayed allocation, should help.
- Another method would be switching to zfs and adding some SSD for ZIL, but
  that belongs in a different list :P

-- 
Fajar
Let's see... the SAN has two controllers with a 4GB cache in each
controller. Each controller has a single 4 x 2Gb FC card. Two of those
ports go to the switch; the other two create redundant loops with the disk
arrays (going from the controller to one disk array, then to the next disk
array, then to the second controller). The disks are FCATA disks; there are
30 active disks (with 2 hot spares). The SAN does RAID across the disks on
a per-volume basis, and my e-mail volume is using a RAID10 configuration.

I've done most of the filesystem tuning I can without completely rebuilding
the filesystem - atime is turned off. I've also adjusted the elevator per
previous suggestions and played with some of the tuning parameters for the
elevators. I haven't got around to trying something other than XFS yet -
it's going to take a while to sync over the stuff from the existing FS to
an EXT3 or something similar. I'm also contacting the SAN vendor to get
their help with the situation.

-Nick

>>> On 2009/08/27 at 08:15, John Madden <jmadden@ivytech.edu> wrote:
> What does the backend inside the SAN look like? Look into the amount of
> cache, number of spindles, RAID level used, what else is using those
> spindles, etc. [...]
On Thu, Aug 27, 2009 at 07:46:46AM -0600, Nick Couchman wrote:
> So here are some details on the SAN LUN... the SAN is a Compellent SAN
> attached to my FC switch (McData Sphereon 4700, now the Brocade M4700)
> with 4 x 2Gb FC connections. [...] One interesting thing I found is that
> the adapter appears to be in a 4x PCIe slot, which means the max
> bandwidth for the card is 2.5Gbps. [...]

OK. I don't think the pci-e slot is your problem.

> I'm not really sure that bandwidth is the issue - perhaps latency more
> than that. [...] The SAN is indicating between 500 and 1500 I/O requests
> per second, which I assume is what's causing the problem.

What's the size of those requests? 4 kB? 1500 IOPS * 4 kB/IO == 6000 kB/sec
(6 MB/sec).

What kind of disk drives are you using on the Compellent storage array, on
the RAID set for this LUN? 1500 random IOPS requires at least 10x 7200 rpm
SATA disks (if using SATA). Each 7200 rpm SATA disk can do at most around
150 random IOPS; each 15k rpm SAS disk can do at most ~300 random IOPS.
It's easy maths.

A big write-back cache in the storage array will help, though.

-- Pasi
On Thu, 2009-08-27 at 08:25 -0600, Nick Couchman wrote:
> Let's see... the SAN has two controllers with a 4GB cache in each
> controller. [...] The disks are FCATA disks; there are 30 active disks
> (with 2 hot spares). The SAN does RAID across the disks on a per-volume
> basis, and my e-mail volume is using a RAID10 configuration.

FCATA? Well, that isn't going to help your situation any. But 30 spindles
is a good start. How many are in your particular RAID 10 group? It's
sounding like 1500 iops might be all this guy can handle (ATA, maybe 120
iops each; RAID 10, so you're using at most 14 of those disks; that'd give
you 1680 iops max -- for reads -- half that for writes).

> tuning parameters for the elevators. I haven't got around to trying
> something other than XFS yet - it's going to take a while to sync over
> the stuff from the existing FS to an EXT3 or something similar. I'm also
> contacting the SAN vendor to get their help with the situation.

Shut down, rsync, remount, start up. I don't think your SAN vendor could
really help here...?

John
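(To make the arithmetic above explicit, under the usual rough assumptions -
~120 random IOPS per 7200 rpm ATA spindle, and RAID 10 where a read can be
served by either mirror but every write hits two drives:)

  reads:   14 drives x 120 IOPS        = ~1680 IOPS max
  writes:  (14 drives x 120 IOPS) / 2  = ~840 IOPS max

So a workload sustaining 500-1500 mostly-random read IOPS is already near
the theoretical ceiling of a 14-spindle RAID 10 group on those disks.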
Nick,

Do you mean the GroupWise data volume is on one RAID10 comprised of 30
disks dedicated to GroupWise data? Or that this one RAID volume is
contending with other volumes using the disks on the SAN?

I'm not familiar with how GroupWise works - does the ideal deployment
suggest separate sets of spindles for temp files, database, and transaction
logs?

Is the RAID block/chunk/stripe size aligned with the xfs sunit/swidth
parameters? Are the xfs block boundaries aligned with the RAID blocks?

Is that 4GB of write-back cache? What is the write-back delay? How fast are
the drives, in rpm?

> Date: Thu, 27 Aug 2009 08:25:08 -0600
> From: "Nick Couchman" <Nick.Couchman@seakr.com>
> Subject: Re: [Xen-users] Xen and I/O Intensive Loads
>
> Let's see... the SAN has two controllers with a 4GB cache in each
> controller. [...] The SAN does RAID across the disks on a per-volume
> basis, and my e-mail volume is using a RAID10 configuration. [...]
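(For anyone following along, a sketch of what aligning xfs to the RAID
geometry looks like at mkfs time; the 64 kB stripe unit and 15-disk stripe
width here are purely illustrative - the array's real values are what
matter:)

# su = the RAID chunk size, sw = data-bearing spindles per stripe
$ mkfs.xfs -d su=64k,sw=15 /dev/xvdb

# equivalent form in 512-byte sectors (sunit = 64k/512, swidth = sunit*15)
$ mkfs.xfs -d sunit=128,swidth=1920 /dev/xvdb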
Yeah, it may be time to invest in some real drives :-). It's hard to tell
how many disks are being used for the RAID 10 - the information isn't
readily available in the user interface for the SAN. I'm working with the
vendor right now, so maybe they can help me out. Also, I wasn't suggesting
the SAN vendor would help with the rsync/startup/shutdown, just that maybe
they can tell me whether the performance I'm seeing on the SAN is what I
should expect or not. I may have to purchase a chassis of FC disks that run
at 10K or 15K RPM - the current FCATA drives are 7200 RPM.

-Nick

>>> On 2009/08/27 at 08:30, John Madden <jmadden@ivytech.edu> wrote:
> FCATA? Well, that isn't going to help your situation any. But 30 spindles
> is a good start. How many are in your particular RAID 10 group? [...]
You might have seen my other replies, but here you go... 30 disks total
active in the SAN, with 2 hot spares. FCATA drives, 7200 RPM. I can't
really modify the SAN controllers much - they have 4GB of read and write
cache. I suppose I can ask the vendor if they go any higher than that, but
I think that's about the highest it goes. I may be able to purchase some
SSD, but that gets real $$$$ real fast, and I'd rather try out some faster
drives - maybe real FC at 10K or 15K RPM.

-Nick

>>> On 2009/08/27 at 08:18, "Fajar A. Nugraha" <fajar@fajar.net> wrote:
> How many disks do you have in your SAN? How many disks are in use
> exclusively by this system? A typical SATA disk handles < 100 random
> IOPS, so that might be the issue [...]
I'm not sure what the request size is - the block size on the filesystem is
4K, and the inode size is 512 bytes. The drives are FCATA, 7200 RPM - 30
active spindles plus two hot spares. I don't know how Compellent does the
striping, so I don't know if my RAID10 volume is striped across all 30
active spindles or if they choose a subset of those.

-Nick

>>> On 2009/08/27 at 08:28, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> What's the size of those requests? 4 kB? 1500 IOPS * 4 kB/IO == 6000
> kB/sec (6 MB/sec).
> [...]
Oliver,

The way the Compellent system works is that it does a per-volume RAID. So
there are 30 disks presented to the SAN controllers as a JBOD, and then
each volume is assigned one or more RAID levels, and the controller stripes
the data and moves it between RAID levels. The GroupWise data volume is
configured as RAID10 only, but it does contend with other volumes on the
same set of disks.

GroupWise does not use separate disks or volumes for temporary data,
databases, logs, etc. - everything is kept in the same filesystem, and
there really isn't much documentation on whether it's possible, or how, to
separate those things.

I'm not sure about the RAID block/chunk/stripe size - the user interface on
the controller doesn't really lend itself well to those sorts of detailed
customizations. I'll have to dig a little to see about that. The drives are
FCATA, 7200 RPM, and the 4GB cache is for read and write - I'm not sure if
they do write-through or write-back - I'll check on that.

-Nick

>>> On 2009/08/27 at 09:11, "Oliver Wilcock" <oliver@owch.ca> wrote:
> Do you mean the GroupWise data volume is on one RAID10 comprised of 30
> disks dedicated to GroupWise data? Or that this one RAID volume is
> contending with other volumes using the disks on the SAN? [...]
On Thu, Aug 27, 2009 at 01:08:42PM -0600, Nick Couchman wrote:
> You might have seen my other replies, but here you go... 30 disks total
> active in the SAN, with 2 hot spares. FCATA drives, 7200 RPM. [...] I may
> be able to purchase some SSD, but that gets real $$$$ real fast, and I'd
> rather try out some faster drives - maybe real FC at 10K or 15K RPM.

If you're IOPS limited then definitely go for 15K drives. You'll get 50%
more IOPS from them.

-- Pasi