Hey folks,

CentOS / PostgreSQL shop over here. I'm hitting 3 of my favorite lists with this, so here's hoping that the BCC trick is the right way to do it :-)

We've just discovered, thanks to a new Munin plugin
http://blogs.amd.co.at/robe/2008/12/graphing-linux-disk-io-statistics-with-munin.html
that our production DB is completely maxing out in I/O for about a 3-hour stretch from 6am til 9am. This is "device utilization" as per the last graph at the above link.

Load went down for a while but is now between 70% and 95% sustained. We've only had this plugin going for less than a day, so I don't really have any more data going back further. But we've suspected a disk issue for some time - just have not been able to prove it.

Our system:
IBM 3650 - quad 2Ghz e5405 Xeon
8K SAS RAID Controller
6 x 300G 15K/RPM SAS Drives
/dev/sda - 2 drives configured as a RAID 1 for 300G for the OS
/dev/sdb - 3 drives configured as RAID5 for 600G for the DB
1 drive as a global hot spare

/dev/sdb is the one that is maxing out.

We need to have a very serious look at fixing this situation. But we don't have the money to be experimenting with solutions that won't solve our problem, and our budget is fairly limited.

Is there a public library somewhere of disk subsystems and their performance figures, done with some semblance of a standard benchmark? One benchmark I am partial to is this one:
http://wiki.postgresql.org/wiki/PgCon_2009/Greg_Smith_Hardware_Benchmarking_notes#dd_test

One thing I am thinking of in the immediate term is taking the RAID5 + hot spare and converting it to RAID10 with the same amount of storage. Will that perform much better? In general we are planning to move away from RAID5 toward RAID10.

We also have on order an external IBM array (don't have the exact name on hand, but the model number was 3000) with 12 drive bays. We ordered it with just 4 x SATA II drives, and were going to put it on a different system as a RAID10. These are just 7200 RPM drives - the goal was cheaper storage, because the SAS drives are about twice as much per drive, and only a 300G drive versus the 1T SATA II drives. IIRC the SATA II drives are about $200 each and the 300G SAS drives about $500 each.

So I have 2 thoughts with this 12-disk array. One is to fill it up with 12 x cheap SATA II drives and hope that, even though the spin rate is a lot slower, the fact that it has more drives will make it perform better. But somehow I am doubtful about that. The other thought is to bite the bullet and fill it up with 300G SAS drives.

Any thoughts here? Recommendations on what to do with a tight budget? It could be the answer is that I just have to go back to the bean counters and tell them we have no choice but to start spending some real money. But on what? And how do I prove that this is the only choice?

--
"Don't eat anything you've ever seen advertised on TV"
 - Michael Pollan, author of "In Defense of Food"
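For anyone wanting to spot-check the "device utilization" figure outside of Munin, here is a sketch: `iostat -x` (from the sysstat package, which the Munin plugin wraps) reports it as %util, and the raw counter behind it lives in /proc/diskstats. The device names assume the sda/sdb layout described above.

```shell
#!/bin/sh
# Spot-check per-device utilization. "iostat -x" reports %util, the
# same number the Munin graph plots; guard it in case sysstat is
# not installed.
if command -v iostat >/dev/null 2>&1; then
    iostat -x 5 3        # three 5-second samples, extended stats
fi

# The raw counter behind %util: field 13 of /proc/diskstats is
# milliseconds the device has spent doing I/O since boot.
awk '$3 ~ /^sd[ab]$/ { printf "%s: %s ms busy since boot\n", $3, $13 }' /proc/diskstats
```

Running this a few times during the 6am-9am window and again off-peak would confirm the graph without waiting for more Munin history.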
On Fri, Oct 09, 2009 at 12:45:14PM -0400, Alan McKay wrote:
> We've just discovered thanks to a new Munin plugin
> [...]
> that our production DB is completely maxing out in I/O for about a 3
> hour stretch from 6am til 9am
> [...]
> But we've suspected a disk
> issue for some time - just have not been able to prove it.

Really hard to say what's going on. Does your DB need optimization? Do the applications hitting it? Maybe some indexing? Maybe some more RAM on the machine would help? What exactly is the workload like - especially during the time when you're peaked out?

Is the system swapping? If so, you either need more memory or need to track down a memory leak. 'free' and 'sar' can both help you see what swap usage is like.

It would be interesting to know which processes are running and consuming I/O during this peak period as well. top could probably give you an "OK" picture, but something like iotop or SystemTap could tell you a lot more (unfortunately you'll have to wait for 5.4 to get that functionality, I believe).

Writes are always slower on any parity-based RAID setup, so I imagine you'd get superior performance on RAID10, especially if you're write-heavy. But to begin with, it'd be interesting to know exactly what this server is doing. Does it make sense that the disks are being brought to their knees with the given workload?

Is the disk array you bought an N-series (N3300, N3600)? If so, those are NetApps and should be quite fast thanks to heavy write caching. Even then, you'll be limited by spindles, it sounds like...

Ray
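The swap check Ray suggests can be sketched like this; 'free' and 'sar -W' are the tools he names, and /proc/meminfo always has the raw numbers even where sysstat isn't installed.

```shell
#!/bin/sh
# Is the box swapping? Raw numbers straight from the kernel:
grep -E '^(MemTotal|SwapTotal|SwapFree)' /proc/meminfo

if command -v free >/dev/null 2>&1; then free -m; fi

# Pages swapped in/out per second -- nonzero pswpin/pswpout during
# the 6am-9am window would mean the box really is swapping:
if command -v sar >/dev/null 2>&1; then sar -W 5 3; fi
if command -v vmstat >/dev/null 2>&1; then vmstat 5 3; fi   # watch the si/so columns
```

If SwapFree stays close to SwapTotal and si/so sit at zero during the peak, memory pressure can be ruled out and the disks themselves are the bottleneck.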
Hi Alan,

You will get the best performance on /dev/sdb by making that a RAID 10 device. The write penalty for R5 is a killer. You will lose a bit of space over R5, but the performance is worth it.

As to the external enclosure, 15k 300G SAS drives are dropping in price if you shop around, or ask a salesperson for a break since you are buying more than 2-4. Dell gave us a nice break on our last order, to ~$300 per disk. I would use R10 there as well.

Do make sure to look at the controller attaching the array. It may be best to let it do the RAID on the hardware side. It may be best to do 2 or 3 R10 groups to maximize the channels, or one big R10 to take advantage of I/O spindle distribution. Do spread your data around as much as possible. If you have to use the SATA drives, then definitely fill up the array and spread the load as wide as possible.

Hope this helps.

Tim

On Fri, 2009-10-09 at 12:45 -0400, Alan McKay wrote:
> One thing I am thinking of in the immediate term is taking the RAID5 +
> hot spare and converting it to RAID10 with the same amount of storage.
> Will that perform much better?
> [...]
> any thoughts here? recommendations on what to do with a tight budget?
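The write penalty Tim mentions can be put in rough numbers: a small random write costs four disk operations on RAID5 (read old data and old parity, write new data and new parity) but only two on RAID10 (one write to each mirror half). The ~175 random IOPS per 15k SAS spindle below is an assumed rule of thumb, not a figure from this thread.

```shell
#!/bin/sh
# Back-of-envelope random-write IOPS for the two layouts discussed:
# the existing 3-disk RAID5 vs. a 4-disk RAID10 from the same bays.
# 175 IOPS per 15k spindle is an assumption.
awk 'BEGIN {
    iops = 175
    printf "RAID5,  3 spindles: ~%d random write IOPS\n", 3 * iops / 4
    printf "RAID10, 4 spindles: ~%d random write IOPS\n", 4 * iops / 2
}'
```

Under these assumptions the RAID10 layout sustains roughly 2.5x the random-write rate from the same drive bays, which is why the rebuild is worth considering before buying anything.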
I would second what another poster stated: look at DB/RAM optimizations. If possible, you should have enough RAM in the system to hold your entire DB in memory, and your db software should be set up to take advantage of that memory.

Since you stated that your system only has trouble with writes, during batch inserts, I would look at how you are doing these batch inserts and see if you can optimize them by using techniques to delay key writes or adjust locking mechanisms on the insert. Some databases have special insert-from-file or bulk insert operations that are much faster than running many single-row inserts - be sure you are taking advantage of these methods.

--Blake
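In PostgreSQL the bulk-load path Blake describes is COPY. A minimal sketch, where the database name `mydb`, table `my_table`, and its columns are hypothetical placeholders:

```shell
#!/bin/sh
# Load a batch as one COPY instead of many single-row INSERTs.
# COPY is PostgreSQL's bulk-load path and avoids per-statement
# parse/plan/commit overhead.

# Build a sample tab-separated batch file (contents made up):
printf '1\talpha\n2\tbeta\n3\tgamma\n' > /tmp/batch.tsv

if command -v psql >/dev/null 2>&1; then
    # One transaction, one COPY: "my_table" and "mydb" are placeholders.
    psql -d mydb <<'SQL'
BEGIN;
COPY my_table (id, label) FROM '/tmp/batch.tsv';
COMMIT;
SQL
fi
```

Even without COPY, wrapping a batch of INSERTs in a single BEGIN/COMMIT cuts the per-row commit (fsync) cost dramatically.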
On Fri, Oct 09, 2009 at 12:45:14PM -0400, Alan McKay wrote:
> [...]
> One thing I am thinking of in the immediate term is taking the RAID5 +
> hot spare and converting it to RAID10 with the same amount of storage.
> Will that perform much better?

Does your RAID controller have a battery-backed write cache? That's needed to get good performance from RAID5. Changing to RAID-10 will definitely help.

> In general we are planning to move away from RAID5 toward RAID10.

That's good. RAID5 isn't very good for database use..

> So I have 2 thoughts with this 12 disk array. [...] The other
> thought is to bite the bullet and fill it up with 300G SAS drives.
>
> any thoughts here? recommendations on what to do with a tight budget?

I'd use ltp disktest to measure the performance using various workloads: sequential and random, using different numbers of threads and different block sizes. There's only so many random IOPS each disk can do.. a 15k SAS drive will be 2-3x faster than a 7200 rpm SATA disk, but still, 4 disks total isn't THAT much..

-- Pasi
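For reference, the dd test from the wiki page Alan linked looks roughly like this, scaled down for illustration. On a real run SIZE_MB should be about twice physical RAM so the page cache can't hide the disk, and TESTFILE should live on the array under test (/dev/sdb here).

```shell
#!/bin/sh
# Sequential write/read throughput with dd, in the spirit of the
# PostgreSQL wiki "dd test". Small size here for illustration only.
SIZE_MB=64                 # use ~2x physical RAM for a real benchmark
TESTFILE=/tmp/ddtest.bin   # put this on /dev/sdb's filesystem for real

# Write: conv=fdatasync makes dd flush to disk before reporting a rate.
dd if=/dev/zero of="$TESTFILE" bs=1M count="$SIZE_MB" conv=fdatasync

# Read it back (on a real run, drop caches first as root:
#   echo 3 > /proc/sys/vm/drop_caches).
dd if="$TESTFILE" of=/dev/null bs=1M
rm -f "$TESTFILE"
```

Numbers from this run sequentially; disktest or a tool like bonnie++ is still needed for the random-I/O side, which is what a database mostly cares about.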
Raid10 should be better on your writes. Random reads and writes are the most important for a db, and for random I/O, the more spindles (disks) you have the better. I would use SAS over SATA if possible.

How big is your database? If it is small, you may be able to put it on a few solid state drives; they're really good for random I/O.

I don't know of a website, but one would be nice.

-----Original Message-----
From: centos-bounces at centos.org [mailto:centos-bounces at centos.org] On Behalf Of Alan McKay
Sent: Friday, October 09, 2009 9:45 AM
To: Alan McKay
Subject: [CentOS] disk I/O problems and Solutions

> Hey folks,
>
> CentOS / PostgreSQL shop over here.
> [...full problem description snipped...]

_______________________________________________
CentOS mailing list
CentOS at centos.org
http://lists.centos.org/mailman/listinfo/centos
On Fri, 9 Oct 2009, Alan McKay wrote:
> We've just discovered thanks to a new Munin plugin
> [...]
> that our production DB is completely maxing out in I/O for about a 3
> hour stretch from 6am til 9am
> This is "device utilization" as per the last graph at the above link.

You make no mention of the file system that you are currently using here, either. If you look and see I/O wait rather high, then your file system could also be the culprit. EXT3/4 journalling is terrible under heavy load because kjournald is single-threaded. We have a user here who completely saturates a 4-core, 8GB-memory machine with only a few clients because it can't keep up with the journal writes.

Are you seeing any high I/O waits and lots of kjournald's running?

--
James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
HPC Coordinator
Simon Fraser University - Burnaby Campus
Phone : 778-782-6573
Fax : 778-782-3045
E-Mail : jpeltier at sfu.ca
Website : http://www.fas.sfu.ca | http://vivarium.cs.sfu.ca
http://blogs.sfu.ca/people/jpeltier
MSN : subatomic_spam at hotmail.com

Does your OS have a man 8 lart?
http://www.xinu.nl/unix/humour/asr-manpages/lart.html
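A quick way to check for the symptoms James describes, as a sketch: "wa" in top/vmstat is the I/O-wait figure, and on ext4 the journal threads are named jbd2 rather than kjournald.

```shell
#!/bin/sh
# Look for high I/O wait and busy journal threads.

# The cpu line of /proc/stat: the 5th value after "cpu" is cumulative
# iowait jiffies; top/vmstat report the same thing as "wa"/"%wa".
awk '/^cpu / { printf "iowait jiffies since boot: %s\n", $6 }' /proc/stat

if command -v vmstat >/dev/null 2>&1; then vmstat 5 3; fi

# Journal threads accumulating CPU time point at journal contention.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,comm,time | grep -E 'kjournald|jbd2' || echo "no journal threads visible"
fi
```

High "wa" together with kjournald/jbd2 eating CPU during the 6am-9am window would support the journalling theory; high "wa" with idle journal threads points back at the RAID5 write penalty.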