We're having a hard time tracking down a recurring problem that we think has something to do with disc i/o access. This is slightly beyond our level of expertise, and we were hoping someone could shed some light on it. If this has been dealt with previously, please let me know under what topic so I can do relevant searches.

Description of Server configuration and problems.

Configuration:

FreeBSD 4.4-RELEASE
apache+mod_ssl-1.3.22+2.8.5_1
mysql-server-3.23.42
Hard Drive (dual ATA disks, no RAID) on a Dell P4 PowerEDGE server

We had many speed and timeout issues so we recompiled the kernel with maxusers=128 instead of the previous 32 and moved the hard drives into a new P4 2.8Ghz PowerEdge Case.

atapci0: <Generic PCI ATA controller> port 0xffa0-0xffaf,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 irq 11 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
atapci1: <Generic PCI ATA controller> port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 mem 0xdff3fc00-0xdff3ffff irq 5 at device 31.2 on pci0
ata2: at 0xfe00 on atapci1
ata3: at 0xfe20 on atapci1

This cleared up all errors found in fstat and the "file table is full" errors.

Server is configured to hold 2 medium sized MySQL DB's accessed through various perl and php scripts via websites on the server.

Uptime: 78463 Threads: 18 Questions: 351685 Slow queries: 328 Opens: 2793 Flush tables: 1 Open tables: 64 Queries per second avg: 4.482

Each httpd process is of size 15MB to 20MB * about 15 processes
Mysqld process is of size 32MB, resident 16MB
Server load is now usually between .24 and .44

With the new kernel, we observed the server for about a week and are trying to pinpoint this issue:

If we run pine on a large mailbox or any other disk i/o intensive task, all other processes in motion seem to stall until the disk i/o is complete. This manifests itself in timeouts on webpages that require DB data, IMAP timeouts for other mail accounts, and even odd console/shell behaviour.

For instance, at a console/ssh prompt we would run "uptime" or "pwd" and there would be a delay of up to 10 seconds before results are returned. The results also show no heavy load (<.40) on the CPU.

Rebooting the server eliminates the speed issues for up to a couple of hours.

Any thoughts on how to diagnose whether this really is a disc I/O issue and how to resolve it would be most helpful!

Thank you,
__________________________________
Oren Baum
Creative Image Communications Inc.
oren@creativeimage.ca
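As a rough sanity check on the figures quoted in the post above, the arithmetic can be sketched in Python. Every input is copied from the post. The machine's total RAM is not stated anywhere in it, so no comparison against physical memory is attempted. Summing process sizes is an assumption that overstates real use, since pages shared between the httpd processes are counted once per process, so the result is best read as an upper bound.

```python
# Figures copied from the post; total RAM is not given in the thread.
uptime_s = 78463        # mysqladmin status: Uptime (seconds)
questions = 351685      # mysqladmin status: Questions
slow_queries = 328      # mysqladmin status: Slow queries

httpd_procs = 15        # "about 15 processes"
httpd_mb = (15, 20)     # each httpd is 15MB to 20MB
mysqld_mb = 32          # mysqld total size (16MB of it resident)

qps = questions / uptime_s
slow_pct = 100.0 * slow_queries / questions
uptime_h = uptime_s / 3600.0

# Upper-bound estimate: shared pages are counted once per process here.
low_mb = httpd_procs * httpd_mb[0] + mysqld_mb
high_mb = httpd_procs * httpd_mb[1] + mysqld_mb

print(f"uptime: {uptime_h:.1f} h")
print(f"queries/s: {qps:.3f}")
print(f"slow queries: {slow_pct:.3f}% of all queries")
print(f"httpd + mysqld: {low_mb}-{high_mb} MB (upper bound)")
```

The computed query rate matches the 4.482 average that MySQL reports, and the combined httpd and mysqld sizes come to roughly 257-332 MB before pine or anything else is started.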
Chuck Swiger
2005-Jan-28 14:14 UTC
Seemingly odd disc i/o behaviour, need help to diagnose
Hello--

Oren Baum wrote:
[ ... ]
> Configuration:
>
> FreeBSD 4.4-RELEASE
> apache+mod_ssl-1.3.22+2.8.5_1
> mysql-server-3.23.42
> Hard Drive (dual ATA disks, no RAID) on a Dell P4 PowerEDGE server

Your software is several years out of date, and there are important security holes in at least FreeBSD and apache which have since been fixed. You ought to upgrade to recent versions and retest the system to see whether your problems get solved as a result...

-- 
-Chuck
Mark Kirkwood
2005-Jan-28 16:21 UTC
Seemingly odd disc i/o behaviour, need help to diagnose
Oren Baum wrote:
>
> Configuration:
>
> FreeBSD 4.4-RELEASE
> apache+mod_ssl-1.3.22+2.8.5_1
> mysql-server-3.23.42
> Hard Drive (dual ATA disks, no RAID) on a Dell P4 PowerEDGE server
>
> We had many speed and timeout issues so we recompiled the kernel with
> maxusers=128 instead of the previous 32 and moved the hard drives into a new
> P4 2.8Ghz PowerEdge Case.
>

What model number PowerEdge? (So we can examine the specifications if need be.) The amount and type of RAM would be good to know as well.

> atapci0: <Generic PCI ATA controller> port 0xffa0-0xffaf,0x374-0x377,0x170-
> 0x177,0x3f4-0x3f7,0x1f0-0x1f7 irq 11 at device 31.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci0
> atapci1: <Generic PCI ATA controller> port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-
> 0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 mem 0xdff3fc00-0xdff3ffff irq 5 at device
> 31.2 on pci0
> ata2: at 0xfe00 on atapci1
> ata3: at 0xfe20 on atapci1
>

FreeBSD 4.4 is detecting a 'Generic PCI ATA Controller', which may mean the capabilities are very dumbed down (and performance is poor). As no doubt other posters will mention, let me suggest 4.10 or 4.11 for better HW detection :-)

> If we run pine on a large mailbox or any other disk i/o intensive task, all
> other processes in motion seem to stall until the disk i/o is complete.
> This manifests itself in timeouts on webpages that require DB data, IMAP
> timeouts for other mail accounts, and even odd console/shell behaviour.
>

An additional possibility is that your old hard drives are becoming worn out. It is probably worth buying 2 fast new ones (maybe SATA if your Dell supports it - I suspect you will *need* to be off 4.4 before trying this tho!).

regards

Mark
On Fri, 28 Jan 2005, Oren Baum wrote:

> Description of Server configuration and problems.
>
> Configuration:
>
> FreeBSD 4.4-RELEASE
> apache+mod_ssl-1.3.22+2.8.5_1
> mysql-server-3.23.42
> Hard Drive (dual ATA disks, no RAID) on a Dell P4 PowerEDGE server

It bears mentioning again ... these releases are very old and have known security issues. If this server is accessible to the public Internet it's probably been hacked several times over. An upgrade to at least 4.8-RELEASE+patches, apache 1.3.33, and the latest mysql-server-3.23 is recommended.

> We had many speed and timeout issues so we recompiled the kernel with
> maxusers=128 instead of the previous 32 and moved the hard drives into a new
> P4 2.8Ghz PowerEdge Case.

I'd think this is a PE1750, but those have SCSI disks, not IDE.

> atapci0: <Generic PCI ATA controller> port 0xffa0-0xffaf,0x374-0x377,0x170-
> 0x177,0x3f4-0x3f7,0x1f0-0x1f7 irq 11 at device 31.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci0
> atapci1: <Generic PCI ATA controller> port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-
> 0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 mem 0xdff3fc00-0xdff3ffff irq 5 at device
> 31.2 on pci0
> ata2: at 0xfe00 on atapci1
> ata3: at 0xfe20 on atapci1

Also, as noted before, this may be causing your disks to run at UDMA33 and not at their full interface speed. The 'atacontrol mode' command will display the current settings (although you'll have to specify a bus, in this case 0, 2, and 3).

> Server is configured to hold 2 medium sized MySQL DB's accessed through
> various perl and php scripts via websites on the server.
>
> Uptime: 78463 Threads: 18 Questions: 351685 Slow queries: 328 Opens: 2793
> Flush tables: 1 Open tables: 64 Queries per second avg: 4.482
>
> Each httpd process is of size 15MB to 20MB * about 15 processes
> Mysqld process is of size 32MB, resident 16MB

How much RAM does the machine have? A full dmesg output would be nice for reference.

> If we run pine on a large mailbox or any other disk i/o intensive task,
> all other processes in motion seem to stall until the disk i/o is
> complete. This manifests itself in timeouts on webpages that require DB
> data, IMAP timeouts for other mail accounts, and even odd console/shell
> behaviour.

I suspect this is simple I/O contention from one of two sources:

1. mysql keeps the disks loaded enough that when pine slurps in the big file it's able to out-contend the other processes.

2. pine bloats badly with large mailboxes, enough to cause other processes to swap, which kills performance dead.

I'd suggest running 'iostat 1' in one window or VTY and 'top -s1' in another. Watch for a bit to get a baseline, then start up pine in another window and watch the numbers change. In the top window, watch for the "Swap" line to change with +In and +Out lines, and for the total use to go up. In the iostat window, watch how big MB/s gets, and the 'sy' column on the far right, which is system CPU time as a percent.

If you're seeing the iostat MB/s go up to a level and stop, but swap stays still, then it's case #1. If you see iostat go up and Swap start increasing, then it's case #2. In case #2 the System CPU % should go up radically (+10-20%) compared to baseline.

To fix case #1 you'll need to get faster storage, either by upgrading the OS to make full use of the available ATA channels or by getting a more capable storage subsystem (SCSI+disk array, for example). To fix case #2 you need to install more physical RAM.

ATA does not handle multiple transaction streams well, since each bus can only take one command at a time. The ata driver attempts to optimize the order of these operations, but compared to SCSI it will not scale well in the face of large quantities of diverse transactions. (Before people get up in arms, I'll say that YMMV depending on workload, available cache, interface speeds, etc.)

There is, of course, option #3, which is to move your mail to another machine. :-)

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
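The case #1 / case #2 test described above can be written down as a small decision function. This is only a sketch in Python: the dictionary keys, the function name `classify`, and the swap-growth threshold are hypothetical names and values chosen for illustration, and the numbers in the usage lines are made up rather than measured on the server in question. The 10-point system CPU jump is taken from the "+10-20%" figure in the message.

```python
def classify(baseline, loaded, swap_growth_kb=1024, sys_jump_pct=10.0):
    """Guess which case a pair of hand-copied samples resembles.

    baseline / loaded are dicts with hypothetical keys:
      'mb_s'    - disk throughput read off iostat (MB/s)
      'swap_kb' - swap in use read off top's Swap line (KB)
      'sys_pct' - system CPU percentage (iostat 'sy' column)
    swap_growth_kb is an assumed threshold, not a value from the thread.
    """
    disk_up = loaded["mb_s"] > baseline["mb_s"]
    swap_up = loaded["swap_kb"] - baseline["swap_kb"] >= swap_growth_kb
    sys_up = loaded["sys_pct"] - baseline["sys_pct"] >= sys_jump_pct

    if disk_up and swap_up:
        # Case #2: disk traffic rises and swap use grows with it.
        note = "system CPU jump seen" if sys_up else "no large system CPU jump"
        return "case 2: swapping, add RAM (" + note + ")"
    if disk_up:
        # Case #1: disk traffic rises but swap use stays flat.
        return "case 1: I/O contention, faster storage needed"
    return "inconclusive"


# Made-up sample values, for illustration only.
baseline = {"mb_s": 1.5, "swap_kb": 20000, "sys_pct": 3.0}
pine_flat_swap = {"mb_s": 14.0, "swap_kb": 20000, "sys_pct": 5.0}
pine_swapping = {"mb_s": 14.0, "swap_kb": 90000, "sys_pct": 21.0}

print(classify(baseline, pine_flat_swap))
print(classify(baseline, pine_swapping))
```

The samples are meant to be copied by hand from the two windows, once before pine is started and once while it is loading the mailbox. The function does not parse iostat or top output itself.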