Pieter de Boer
2010-May-14 17:42 UTC
Read / write timeouts on SATA disks connected to ICH9
Hi list,

I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has a ICH9 SATA controller on-board (do not have the RAID controller).

The system has 2 disks in a gmirror setup. Every now and then, probably under some load, one of the disks gets read or write timeouts like:

May 5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
May 5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5). ad4[WRITE(offset=200404975104, length=16384)]
May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4 disconnected.

or:

May 13 14:41:26 aberdeen kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=975513887

Sometimes the read/write succeeds after a few retries, but sometimes it does not, so geom_mirror throws the disk out of the mirror.

Tonight ad6 was thrown out of the mirror and ad4 then gave actual read errors, resulting in a big mess :(

My question: does anyone have experience with FreeBSD on a Dell R300 or can anyone give me some help in trying to fix the timeouts?

I was told using AHCI could be better for SATA disks, but apparently (http://permalink.gmane.org/gmane.linux.kernel.pci/8267) the BIOS does not support turning that on, so that does not appear to be an option.

Thanks,
Pieter
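The GEOM_MIRROR message above reports the failed write as a byte offset and length, while the ad6 message reports an LBA. To compare the two kinds of message it can help to convert the byte range into sectors. The following is a minimal sketch, assuming 512-byte logical sectors (the thread does not state the sector size) and assuming the offset is relative to the start of ad4; the function name is hypothetical.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors, not stated in the thread


def byte_range_to_sectors(offset, length, sector_size=SECTOR_SIZE):
    """Return (first_lba, sector_count) for a sector-aligned byte range."""
    first_lba, remainder = divmod(offset, sector_size)
    if remainder or length % sector_size:
        raise ValueError("request is not sector-aligned")
    return first_lba, length // sector_size


# Values taken from the GEOM_MIRROR log line quoted above.
first_lba, count = byte_range_to_sectors(200404975104, 16384)
print(first_lba, count)  # prints: 391415967 32
```

Under that assumption the failed write covers 32 sectors starting at LBA 391415967 of ad4.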
Adam Vande More
2010-May-14 17:53 UTC
Read / write timeouts on SATA disks connected to ICH9
On Fri, May 14, 2010 at 12:42 PM, Pieter de Boer <pieter@os3.nl> wrote:

> I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has a ICH9 SATA
> controller on-board (do not have the RAID controller).
>
> The system has 2 disks in a gmirror setup. Every now and then, probably
> under some load, one of the disks gets read or write timeouts like:
> May 5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
> May 5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
> May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed (error=5).
> ad4[WRITE(offset=200404975104, length=16384)]
> May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4
> disconnected.
>

Have you tried replacing/checking the cables? Does it always happen to ad4?

Your drive could be dying, try swapping it out and see if the errors persist.

--
Adam Vande More
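One way to answer "does it always happen to ad4?" is to count the ATA error messages per device in the system log. The sketch below is only an illustration: the regular expression is written against the three message shapes quoted in this thread and may miss other wordings, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the ata driver messages quoted in this thread, for example
#   "... kernel: ad4: timeout waiting to issue command"
#   "... kernel: ad6: TIMEOUT - READ_DMA48 retrying ..."
# GEOM_MIRROR lines are deliberately not counted.
ATA_EVENT = re.compile(r"kernel: (ad\d+): (?:timeout waiting|error issuing|TIMEOUT)")


def count_ata_events(lines):
    """Count matching ATA error lines per device name (ad4, ad6, ...)."""
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Log lines copied from the original post.
sample = [
    "May 5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command",
    "May 5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command",
    "May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider ad4 disconnected.",
    "May 13 14:41:26 aberdeen kernel: ad6: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=975513887",
]
print(dict(count_ata_events(sample)))
```

On a real system the same function could be fed the lines of the log file instead of the sample list. Errors on both disks, as in the original post, would point away from a single bad drive.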
Jeremy Chadwick
2010-May-14 19:53 UTC
Read / write timeouts on SATA disks connected to ICH9
On Fri, May 14, 2010 at 07:42:33PM +0200, Pieter de Boer wrote:

> Hi list,
>
> I'm running FreeBSD 8.0-RELEASE-p1 on a Dell R300 which has a ICH9
> SATA controller on-board (do not have the RAID controller).
>
> The system has 2 disks in a gmirror setup. Every now and then,
> probably under some load, one of the disks gets read or write
> timeouts like:
> May 5 03:01:37 aberdeen kernel: ad4: timeout waiting to issue command
> May 5 03:01:37 aberdeen kernel: ad4: error issuing WRITE_DMA48 command
> May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Request failed
> (error=5). ad4[WRITE(offset=200404975104, length=16384)]
> May 5 03:01:37 aberdeen kernel: GEOM_MIRROR: Device gm0: provider
> ad4 disconnected.
>
> or:
>
> May 13 14:41:26 aberdeen kernel: ad6: TIMEOUT - READ_DMA48 retrying
> (1 retry left) LBA=975513887
>
> Sometimes the read/write succeeds after a few retries, but sometimes
> it does not, so geom_mirror throws the disk out of the mirror.
>
> Tonight ad6 was thrown out of the mirror and ad4 then gave actual
> read errors, resulting in a big mess :(
>
> My question: does anyone have experience with FreeBSD on a Dell R300
> or can anyone give me some help in trying to fix the timeouts?

Could you please do the following:

- Provide output from "vmstat -i"
- Provide output from "dmesg | grep -i ata"
- Install ports/sysutils/smartmontools (5.40 or later) and provide full
  output from commands "smartctl -a /dev/ad4" and "smartctl -a /dev/ad6"

--
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
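The three requests above could be gathered into a single report to paste into a reply. This is a rough sketch, not a tested tool: the command strings are the ones named in the message, while the helper names, the report layout, and the 60-second timeout are assumptions made for illustration. It also assumes smartmontools is already installed and that the script runs with enough privileges to read the disks.

```python
import subprocess


def build_commands(disks):
    """Return the shell commands requested in the message above."""
    commands = ["vmstat -i", "dmesg | grep -i ata"]
    commands += [f"smartctl -a /dev/{disk}" for disk in disks]
    return commands


def run_command(command, timeout=60):
    """Run one command through the shell and return stdout and stderr combined."""
    result = subprocess.run(
        command,
        shell=True,  # needed for the pipe in "dmesg | grep -i ata"
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def collect_report(disks):
    """Run every requested command and label each section with its command."""
    sections = []
    for command in build_commands(disks):
        sections.append(f"### {command}\n{run_command(command)}")
    return "\n".join(sections)
```

Calling `print(collect_report(["ad4", "ad6"]))` on the affected machine would produce one labelled section per command; a command that is missing simply shows the shell's error text in its section.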
On Fri May 14 22:42:38 UTC 2010, Jeremy Chadwick wrote:

> Finally, your vmstat -i output:
>
> > # vmstat -i
> > interrupt                          total       rate
> > irq23: atapci0                 371021299      10423
>
> Good to know there's no IRQ sharing going on, but what does worry me is
> the interrupt rate (10K interrupts/second). That seems *extremely*
> high, but it also depends on what kind of disk I/O is happening on this
> system -- especially since you have 2 disks attached to the same
> controller.

I have a bunch of R300's here. From one that is using the on-board SATA and 2 drives in a gmirror setup (very similar to the OP) after 18 hours of uptime:

[0:2] speedtest:~> vmstat -i
interrupt                          total       rate
irq23: atapci0                    254116          3

I haven't specifically done any stress testing on this box, though I did do a "make -j8 buildworld" during the initial gmirror synchronization. 8-}

The drives are a pair of Dell-labeled 160GB "SAMSUNG HE161HJ 1AC01121" that shipped with the box.

I also have another R300 with Dell's "SAS 6/iR" card (a re-branded LSI 1068-something, seen as "mpt" by FreeBSD). While Dell only sells that as part of a package deal with the hot-swap backplane and redundant power supplies, there's no reason you couldn't pick one up on eBay and add it yourself. You'll need some sort of breakout cable to get from the big connector on the SAS 6 to individual SATA ports.

Terry Kennedy             http://www.tmk.com
terry@tmk.com             New York, NY USA
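The two "vmstat -i" rows quoted above can be compared a little more carefully. The sketch below assumes that the rate column is the interrupt total divided by uptime in seconds, truncated to an integer; that is an assumption about vmstat's behaviour, not something stated in the thread. Under it, each row implies a range for the machine's uptime. The function names are hypothetical.

```python
def parse_vmstat_row(row):
    """Split one 'vmstat -i' data row into (name, total, rate)."""
    name, total, rate = row.rsplit(None, 2)
    return name, int(total), int(rate)


def implied_uptime_bounds(total, rate):
    """Return (low, high) uptime in seconds consistent with a truncated rate.

    Assumption: rate == total // uptime, so total/(rate+1) < uptime <= total/rate.
    """
    low = total / (rate + 1)
    high = total / rate if rate else float("inf")
    return low, high


# Rows copied from the messages above.
for row in ("irq23: atapci0 371021299 10423", "irq23: atapci0 254116 3"):
    name, total, rate = parse_vmstat_row(row)
    low, high = implied_uptime_bounds(total, rate)
    print(f"{name}: rate={rate}/s, uptime between {low / 3600:.1f} h and {high / 3600:.1f} h")
```

With these numbers the second row is consistent with the stated 18 hours of uptime, and the first row implies a little under 10 hours of uptime at roughly 10K interrupts per second, so the gap between the two machines is a difference in sustained rate rather than an artifact of very different uptimes.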