Antony Mawer
2008-Aug-27 08:41 UTC
Finding which GEOM provider is generating errors in a graid3
I have a FreeBSD 6.2-based server running a 1.2TB graid3 volume, which
consists of 5x 320gb SATA hard drives. I've been getting errors in
/var/log/messages from the graid3 volume, which I suspect means an
underlying fault with one of the disks, but is there any way to decipher
which one of these drives is throwing errors?

I've checked smartctl -a /dev/adXX but nothing shows up there..

I'm wondering if this is the infamous ata driver bug(s) that may be
rearing its ugly head..

Also, does anyone know what "ZoneXXFailed" items in the graid3 list
output mean?

Relevant output:

    $ graid3 status
           Name    Status  Components
    raid3/data1  COMPLETE  ad12
                           ad14
                           ad16
                           ad18
                           ad20

    $ graid3 list
    Geom name: data1
    State: COMPLETE
    Components: 5
    Flags: VERIFY
    GenID: 0
    SyncID: 1
    ID: 3700500186
    Zone64kFailed: 791239
    Zone64kRequested: 49197268
    Zone16kFailed: 40204
    Zone16kRequested: 1283738
    Zone4kFailed: 12005939
    Zone4kRequested: 2445799003
    Providers:
    1. Name: raid3/data1
       Mediasize: 1280291731456 (1.2T)
       Sectorsize: 2048
       Mode: r1w1e1
    ...

    $ atacontrol list
    ...
    ATA channel 6:
        Master: ad12 <ST3320620AS/3.AAK> Serial ATA v1.0
    ATA channel 7:
        Master: ad14 <ST3320620AS/3.AAK> Serial ATA v1.0
    ATA channel 8:
        Master: ad16 <ST3320620AS/3.AAK> Serial ATA v1.0
    ATA channel 9:
        Master: ad18 <ST3320620AS/3.AAK> Serial ATA v1.0
    ATA channel 10:
        Master: ad20 <ST3320620AS/3.AAK> Serial ATA v1.0

Output in /var/log/messages:

> Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
> Aug 27 17:25:45 backup last message repeated 7 times
> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
> Aug 27 17:25:45 backup last message repeated 22 times
> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
> Aug 27 17:25:45 backup last message repeated 21 times
> Aug 27 17:38:24 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
> Aug 27 17:38:26 backup last message repeated 4 times
> Aug 27 17:46:02 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
> Aug 27 17:53:48 backup last message repeated 7 times
> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
> Aug 27 17:53:48 backup last message repeated 22 times
> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
> Aug 27 17:53:49 backup last message repeated 21 times

Cheers
Antony
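For reference, the logged offsets can be tallied and converted to sector numbers. The Python sketch below is an illustration only, not something posted in the thread. It parses lines in the format shown in the log excerpt, folds in the syslog "last message repeated N times" counts, and divides each byte offset by the 2048-byte sector size that graid3 list reports. The per-component figure assumes the usual RAID3 layout, in which one component holds parity and each array sector is split evenly across the other four (2048 / 4 = 512 bytes each). The function and variable names are made up for the example. This narrows down where on the array the failing reads are, but it does not by itself say which disk is at fault.

```python
import re
from collections import Counter

# Values taken from the graid3 list output in the post.
ARRAY_SECTOR_SIZE = 2048
COMPONENTS = 5
DATA_COMPONENTS = COMPONENTS - 1  # assumption: one component holds parity

# Patterns for the two kinds of lines seen in the log excerpt.
READ_ERR = re.compile(r"g_vfs_done\(\):\S+?\[READ\(offset=(\d+), length=\d+\)\]error = \d+")
REPEATED = re.compile(r"last message repeated (\d+) times")


def tally_read_errors(lines):
    """Count read errors per byte offset, including syslog repeat lines."""
    counts = Counter()
    last_offset = None
    for line in lines:
        m = READ_ERR.search(line)
        if m:
            last_offset = int(m.group(1))
            counts[last_offset] += 1
            continue
        m = REPEATED.search(line)
        if m and last_offset is not None:
            # A repeat line refers to the message logged just before it.
            counts[last_offset] += int(m.group(1))
    return counts


def locate(offset):
    """Return (array sector, byte offset on each data component)."""
    sector = offset // ARRAY_SECTOR_SIZE
    per_component = offset // DATA_COMPONENTS  # assumed even split
    return sector, per_component


# First five lines of the log excerpt from the post.
sample = [
    "Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5",
    "Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5",
    "Aug 27 17:25:45 backup last message repeated 7 times",
    "Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5",
    "Aug 27 17:25:45 backup last message repeated 22 times",
]

for offset, n in sorted(tally_read_errors(sample).items()):
    sector, comp_offset = locate(offset)
    print(f"offset={offset} errors={n} array_sector={sector} component_offset={comp_offset}")
```

Run against the full /var/log/messages, the same tally would show whether the errors stay confined to a few neighbouring 16 KB blocks, as they do in the excerpt, or are spread across the volume.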
Jeremy Chadwick
2008-Aug-27 09:06 UTC
Finding which GEOM provider is generating errors in a graid3
On Wed, Aug 27, 2008 at 06:27:47PM +1000, Antony Mawer wrote:
> I have a FreeBSD 6.2-based server running a 1.2TB graid3 volume, which
> consists of 5x 320gb SATA hard drives. I've been getting errors in
> /var/log/messages from the graid3 volume, which I suspect means an
> underlying fault with one of the disks, but is there any way to decipher
> which one of these drives is throwing errors?
>
> I've checked smartctl -a /dev/adXX but nothing shows up there..

When you say "nothing shows up there", what exactly do you mean? A lot
of people don't know how to read SMART statistics. I hope by "nothing
shows up there" you mean "nothing stands out".

> I'm wondering if this is the infamous ata driver bug(s) that may be
> rearing its ugly head..

The bugs in question only apply when there are kernel messages coming
from the *disks themselves*, and not from a GEOM provider. Your log
output below doesn't show any ATA errors, just GEOM errors. If the disks
were failing, you *would* be getting errors from the ATA subsystem, but
you're not.

I'm not familiar with GEOM "stuff", so I can't really comment on what
all is going on here.

> Also, does anyone know what "ZoneXXFailed" items in the graid3 list
> output mean?
>
> Relevant output:
>
>     $ graid3 status
>            Name    Status  Components
>     raid3/data1  COMPLETE  ad12
>                            ad14
>                            ad16
>                            ad18
>                            ad20
>
>     $ graid3 list
>     Geom name: data1
>     State: COMPLETE
>     Components: 5
>     Flags: VERIFY
>     GenID: 0
>     SyncID: 1
>     ID: 3700500186
>     Zone64kFailed: 791239
>     Zone64kRequested: 49197268
>     Zone16kFailed: 40204
>     Zone16kRequested: 1283738
>     Zone4kFailed: 12005939
>     Zone4kRequested: 2445799003
>     Providers:
>     1. Name: raid3/data1
>        Mediasize: 1280291731456 (1.2T)
>        Sectorsize: 2048
>        Mode: r1w1e1
>     ...
>
>     $ atacontrol list
>     ...
>     ATA channel 6:
>         Master: ad12 <ST3320620AS/3.AAK> Serial ATA v1.0
>     ATA channel 7:
>         Master: ad14 <ST3320620AS/3.AAK> Serial ATA v1.0
>     ATA channel 8:
>         Master: ad16 <ST3320620AS/3.AAK> Serial ATA v1.0
>     ATA channel 9:
>         Master: ad18 <ST3320620AS/3.AAK> Serial ATA v1.0
>     ATA channel 10:
>         Master: ad20 <ST3320620AS/3.AAK> Serial ATA v1.0
>
> Output in /var/log/messages:
>
>> Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 7 times
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 22 times
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 21 times
>> Aug 27 17:38:24 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:38:26 backup last message repeated 4 times
>> Aug 27 17:46:02 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:53:48 backup last message repeated 7 times
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:53:48 backup last message repeated 22 times
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
>> Aug 27 17:53:49 backup last message repeated 21 times

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
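The distinction drawn in this reply, between kernel messages from the disks themselves and messages from a GEOM provider, can be checked mechanically over a whole log file. The Python sketch below is a rough illustration, not part of the thread. It assumes that disk driver messages carry the device name (such as ad12:) directly after "kernel:"; that prefix is an assumption about the log format, since no such line appears in the posted excerpt. The g_vfs_done pattern is taken from the excerpt. The names classify, DISK_MSG and GEOM_MSG are hypothetical.

```python
import re

# Assumption: disk driver messages begin with the device name, e.g. "ad12:".
DISK_MSG = re.compile(r"kernel: (ad\d+):")
# Format taken from the log excerpt quoted in this thread.
GEOM_MSG = re.compile(r"kernel: g_vfs_done\(\):(\S+?)\[")


def classify(lines):
    """Count kernel messages per source name, split into disk and GEOM."""
    by_source = {"disk": {}, "geom": {}}
    for line in lines:
        for kind, pattern in (("disk", DISK_MSG), ("geom", GEOM_MSG)):
            m = pattern.search(line)
            if m:
                name = m.group(1)
                by_source[kind][name] = by_source[kind].get(name, 0) + 1
                break
    return by_source


# Three lines from the log excerpt quoted above.
log_excerpt = [
    "Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5",
    "Aug 27 17:25:45 backup last message repeated 7 times",
    "Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5",
]

result = classify(log_excerpt)
print("disk messages:", result["disk"])
print("GEOM messages:", result["geom"])
```

On the quoted excerpt the disk table stays empty and only raid3/data1 shows up, which matches the observation that the log contains GEOM errors but no ATA errors.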