Antony Mawer
2008-Aug-27 08:41 UTC
Finding which GEOM provider is generating errors in a graid3
I have a FreeBSD 6.2-based server running a 1.2TB graid3 volume, which
consists of 5x 320GB SATA hard drives. I've been getting errors in
/var/log/messages from the graid3 volume, which I suspect means an
underlying fault with one of the disks, but is there any way to decipher
which one of these drives is throwing errors?
I've checked smartctl -a /dev/adXX but nothing shows up there. I'm
wondering if this is the infamous ata driver bug(s) rearing
its ugly head.
Also, does anyone know what the "ZoneXXFailed" items in the graid3 list
output mean?
Relevant output:
$ graid3 status
       Name    Status  Components
raid3/data1  COMPLETE  ad12
                       ad14
                       ad16
                       ad18
                       ad20
$ graid3 list
Geom name: data1
State: COMPLETE
Components: 5
Flags: VERIFY
GenID: 0
SyncID: 1
ID: 3700500186
Zone64kFailed: 791239
Zone64kRequested: 49197268
Zone16kFailed: 40204
Zone16kRequested: 1283738
Zone4kFailed: 12005939
Zone4kRequested: 2445799003
Providers:
1. Name: raid3/data1
   Mediasize: 1280291731456 (1.2T)
   Sectorsize: 2048
   Mode: r1w1e1
...
$ atacontrol list
...
ATA channel 6:
    Master: ad12 <ST3320620AS/3.AAK> Serial ATA v1.0
ATA channel 7:
    Master: ad14 <ST3320620AS/3.AAK> Serial ATA v1.0
ATA channel 8:
    Master: ad16 <ST3320620AS/3.AAK> Serial ATA v1.0
ATA channel 9:
    Master: ad18 <ST3320620AS/3.AAK> Serial ATA v1.0
ATA channel 10:
    Master: ad20 <ST3320620AS/3.AAK> Serial ATA v1.0
Output in /var/log/messages:
> Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
> Aug 27 17:25:45 backup last message repeated 7 times
> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
> Aug 27 17:25:45 backup last message repeated 22 times
> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
> Aug 27 17:25:45 backup last message repeated 21 times
> Aug 27 17:38:24 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
> Aug 27 17:38:26 backup last message repeated 4 times
> Aug 27 17:46:02 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
> Aug 27 17:53:48 backup last message repeated 7 times
> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
> Aug 27 17:53:48 backup last message repeated 22 times
> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
> Aug 27 17:53:49 backup last message repeated 21 times
Cheers
Antony
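The g_vfs_done() lines in the post above only name the raid3/data1 provider, not a disk, but the offsets can at least be translated into a region to inspect on each component. The Python sketch below tallies the distinct failing ranges and maps each to a per-component offset. It rests on assumptions that are not confirmed anywhere in this thread: that RAID3 here splits every 2048-byte volume sector evenly across the 4 data components (5 components minus one for parity, 512 bytes each), and that component data starts at offset 0 of each disk. The names failing_ranges and component_range are made up for illustration.

```python
import re
from collections import Counter

# A few lines copied from the /var/log/messages excerpt in the post above.
LOG = """\
Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
"""

G_VFS_DONE = re.compile(
    r"g_vfs_done\(\):(?P<provider>[^\[]+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]error = (?P<errno>\d+)"
)

COMPONENTS = 5                 # "Components: 5" in the graid3 list output
DATA_DISKS = COMPONENTS - 1    # assumption: one component's worth of parity
DISK_SECTOR = 512              # assumption: 2048-byte volume sector / 4 data disks


def failing_ranges(log_text):
    """Count how often each (offset, length) pair appears in g_vfs_done errors."""
    counts = Counter()
    for m in G_VFS_DONE.finditer(log_text):
        counts[(int(m["offset"]), int(m["length"]))] += 1
    return counts


def component_range(offset, length):
    """Map a volume byte range to the byte range on each data component.

    Assumes each volume sector is split evenly across the data components.
    """
    return offset // DATA_DISKS, length // DATA_DISKS


ranges = failing_ranges(LOG)
for (offset, length), hits in sorted(ranges.items()):
    c_off, c_len = component_range(offset, length)
    print(f"volume offset {offset} (+{length}), logged {hits}x -> "
          f"component offset {c_off} (+{c_len}), "
          f"512-byte sector {c_off // DISK_SECTOR}")
```

Note that syslog's "last message repeated N times" lines are not counted here, so the tallies understate how often each range failed. If the mapping assumption holds, reading that range directly from each component in turn (for example with dd and bs=512, skip set to the sector number, count=8) might show whether any single disk returns an error for it.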
Jeremy Chadwick
2008-Aug-27 09:06 UTC
Finding which GEOM provider is generating errors in a graid3
On Wed, Aug 27, 2008 at 06:27:47PM +1000, Antony Mawer wrote:
> I have a FreeBSD 6.2-based server running a 1.2TB graid3 volume, which
> consists of 5x 320gb SATA hard drives. I've been getting errors in
> /var/log/messages from the graid3 volume, which I suspect means an
> underlying fault with one of the disks, but is there any way to decipher
> which one of these drives is throwing errors?
>
> I've checked smartctl -a /dev/adXX but nothing shows up there..

When you say "nothing shows up there", what exactly do you mean? A lot
of people don't know how to read SMART statistics. I hope by "nothing
shows up there" you mean "nothing stands out".

> I'm wondering if this is the infamous ata driver bug(s) that may be
> rearing its ugly head..

The bugs in question only apply when there are kernel messages coming
from the *disks themselves*, and not from a GEOM provider. Your log
output below doesn't indicate any ATA errors, just GEOM errors. If the
disks were failing, you *would* be getting errors from the ATA
subsystem, but you're not.

I'm not familiar with GEOM "stuff", so I can't really comment on what
all is going on here.

> Also, does anyone know what "ZoneXXFailed" items in the graid3 list
> output mean?
>
> Relevant output:
>
> $ graid3 status
> Name Status Components
> raid3/data1 COMPLETE ad12
> ad14
> ad16
> ad18
> ad20
>
> $ graid3 list
> Geom name: data1
> State: COMPLETE
> Components: 5
> Flags: VERIFY
> GenID: 0
> SyncID: 1
> ID: 3700500186
> Zone64kFailed: 791239
> Zone64kRequested: 49197268
> Zone16kFailed: 40204
> Zone16kRequested: 1283738
> Zone4kFailed: 12005939
> Zone4kRequested: 2445799003
> Providers:
> 1. Name: raid3/data1
> Mediasize: 1280291731456 (1.2T)
> Sectorsize: 2048
> Mode: r1w1e1
> ...
>
> $ atacontrol list
> ...
> ATA channel 6:
> Master: ad12 <ST3320620AS/3.AAK> Serial ATA v1.0
> ATA channel 7:
> Master: ad14 <ST3320620AS/3.AAK> Serial ATA v1.0
> ATA channel 8:
> Master: ad16 <ST3320620AS/3.AAK> Serial ATA v1.0
> ATA channel 9:
> Master: ad18 <ST3320620AS/3.AAK> Serial ATA v1.0
> ATA channel 10:
> Master: ad20 <ST3320620AS/3.AAK> Serial ATA v1.0
>
> Output in /var/log/messages:
>
>> Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 7 times
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 22 times
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 21 times
>> Aug 27 17:38:24 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:38:26 backup last message repeated 4 times
>> Aug 27 17:46:02 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:53:48 backup last message repeated 7 times
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:53:48 backup last message repeated 22 times
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
>> Aug 27 17:53:49 backup last message repeated 21 times

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
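The distinction drawn in the reply above, between errors reported by the ATA subsystem for a specific disk and errors reported only against the GEOM provider, can be checked mechanically over a whole log file. The Python sketch below sorts kernel log lines into those two groups. The GEOM sample line is taken from the original post; the ATA sample line is hypothetical, written in the style the FreeBSD 6.x ata(4) driver is believed to use ("adNN: FAILURE - ...", "adNN: TIMEOUT - ...", "adNN: WARNING - ..."), and that message format is an assumption, as is the helper name classify.

```python
import re

# Matches the GEOM-level errors shown in the thread, capturing the provider.
GEOM_RE = re.compile(r"g_vfs_done\(\):([^\[]+)\[(?:READ|WRITE)\(")

# Assumed shape of disk-level messages from the ata driver, capturing the disk.
ATA_RE = re.compile(r"\b(ad\d+): (?:TIMEOUT|FAILURE|WARNING) - ")


def classify(line):
    """Return ("ata", disk), ("geom", provider) or ("other", None) for a log line."""
    m = ATA_RE.search(line)
    if m:
        return ("ata", m.group(1))
    m = GEOM_RE.search(line)
    if m:
        return ("geom", m.group(1))
    return ("other", None)


# Real line from the original post.
GEOM_LINE = ("Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1"
             "[READ(offset=160320159744, length=16384)]error = 5")

# HYPOTHETICAL line, not from this server's logs; shown only to exercise ATA_RE.
ATA_LINE = ("Aug 27 17:17:27 backup kernel: ad12: FAILURE - READ_DMA "
            "status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=78281328")

for sample in (GEOM_LINE, ATA_LINE, "Aug 27 17:25:45 backup last message repeated 7 times"):
    print(classify(sample))
```

Run over the full /var/log/messages, a result containing only "geom" entries would be consistent with the reply's reading that the disks themselves are not reporting failures, while any "ata" entry would name the disk to look at first.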