Jason J. W. Williams
2007-Dec-04 09:43 UTC
[zfs-discuss] X4500 ILOM thinks disk 20 is faulted, ZFS thinks not.
Hey Guys,

Have any of y'all seen a condition where the ILOM considers a disk faulted (status is 3 instead of 1), but ZFS keeps writing to the disk and doesn't report any errors? I'm going to do a scrub tomorrow and see what comes back. I'm curious what caused the ILOM to fault the disk. Any advice is greatly appreciated.

Best Regards,
Jason

P.S. The system is running OpenSolaris Build 54.
Ralf Ramge
2007-Dec-04 09:54 UTC
[zfs-discuss] X4500 ILOM thinks disk 20 is faulted, ZFS thinks not.
Jason J. W. Williams wrote:
> Have any of y'all seen a condition where the ILOM considers a disk
> faulted (status is 3 instead of 1), but ZFS keeps writing to the disk
> and doesn't report any errors? I'm going to do a scrub tomorrow and
> see what comes back. I'm curious what caused the ILOM to fault the
> disk. Any advice is greatly appreciated.

What does `iostat -E` tell you?

I've experienced several times that ZFS is very fault tolerant - a bit too tolerant for my taste - when it comes to faulting a disk. I saw external FC drives with hundreds or even thousands of errors, even entire hanging loops or drives with hardware trouble, and neither ZFS nor /var/adm/messages reported a problem.

So I prefer examining the iostat output over `zpool status` - but with the unattractive side effect that it's not possible to reset the error count which iostat reports without a reboot, so this method is not suitable for monitoring purposes.

--
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
ralf.ramge at webde.de - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
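Since the `iostat -E` counters cannot be reset without a reboot, one possible workaround for monitoring is to save a snapshot of the output and compare later snapshots against it, so only counters that have grown get reported. The following is a minimal sketch of that idea, not a tested monitoring tool. The function names (`parse_iostat_e`, `counter_deltas`) are made up for illustration, and the parser assumes the usual Solaris layout in which each device block starts with a line of the form `sdN  Soft Errors: ... Hard Errors: ... Transport Errors: ...`; the output on a given build may differ.

```python
import re

# Counter labels as they commonly appear in Solaris `iostat -E` output.
# This list is an assumption; adjust it to match the actual output.
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors", "Media Error",
    "Device Not Ready", "No Device", "Recoverable", "Illegal Request",
    "Predictive Failure Analysis",
)
DEVICE_RE = re.compile(r"^(\S+)\s+Soft Errors:")
COUNTER_RE = re.compile(
    r"(%s):\s*(\d+)" % "|".join(re.escape(c) for c in COUNTERS)
)


def parse_iostat_e(text):
    """Return {device: {counter_name: value}} parsed from `iostat -E` text."""
    disks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = DEVICE_RE.match(line)
        if m:
            # A new device block begins on the "Soft Errors" line.
            current = disks.setdefault(m.group(1), {})
        if current is not None:
            for name, value in COUNTER_RE.findall(line):
                current[name] = int(value)
    return disks


def counter_deltas(before, after):
    """Return only the counters whose value changed between two snapshots.

    Devices that are missing from `before` are compared against zero.
    """
    changes = {}
    for dev, counters in after.items():
        old = before.get(dev, {})
        diff = {
            name: value - old.get(name, 0)
            for name, value in counters.items()
            if value != old.get(name, 0)
        }
        if diff:
            changes[dev] = diff
    return changes
```

In use, the text would come from running `iostat -E` at two points in time (for example via `subprocess.run(["iostat", "-E"], capture_output=True, text=True).stdout`), with the earlier parse result stored on disk between runs. A reboot resets the counters, which shows up as negative deltas and would need separate handling.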
Jason J. W. Williams
2007-Dec-04 10:13 UTC
[zfs-discuss] X4500 ILOM thinks disk 20 is faulted, ZFS thinks not.
Hi Ralf,

Thank you for the suggestion. About half of the disks are reporting 1968-1969 in the "Soft Errors" field. All disks are reporting 1968 in the "Illegal Request" field. There don't appear to be any other errors; all other counters are 0.

The Illegal Request count seems a little fishy...like `iostat -E` doesn't like the X4500 for some reason.

Thank you again for your help.

Best Regards,
Jason

On Dec 4, 2007 2:54 AM, Ralf Ramge <ralf.ramge at webde.de> wrote:
> What does `iostat -E` tell you?
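The "fishy" observation above (the same nonzero Illegal Request count on every disk) can be checked mechanically: a counter that has one identical nonzero value on all devices is more likely to reflect something common to the controller or driver than a fault on any single disk. Below is a minimal sketch of that check. The function name `uniform_nonzero_counters` and the device names are hypothetical, and the per-disk numbers are only illustrative values in the spirit of those reported in this thread, not actual output from the system.

```python
def uniform_nonzero_counters(disks):
    """Return {counter: value} for counters that have the same nonzero
    value on every disk in `disks` ({device: {counter: value}})."""
    if not disks:
        return {}
    # Only consider counters that are present for every device.
    shared = set.intersection(*(set(c) for c in disks.values()))
    result = {}
    for name in sorted(shared):
        values = {counters[name] for counters in disks.values()}
        if len(values) == 1:
            value = values.pop()
            if value != 0:
                result[name] = value
    return result


# Illustrative data only: device names and the exact split are assumptions.
example = {
    "sd0": {"Soft Errors": 1968, "Hard Errors": 0, "Illegal Request": 1968},
    "sd1": {"Soft Errors": 0, "Hard Errors": 0, "Illegal Request": 1968},
    "sd2": {"Soft Errors": 1969, "Hard Errors": 0, "Illegal Request": 1968},
}
suspicious = uniform_nonzero_counters(example)
```

With this data, only "Illegal Request" is flagged; "Soft Errors" varies between disks and "Hard Errors" is zero everywhere, so neither is reported.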