We recently had a disk fail on one of our whitebox (SuperMicro) ZFS
arrays (Solaris 10 U9).

The disk began throwing errors like this:

May 5 04:33:44 dev-zfs4 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci15d9,400@0 (mpt_sas0):
May 5 04:33:44 dev-zfs4 mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110610

And errors for the drive were incrementing in iostat -En output.
Nothing was seen in fmdump.

Unfortunately, it took about three hours for ZFS (or maybe it was MPT)
to decide the drive was actually dead:

May 5 07:41:06 dev-zfs4 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c5002cbc76c0 (sd4):
May 5 07:41:06 dev-zfs4 drive offline

During those three hours the I/O performance on this server was pretty
bad and caused issues for us. Once the drive "failed" completely, ZFS
pulled in a spare and all was well.

My question is -- is there a way to tune the MPT driver or even ZFS
itself to be more/less aggressive on what it sees as a "failure"
scenario?

I suppose this would have been handled differently / better if we'd
been using real Sun hardware?

Our other option is to watch better for log entries similar to the
above and either alert someone or take some sort of automated action.
I'm hoping there's a better way to tune this via driver or ZFS
settings, however.

Thanks,
Ray
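The "watch the logs and take automated action" option Ray mentions can
be as simple as tailing syslog. A minimal sketch, assuming these
warnings land in /var/adm/messages, that mailx is available for
alerting, and that admin@example.com is a placeholder address:

    #!/bin/sh
    # Follow the system log and mail an alert on mpt_sas 0x3111xxxx events.
    # The match pattern and alert recipient are assumptions; adjust as needed.
    tail -0f /var/adm/messages | while read line; do
        case "$line" in
            *mptsas_handle_event_sync*IOCLogInfo=0x3111*)
                echo "$line" | mailx -s "mpt_sas error on `hostname`" admin@example.com
                ;;
        esac
    done

This only alerts; actually failing the drive automatically (e.g. with
"zpool offline") is riskier and probably better left to a human.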
In a recent post, "r-mexico" wrote that they had to parse system
messages and "manually" fail the drives on a similar, though different,
occasion:

http://opensolaris.org/jive/message.jspa?messageID=515815#515815
On Tue, May 10, 2011 at 02:42:40PM -0700, Jim Klimov wrote:
> In a recent post, "r-mexico" wrote that they had to parse system
> messages and "manually" fail the drives on a similar, though
> different, occasion:
>
> http://opensolaris.org/jive/message.jspa?messageID=515815#515815

Thanks Jim, good pointer. It sounds like our use of SATA disks is
likely the problem, and we'd have better error reporting with SAS or
with some of the "nearline" SAS drives (SATA drives with a real SAS
controller on them).

Ray
On Tue, May 10, 2011 at 9:18 AM, Ray Van Dolson <rvandolson@esri.com> wrote:
> My question is -- is there a way to tune the MPT driver or even ZFS
> itself to be more/less aggressive on what it sees as a "failure"
> scenario?

You didn't mention what drives you had attached, but I'm guessing they
were normal "desktop" drives.

I suspect (but can't confirm) that using enterprise drives with TLER /
ERC / CCTL would have reported the failure up the stack faster than a
consumer drive. The drives will report an error after 7 seconds rather
than retry for several minutes.

You may be able to enable the feature on your drives, depending on the
manufacturer and firmware revision.

-B

--
Brandon High : bhigh@freaks.com
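For reference, on drives that support SCT Error Recovery Control, the
timeouts can often be queried and set with smartctl from smartmontools.
A sketch only -- support depends on the drive and firmware, the device
path is a placeholder, and "-d sat" may be needed for SATA drives
behind a SAS HBA:

    # Query the current SCT ERC settings (if the drive supports them)
    smartctl -d sat -l scterc /dev/rdsk/c0t5000C5002CBC76C0d0

    # Set read and write recovery limits to 7.0 seconds (values are in
    # tenths of a second); on many drives this does not persist across
    # a power cycle, so it has to be reapplied at boot.
    smartctl -d sat -l scterc,70,70 /dev/rdsk/c0t5000C5002CBC76C0d0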
On Tue, May 10, 2011 at 03:57:28PM -0700, Brandon High wrote:
> On Tue, May 10, 2011 at 9:18 AM, Ray Van Dolson <rvandolson@esri.com> wrote:
> > My question is -- is there a way to tune the MPT driver or even ZFS
> > itself to be more/less aggressive on what it sees as a "failure"
> > scenario?
>
> You didn't mention what drives you had attached, but I'm guessing they
> were normal "desktop" drives.
>
> I suspect (but can't confirm) that using enterprise drives with TLER /
> ERC / CCTL would have reported the failure up the stack faster than a
> consumer drive. The drives will report an error after 7 seconds rather
> than retry for several minutes.
>
> You may be able to enable the feature on your drives, depending on the
> manufacturer and firmware revision.
>
> -B

Yup, shoulda included that. These are regular SATA drives -- supposedly
"Enterprise", whatever that gives us (most likely a higher MTBF
number). We'll probably look at going with nearline SAS drives (which
only increases cost slightly) and write a small SEC rule on our syslog
server to watch for 0x31111000 errors on servers with SATA disks only,
so we can at least be alerted more quickly.

Ray
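A rough sketch of the kind of SEC (Simple Event Correlator) rule Ray
describes. The pattern, window, threshold, and notification command are
assumptions to adapt; /usr/local/bin/notify-admins.sh is a hypothetical
helper script:

    # mpt-sata-errors.sec -- alert on a burst of mpt_sas 0x3111xxxx events
    type=SingleWithThreshold
    ptype=RegExp
    pattern=mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3111
    desc=mpt_sas 0x3111xxxx errors: $0
    action=shellcmd /usr/local/bin/notify-admins.sh "possible failing SATA disk"
    window=600
    thresh=5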
On May 10, 2011, at 9:18 AM, Ray Van Dolson wrote:

> We recently had a disk fail on one of our whitebox (SuperMicro) ZFS
> arrays (Solaris 10 U9).
>
> The disk began throwing errors like this:
>
> May 5 04:33:44 dev-zfs4 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci15d9,400@0 (mpt_sas0):
> May 5 04:33:44 dev-zfs4 mptsas_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110610

These are commonly seen when hardware is having difficulty and devices
are being reset.

> And errors for the drive were incrementing in iostat -En output.
> Nothing was seen in fmdump.

That is unusual, because the ereports are sent along with the code that
increments the error counters in sd. Are you sure you ran "fmdump -e"
as root or with appropriate privileges?

> Unfortunately, it took about three hours for ZFS (or maybe it was MPT)
> to decide the drive was actually dead:
>
> May 5 07:41:06 dev-zfs4 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c5002cbc76c0 (sd4):
> May 5 07:41:06 dev-zfs4 drive offline
>
> During those three hours the I/O performance on this server was pretty
> bad and caused issues for us. Once the drive "failed" completely, ZFS
> pulled in a spare and all was well.
>
> My question is -- is there a way to tune the MPT driver or even ZFS
> itself to be more/less aggressive on what it sees as a "failure"
> scenario?

The mpt driver is closed source; contact its author for such details.
mpt_sas is open source, but for Solaris-derived OSes the decision to
retire a device is made by the Fault Management Architecture (FMA)
agents. Many of these have tunable algorithms, but AFAIK they are only
documented in source. That said, there are failure modes that do not
fit the current algorithms very well. Feel free to propose
alternatives.

> I suppose this would have been handled differently / better if we'd
> been using real Sun hardware?

Maybe, maybe not. These are generic conditions and can be seen on all
sorts of hardware under a wide variety of failure conditions.
 -- richard

> Our other option is to watch better for log entries similar to the
> above and either alert someone or take some sort of automated action.
> I'm hoping there's a better way to tune this via driver or ZFS
> settings, however.
>
> Thanks,
> Ray
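For completeness, the FMA checks referred to above look like this
("fmdump -e" is the one Richard asks about; fmadm and fmstat are
related checks). Run as root or with the appropriate privileges; the
ereport class shown is only an example of what disk or transport
problems tend to produce:

    # Error telemetry (ereports) -- these arrive before any fault is diagnosed
    fmdump -e

    # Verbose detail, filtered to a particular ereport class
    fmdump -eV -c ereport.io.scsi.cmd.disk.tran

    # Faults the diagnosis engines have actually declared
    fmadm faulty

    # Statistics for the FMA modules (diagnosis engines and agents)
    fmstat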