Justin Vassallo
2008-Jul-30 11:49 UTC
[zfs-discuss] zfs hanging on disk failure workaround
Hi,

I've had 3 zfs file systems hang completely when one of the drives in their pool fails. This has happened on both USB and internal SAS drives. In /var/adm/messages, I'd get this kind of message:

Jul 29 13:45:24 zen SCSI transport failed: reason 'timeout': retrying command
Jul 29 13:48:24 zen scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci108e,cb84@2,1/hub@1/hub@4/storage@3/disk@0,0 (sd17):
Jul 29 13:48:24 zen SCSI transport failed: reason 'timeout': giving up

Is there a SCSI and/or zfs timeout setting I can tune to tell it to flag a drive as faulty and stop attempting to access it?

I recently replaced some drives with WD drives, set up with a 7s TLER, but this has not helped the issue!

justin
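For anyone trying to confirm the same symptom on their own box, here is a minimal Python sketch that tallies the "SCSI transport failed" lines by reason and outcome. It assumes the line layout is exactly as quoted in the post above; real /var/adm/messages entries may be wrapped or indented differently, and the names `FAILURE_RE`, `summarize_transport_failures` and `SAMPLE` are made up for this illustration. It only reads log text and does not change any timeout.

```python
import re
from collections import Counter

# Pattern for the "SCSI transport failed" lines as quoted in the post.
# Assumption: timestamp, hostname, then the message on a single line.
FAILURE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+"
    r"SCSI transport failed: reason '(?P<reason>[^']+)': (?P<action>.+)$"
)


def summarize_transport_failures(lines):
    """Count (reason, action) pairs among SCSI transport failure lines."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            counts[(match["reason"], match["action"])] += 1
    return counts


# The three log lines from the post, used here as sample input.
SAMPLE = """\
Jul 29 13:45:24 zen SCSI transport failed: reason 'timeout': retrying command
Jul 29 13:48:24 zen scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci108e,cb84@2,1/hub@1/hub@4/storage@3/disk@0,0 (sd17):
Jul 29 13:48:24 zen SCSI transport failed: reason 'timeout': giving up
""".splitlines()

if __name__ == "__main__":
    for (reason, action), n in summarize_transport_failures(SAMPLE).items():
        print(f"{reason}: {action} x{n}")
```

To run it against a live system, pass the lines of the log file instead of `SAMPLE`, for example `summarize_transport_failures(open("/var/adm/messages"))`. In the excerpt above, the first retry and the final "giving up" are three minutes apart.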
Are you running the Solaris CIFS Server by any chance?
Which OS release/version?
-- richard
Justin Vassallo
2008-Jul-30 19:58 UTC
[zfs-discuss] zfs hanging on disk failure workaround
All 3 boxes I had disk failures on are SunFire x4200 M2 running Solaris 10 11/06 s10x_u3wos_10 X86 with the zfs it comes with, i.e. v3
Justin Vassallo wrote:
> All 3 boxes I had disk failures on are SunFire x4200 M2 running
> Solaris 10 11/06 s10x_u3wos_10 X86 w the zfs it comes with, ie v3

That version of ZFS is nearly 3 years old... has it been patched at all? Even if it has been patched, its fault handling capabilities will not be anything like the recent SXCE or OpenSolaris 2008.05 platforms.
-- richard
Justin Vassallo
2008-Jul-30 20:51 UTC
[zfs-discuss] zfs hanging on disk failure workaround
Hi Richard,

The version on stable Solaris is v4 at best today. I definitely do not want to move away from stable Solaris for my production environment, not least because I want to continue my Solaris support contracts.

I will be attaching a Sun 2540FC array to these servers in the coming weeks and I was intending to run zfs over the hardware raid. But seeing your comment I'm inclined to go back on that. Given the huge advances in zfs since that version, is installing the latest zfs version from source an option I should consider at all? Or am I better off discarding zfs altogether?

justin
Justin Vassallo wrote:
> Hi Richard,
>
> The version on stable Solaris is v4 at best today. I definitely do not want
> to go away from stable Solaris for my production environment, not least
> because I want to continue my Solaris support contracts.

Don't confuse the ZFS on-disk format version with software versions. There have been many software fixes in the past 3 years.

> I will be attaching a Sun 2540FC array to these servers in the coming weeks
> and I was intending to run zfs over the hardware raid. But seeing your
> comment I'm inclined to go back on that. Given the huge advances in zfs
> since that version, is installing latest zfs version from source an option I
> should at all consider? Or am I better put discarding zfs altogether?

If I were you, I would look to Solaris 10 5/2008 before discarding. I cannot say what to recommend instead, because I do not know your requirements, but I think it is safe to say that ZFS is already much better for your data than UFS.
-- richard
Justin Vassallo
2008-Jul-30 21:11 UTC
[zfs-discuss] zfs hanging on disk failure workaround
Thank you for the feedback.

Justin