Jason J. W. Williams
2008-Jan-03 21:57 UTC
[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
Hello,

There seems to be a persistent issue we have with ZFS where one of the SATA disks in a zpool on a Thumper starts throwing sense errors: ZFS does not offline the disk and instead hangs all zpools across the system. If it is not caught soon enough, application data ends up in an inconsistent state. We've seen this issue from b54 through b77 (as of last night).

Reading through the archives, we don't seem to be the only folks with this issue. Are there any plans to fix this behavior? It really makes ZFS less than desirable/reliable.

Best Regards,
Jason
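(Aside: one way to confirm which drive is accumulating sense errors, independent of ZFS, is to look at the per-device error counters and the system log. The device name below is only an example, not one from the original report.)

    # Per-device soft/hard/transport error counts and the last sense data seen
    iostat -En c5t3d0

    # The sd driver also logs the sense key for each failed command
    grep -i "sense key" /var/adm/messages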
Eric Schrock
2008-Jan-03 22:03 UTC
[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
This should be pretty much fixed in build 77. It will lock up for the duration of a single command timeout, but ZFS should recover quickly without queueing up additional commands. Since the default timeout is 60 seconds, we retry 3 times, and we do a probe afterwards, you may see hangs of up to 6 minutes. Unfortunately there's not much we can do, since that's the minimum amount of time to do two I/O operations to a single drive (one that fails and one to do a basic probe of the disk). You can tune 'sd_io_time' down to a more reasonable value to get shorter command timeouts, but this may break slow devices (like powered-down CD-ROM drives).

Other options at the ZFS level could be imagined, but would require per-pool tunables:

1. Allowing I/O to complete as soon as it is on enough devices, instead of waiting for it to be replicated to all devices.

2. Inventing a per-pool tunable that controls timeouts independently of SCSI timeouts.

Neither of these is trivial, and both potentially compromise data integrity, hence the lack of such features. There's no easy solution to the problem, but we're happy to hear ideas.

- Eric

--
Eric Schrock, FishWorks                        http://blogs.sun.com/eschrock
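(Aside: a rough sketch of the arithmetic above and of the 'sd_io_time' tuning Eric mentions. The 10-second value is only an example; as he notes, lowering it can break legitimately slow devices such as powered-down CD-ROM drives.)

    # Worst case with the defaults: 60s timeout x 3 retries for the failing
    # command, then the same again for the follow-up probe = ~6 minutes.
    #
    # Persistent change: add to /etc/system and reboot (value is in seconds).
    set sd:sd_io_time = 10

    # Live change on a running system via the kernel debugger
    # ("0t10" is decimal 10); affects commands issued after the write.
    echo "sd_io_time/W 0t10" | mdb -kw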
Albert Chin
2008-Jan-03 22:07 UTC
[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
See http://blogs.sun.com/eschrock/entry/zfs_and_fma

FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
http://www.opensolaris.org/os/community/arc/caselog/2007/283/
http://www.opensolaris.org/os/community/on/flag-days/all/

--
albert chin (china at thewrittenword.com)
Jason J. W. Williams
2008-Jan-03 22:11 UTC
[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
Hi Albert,

Thank you for the link. ZFS isn't offlining the disk in b77.

-J
Jason J. W. Williams
2008-Jan-03 22:14 UTC
[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
Hi Eric,

I'd really like to suggest a helpful idea, but all I can suggest is an end result. When we run ZFS on top of STK arrays that do the RAID, the arrays offline their bad disks very quickly and the applications never notice. On the X4500s, ZFS times out and locks up the applications. If ZFS is going to compete with the more traditional arrays, it seems the failure behavior has to be just as seamless.

-J
Eric Schrock
2008-Jan-03 22:33 UTC
[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
When you say "starts throwing sense errors," does that mean every I/O to the drive fails, or some arbitrary percentage of I/Os? If it's the latter, ZFS is trying to do the right thing by treating these as transient errors, but eventually the ZFS diagnosis should kick in. What does '::spa -ve' in 'mdb -k' show in one of these situations? How about '::zio_state'?

- Eric

--
Eric Schrock, FishWorks                        http://blogs.sun.com/eschrock
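(Aside: the dcmds Eric names can be run non-interactively from a root shell while the pools are hung; this is the output he is asking for.)

    # Dump each pool's configuration, vdev tree, and accumulated error counts
    echo "::spa -ve" | mdb -k

    # Show the state of in-flight ZFS I/Os, i.e. where outstanding zios are stuck
    echo "::zio_state" | mdb -k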
Jason J. W. Williams
2008-Jan-04 03:40 UTC
[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)
Hi Eric,

Hard to say. I'll use MDB next time it happens for more info. The applications using any zpool lock up.

-J