Phi Tran
2006-Aug-10 21:55 UTC
[zfs-discuss] Re: I/O write failures on non-replicated pool
I remember a discussion about I/O write failures causing a panic for a non-replicated pool and a plan to fix this in the future. I couldn't find a bug for this work, though. Is there still a plan to fix this?

Phi
Eric Schrock
2006-Aug-10 22:03 UTC
[zfs-discuss] Re: I/O write failures on non-replicated pool
Yes, there are three incremental fixes that we plan in this area:

6417772 need nicer message on write failure

    This just cleans up the failure mode so that we get a nice FMA failure message and can distinguish this from a random failed assert.

6417779 ZFS: I/O failure (write on ...) -- need to reallocate writes

    In a multi-vdev pool, this would take a failed write and attempt to do the write on another top-level vdev. This would all but eliminate the problem for multi-vdev pools.

6322646 ZFS should gracefully handle all devices failing (when writing)

    This is the "real" fix. Unfortunately, it's also really hard. Even if we manage to abort the current transaction group, dealing with the semantics of a filesystem which has lost an arbitrary amount of change and notifying the user in a meaningful way is difficult at best.

Hope that helps.

- Eric

--
Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock
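For readers who want the idea behind 6417779 in concrete form, here is a minimal sketch of "retry the failed write on another top-level vdev". It is a toy model only, not ZFS code: `ToyVdev`, `WriteError`, and `write_with_reallocation` are hypothetical names invented for this illustration, and the round-robin retry order is an assumption rather than anything stated in the bug.

```python
class WriteError(Exception):
    """Raised when a (simulated) device rejects a write."""


class ToyVdev:
    """Hypothetical stand-in for a top-level vdev; not a ZFS structure."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blocks = []

    def write(self, data):
        if not self.healthy:
            raise WriteError(f"write failed on {self.name}")
        self.blocks.append(data)


def write_with_reallocation(vdevs, data, preferred=0):
    """Try the preferred vdev first, then each remaining vdev in turn.

    Returns the name of the vdev that accepted the write. Raises WriteError
    only if every vdev fails, which is the situation 6322646 describes.
    """
    n = len(vdevs)
    for offset in range(n):
        vdev = vdevs[(preferred + offset) % n]
        try:
            vdev.write(data)
            return vdev.name
        except WriteError:
            continue  # reallocate: move on to the next top-level vdev
    raise WriteError("all top-level vdevs failed")


# Assumed example pool: the first vdev is failing, the second is healthy.
pool = [ToyVdev("vdev0", healthy=False), ToyVdev("vdev1")]
print(write_with_reallocation(pool, b"block"))
```

With a single-vdev pool the loop has nowhere else to go, which is why reallocation alone does not help non-replicated single-device pools and the harder 6322646 fix is still needed.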
Phi Tran
2006-Aug-10 22:16 UTC
[zfs-discuss] Re: I/O write failures on non-replicated pool
Thanks for the list.

Phi
Nice to see some progress, at last, on this bug:

http://bugs.opensolaris.org/view_bug.do?bug_id=6417779
"ZFS: I/O failure (write on ...) -- need to reallocate writes"
Commit to Fix: snv_77

http://www.opensolaris.org/os/community/arc/caselog/2007/567/onepager/
http://mail.opensolaris.org/pipermail/onnv-notify/2007-October/012782.html

Regards
Nigel Smith
Robert Milkowski
2007-Oct-29 17:47 UTC
[zfs-discuss] I/O write failures on non-replicated pool
Hello Nigel,

Thursday, October 25, 2007, 12:02:04 PM, you wrote:

NS> Nice to see some progress, at last, on this bug:
NS> http://bugs.opensolaris.org/view_bug.do?bug_id=6417779
NS> "ZFS: I/O failure (write on ...) -- need to reallocate writes"
NS> Commit to Fix: snv_77
NS> http://www.opensolaris.org/os/community/arc/caselog/2007/567/onepager/
NS> http://mail.opensolaris.org/pipermail/onnv-notify/2007-October/012782.html

Thanks for spotting this.

From the one-pager it's not obvious what would happen if one top-level vdev fails: will it wait, or will it use ditto blocks to write the data to another device, as suggested in the bug? (I'm not sure that's a good idea, though.)

--
Best regards,
Robert mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com