I recently ran into a problem for the second time with ZFS mirrors. I mirror some of my data between two different physical arrays. One array (an SE3511) had a catastrophic failure and became unresponsive. With ZFS in s10u3, the pool just waits for the array to come back and hangs pretty much all I/O to the zpool. I was told by Sun service that there are enhancements in the upcoming S10 10/08 release that will help. My understanding of the code being delivered with S10 10/08 is that on a 2-way mirror (which is what I use), if this same situation occurs again, ZFS will allow reads but writes will still be queued until the other half of the mirror comes back.

Is it just me, or have we gone backwards? The whole point of mirroring is that if half the mirror goes away, we survive and can fix the problem with little to NO impact on the running system. Is this really true? With ZFS root also being available in S10 10/08, I would not want it anywhere near my root filesystem if this is really the behavior.

Any information would be GREATLY appreciated!

BlueUmp
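For reference, the kind of configuration being described here is a single pool mirrored across two physical arrays, something along these lines (pool and device names are hypothetical, purely for illustration):

    # create a two-way mirror with one disk from each array
    zpool create datapool mirror c2t0d0 c3t0d0

    # check pool health
    zpool status datapool

The expectation with such a mirror is that if either array disappears, the pool drops to DEGRADED but remains readable and writable from the surviving side.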
I don't know if this is already available in S10 10/08, but in OpenSolaris builds > 71 you can set the zpool failmode property.

See: http://opensolaris.org/os/community/arc/caselog/2007/567/

The property can be set to one of three options: "wait", "continue", or "panic".

The default behavior is to "wait" for manual intervention before allowing any further I/O attempts. Any I/O that was already queued remains in memory until the condition is resolved. This error condition can be cleared by using the 'zpool clear' subcommand, which will attempt to resume any queued I/Os.

The "continue" mode returns EIO to any new write request but attempts to satisfy reads. Any write I/Os that were already in flight at the time of the failure will be queued and may be resumed using 'zpool clear'.

Finally, the "panic" mode provides the existing behavior that was explained above.
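As a quick sketch of how the property is used (the pool name "tank" is just an example):

    # set the failure-mode policy for the pool
    zpool set failmode=continue tank

    # check the current setting
    zpool get failmode tank

    # once the underlying storage is back, resume any queued I/O
    zpool clear tank

Note that, as comes up further down the thread, failmode governs what happens when the pool as a whole loses access to its storage, not the ordinary case where one side of a mirror fails and redundancy still covers the I/O.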
As far as I can tell, it all comes down to whether ZFS detects the failure properly, and what commands you use as it's recovering.

Running "zpool status" is a complete no-no if your array is degraded in any way. It is capable of locking up ZFS even when the pool would otherwise have recovered by itself. If you had zpool status hang, this is probably what happened to you.

It also appears that ZFS is at the mercy of your drivers when it comes to detecting and reacting to the failure. In my experience this means that when a device does fail, ZFS may react instantly and keep your mirror online, it may take 3 minutes (waiting for iSCSI to time out), or it may take a long time (if FMA is involved).

I've seen ZFS mirrors protect data nicely, but I've also seen a lot of very odd failure modes. I'd quite happily run ZFS in production, but you can be damn sure it'd be on Sun hardware, and I'd test as many failure modes as I could before it went live.
zfs-discuss-bounces at opensolaris.org wrote on 10/07/2008 01:10:51 PM:

> I don't know if this is already available in S10 10/08, but in
> opensolaris build > 71 you can set the:
>
> zpool failmode property
>
> see:
> http://opensolaris.org/os/community/arc/caselog/2007/567/
>
> available options are:
>
> The property can be set to one of three options: "wait", "continue",
> or "panic".
>
> The default behavior will be to "wait" for manual intervention before
> allowing any further I/O attempts. Any I/O that was already queued would
> remain in memory until the condition is resolved. This error condition can
> be cleared by using the 'zpool clear' subcommand, which will attempt to
> resume any queued I/Os.
>
> The "continue" mode returns EIO to any new write request but attempts to
> satisfy reads. Any write I/Os that were already in-flight at the time
> of the failure will be queued and may be resumed using 'zpool clear'.
>
> Finally, the "panic" mode provides the existing behavior that was
> explained above.

Huh? I was under the impression that this was for catastrophic write issues (no paths to storage at all), not just one side of a mirror being down. I run mostly raidz2 and have not tested mirror breakage, but am I wrong in assuming that, like any other mirroring system (hardware or software), when you lose one side of a mirror for writes the expected result is that the filesystem stays online and error free while the disk(s) in question are marked as down/failed/offline?

-Wade
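For what it's worth, the expected behaviour Wade describes can be exercised deliberately (pool and device names again hypothetical):

    # take one side of the mirror offline on purpose
    zpool offline tank c3t0d0

    # the pool reports DEGRADED, but reads and writes continue
    zpool status tank

    # bring the device back and let it resilver
    zpool online tank c3t0d0

The open question in this thread is what happens when a device disappears uncleanly rather than being offlined by the administrator, since ZFS then depends on the driver stack and FMA to declare the device failed.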
On Tue, Oct 07, 2008 at 11:42:57AM -0700, Ross wrote:
>
> Running "zpool status" is a complete no-no if your array is degraded
> in any way. This is capable of locking up zfs even when it would
> otherwise have recovered itself. If you had zpool status hang, this
> probably happened to you.

FYI, this is bug 6667208, fixed in build 100 of Nevada.

- Eric

--
Eric Schrock, Fishworks            http://blogs.sun.com/eschrock
kristof wrote:
> I don't know if this is already available in S10 10/08, but in
> opensolaris build > 71 you can set the:
>
> zpool failmode property
>
> see:
> http://opensolaris.org/os/community/arc/caselog/2007/567/
>
> available options are:
>
> The property can be set to one of three options: "wait", "continue",
> or "panic".

I'm fairly certain that this isn't what the OP was concerned about. The OP appeared to be concerned about ZFS's behaviour when one half of a mirror goes away. As the pool is merely degraded, ZFS will continue to allow reads and writes... eventually... Depending on _how_ the disk is failing, I/O may become glacial, freeze entirely for several minutes before recovering, or hiccup briefly and then carry on normally.

ZFS is layered to the point where stacked timeouts _may_ become unreasonably large (see many previous threads), and a single "slow" device will drag the rest of the volume down with it (e.g. a disk that demands 10 retries per write).

SVM suffers from some of the same problems, although not (in my experience) to the same degree. SVM tends to err on the side of "fail the disk quickly", whereas ZFS tries very, very hard to make all I/O succeed, and relies on the fault management system or I/O stack to decide to fail things.

-- Carson
Oh cool, that's great news. Thanks Eric.

----------------------------------------
> Date: Tue, 7 Oct 2008 11:50:08 -0700
> From: eric.schrock at sun.com
> To: myxiplx at hotmail.com
> CC: zfs-discuss at opensolaris.org
> Subject: Re: [zfs-discuss] ZFS Mirrors braindead?
>
> On Tue, Oct 07, 2008 at 11:42:57AM -0700, Ross wrote:
>>
>> Running "zpool status" is a complete no no if your array is degraded
>> in any way. This is capable of locking up zfs even when it would
>> otherwise have recovered itself. If you had zpool status hang, this
>> probably happened to you.
>
> FYI, this is bug 6667208 fixed in build 100 of nevada.
>
> - Eric
>
> --
> Eric Schrock, Fishworks            http://blogs.sun.com/eschrock