Issues with ZFS and Sun Cluster

If a cluster node crashes and an HAStoragePlus resource group containing a ZFS
structure (i.e. a zpool) is transitioned to a surviving node, the zpool import
can cause the surviving node to panic. The zpool was obviously not exported in
a controlled fashion because of the hard crash.

The storage structure is a HW RAID-protected LUN from the array, with the zpool
built on that single HW LUN. The zpool was created on a full device
(zpool create HAzpool c1t8d0s2). No RAID-Z is configured.

I was under the impression that ZFS was always maintained in a perpetually
consistent state. The panic of the surviving node appears to be related to some
form of silent corruption in ZFS, but I thought the whole design of ZFS was to
prevent this very thing. Is RAID-Z required to achieve this resiliency?

ZFS is officially supported under Sun Cluster, but this situation concerns me
greatly because the whole purpose of a cluster is undermined if a single
resource group using HA ZFS causes a panic on import. The effect could bring
the whole cluster down.

Has anyone got any thoughts, comments or similar experiences on this? thx
-- 
This message posted from opensolaris.org
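PS: for reference, this is roughly the sequence involved. The import -f step is
my understanding of what HAStoragePlus effectively does on takeover, since the
pool was never cleanly exported by the crashed node; treat it as a sketch
rather than the agent's exact internals:

    # on the original node, at build time (single HW RAID LUN, no ZFS-level redundancy)
    zpool create HAzpool c1t8d0s2

    # what the surviving node effectively ends up doing on failover
    zpool import -f HAzpool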
James C. McPherson
2008-Sep-11 22:50 UTC
[zfs-discuss] ZFS Panicing System Cluster Crash effect
Jack Dumson wrote:
> Issues with ZFS and Sun Cluster
>
> If a cluster node crashes and an HAStoragePlus resource group containing a
> ZFS structure (i.e. a zpool) is transitioned to a surviving node, the zpool
> import can cause the surviving node to panic. The zpool was obviously not
> exported in a controlled fashion because of the hard crash. The storage
> structure is a HW RAID-protected LUN from the array, with the zpool built
> on that single HW LUN.

You've got no redundancy from a ZFS perspective.

> The zpool was created on a full device (zpool create HAzpool c1t8d0s2).
> No RAID-Z is configured.
>
> I was under the impression that ZFS was always maintained in a
> perpetually consistent state.

It's a *lot* easier for ZFS to achieve this if you provide it with a
redundant configuration for it to manage.

> The panic of the surviving node appears to be related to some form of
> silent corruption in ZFS, but I thought the whole design of ZFS was to
> prevent this very thing. Is RAID-Z required to achieve this resiliency?

Again, redundancy from ZFS' perspective.

> ZFS is officially supported under Sun Cluster, but this situation concerns
> me greatly because the whole purpose of a cluster is undermined if a
> single resource group using HA ZFS causes a panic on import. The effect
> could bring the whole cluster down.
> Has anyone got any thoughts, comments or similar experiences on this?

I'm pretty sure that the Cluster configuration guide will have spelt out
the ZFS requirements. Apart from that, please look at the ZFS Best
Practices site:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
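PS: as a sketch only (the LUN names below are placeholders, not taken from
your config), a mirrored layout across two array LUNs gives ZFS a second copy
to repair from when it hits a bad checksum:

    # two LUNs, ideally presented from separate controllers or trays
    zpool create HAzpool mirror c1t8d0 c1t9d0

    # confirm ZFS now has redundancy it can use for self-healing
    zpool status HAzpool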
Ricardo M. Correia
2008-Sep-11 23:02 UTC
[zfs-discuss] ZFS Panicing System Cluster Crash effect
Hi Jack,

On Thu, 2008-09-11 at 15:37 -0700, Jack Dumson wrote:
> Issues with ZFS and Sun Cluster
>
> If a cluster node crashes and an HAStoragePlus resource group containing
> a ZFS structure (i.e. a zpool) is transitioned to a surviving node, the
> zpool import can cause the surviving node to panic.

zpool import should not cause a node to panic under any circumstances.

Can you provide us with details about the problem, e.g., which Solaris
version you are running, log messages with the panic stack trace, etc.?

In any case, you should file a new CR if this is not a known problem
already.

Best regards,
Ricardo
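PS: if you have a crash dump from the panicking node, something along these
lines should get us the stack. This is a sketch, assuming savecore is enabled
and wrote the dump to the default /var/crash/<hostname> directory:

    cd /var/crash/`hostname`
    mdb -k unix.0 vmcore.0
    > ::status     # panic string
    > ::stack      # panic stack trace
    > ::msgbuf     # console messages leading up to the panic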
Ricardo M. Correia
2008-Sep-12 03:21 UTC
[zfs-discuss] ZFS Panicing System Cluster Crash effect
Hi John,

On Thu, 2008-09-11 at 20:23 -0600, John Antonio wrote:
> It is operating with Sol 10 u3 and also u4. Sun support is claiming
> the issue is related to quiet corruptions.

Probably, yes.

> Since the ZFS structure was not cleanly exported because of the event
> (node crash), the statement from support is that these types of
> corruptions could occur.

I don't think the cause of the corruption is that the pool wasn't cleanly
exported. If corruption only happens when a node crashes, there are 3
likely causes for this problem:

1) The storage subsystem is ignoring disk write cache flush requests and
   allows writes to go out of order, making the uberblock reach the disk
   before other important metadata blocks.
2) Or you're running into a bug that is corrupting metadata.
3) Or you're experiencing memory corruption.

The first one should be fixable and there is a bug open for this already
(CR 6667683), the second one is fixable once identified, the third one is
harder to solve.

The ZFS team and a few folks in the Lustre group are looking into making
ZFS more resilient against corrupted metadata, but this is definitely a
hard-to-solve issue.

> The panic response is apparently the expected behavior during a zpool
> import if this situation occurs.

I wouldn't say that is the expected behavior... :-)
I'd say a panic when importing a pool is a bug.

> Apparently in u6, there will be additional zpool import options that
> will make the identification of a corruption a passive event. The pool
> won't import, but instead of panicking the server it would respond with
> a failure status.

Interesting... I'd love to see the CR for this.

> Regardless of a passive response or not, it concerns me that the
> condition can occur, period. It's not that other filesystems don't
> experience silent corruptions; the concern here is that ZFS had been
> promoted as overcoming these exact issues.

"Silent corruptions" is a bit vague :-)
ZFS is promoted as being resilient against most kinds of disk corruption,
but memory corruption and potential bugs are different issues.

Note that there may be several causes for panicking when importing a pool,
depending on which metadata was corrupted. Some may be easily fixable,
others may be harder. That's why providing a stack trace of the panic would
help identify which particular issue you're running into.

Also note that there are efforts being made to solve these problems. As an
example, Victor Latushkin has very recently identified a similar panic when
importing a pool (CR 6720531) and provided a patch that allows the corrupted
pool to be successfully imported (only for that particular kind of
corruption, of course).

> Given that it has been certified to work in a cluster deployment, this
> situation suggests that it may not be ready or that a significant bug
> exists.

Yes, it does appear that you've run into a significant bug. Knowing the
exact bug you're running into would be helpful.

Best regards,
Ricardo
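PS: one thing you can try in the meantime: zdb runs in userland, so it can
inspect the pool's on-disk state without risking another panic. A rough
sketch, assuming the pool name and device from the original post (I believe
-e lets zdb look at a pool that has not been imported):

    # dump the ZFS labels and uberblocks straight from the device
    zdb -l /dev/rdsk/c1t8d0s2

    # walk the metadata of the un-imported pool; read-only, may take a while
    zdb -e HAzpool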
What version of Solaris are you running there? For a long while the default
response on encountering unrecoverable errors was to panic, but I believe
that has been improved in newer builds.

Also, part of your problem may be down to running with just a single disk.
With just one disk, ZFS still uses checksumming to see if data is corrupted,
but when it finds a bad checksum it has no way to correct the data. On a
cluster I would always use some kind of redundancy within ZFS, regardless of
your underlying storage hardware.

Having said that, though, I still wouldn't have thought it would be easy to
get ZFS to panic like that. How did you crash the first node?
-- 
This message posted from opensolaris.org
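PS: if your build is new enough to have the failmode pool property (I don't
believe u3/u4 have it, so treat this as a sketch for newer bits), you can ask
the pool to wait or return errors on a catastrophic I/O failure rather than
panic the node:

    # check whether the property exists on your build
    zpool get failmode HAzpool

    # wait or continue are friendlier than panic on a cluster node
    zpool set failmode=continue HAzpool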