Mark Fasheh
2022-Aug-04 23:53 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Hi Heming,

On Sun, Jul 31, 2022 at 6:02 PM heming.zhao at suse.com <heming.zhao at suse.com> wrote:
> Hello Mark,
>
> On 8/1/22 01:42, Mark Fasheh wrote:
> > Hi Heming,
> >
> > On Fri, Jul 29, 2022 at 6:15 PM Heming Zhao via Ocfs2-devel
> > <ocfs2-devel at oss.oracle.com> wrote:
> > >
> > > The key difference between local mount and non-clustered mount:
> > > the local mount feature (tunefs.ocfs2 --fs-features=[no]local) can't do
> > > the conversion without an HA stack; the non-clustered mount feature can
> > > run entirely without an HA stack.
> >
> > Can you please elaborate on this? Local mounts can run without a
> > cluster stack, so I don't see the difference there. We have
>
> I am using the pacemaker cluster stack. In my environment, the trouble with
> converting between local and clustered mounts only happens on the cluster stack.
>
> The non-clustered mount feature (Gang He's commit 912f655d78c5) gave ocfs2 the
> ability to mount a volume in any environment (with or without a cluster stack).
> Commit 912f655d78c5 derived from a SUSE customer complaint: the user wanted to
> fsck a backup ocfs2 volume in a non-clustered environment. They wanted to access
> the volume quickly and didn't want to spend time/resources setting up an HA
> stack. (By the way, a pcmk stack needs at least two nodes to set up a cluster.)

Ok. I had some time to look over the ext4 MMP patches. I feel like
there are two questions here:

1) Is MMP a useful feature for Ocfs2?

My answer is yes, absolutely; the user should have the option to enable
this on local mounts.

2) Should we allow the user to bypass our cluster checks?

On this question I'm still a 'no'. I simply haven't seen enough
evidence to warrant such a drastic change in policy. Allowing it via a
mount option, too, just feels extremely error-prone. I think we need to
explore alternative avenues to helping the user out here. As you noted
in your followup, a single-node config is entirely possible in
pacemaker (I've run that config myself). Why not provide an easy way
for the user to drop down to that sort of config? I know that's kind of
pushing responsibility for this to the cluster stack, but that's where
it belongs in the first place.

Another option might be an 'observer mode' mount, where the node
participates in the cluster (and the file system locking) but purely
in a read-only fashion.

> > tunefs.ocfs2 looks for and joins the cluster so as to avoid corrupting
> > users' data - that's a feature, not a bug. So what I'm seeing here is
> > just opening us up to potential corruptions. Is there a specific use
> > case here that you're trying to account for? Are you fixing a
> > particular bug?
>
> Tunefs.ocfs2 still needs the HA/dlm stack to protect the joining action.
> Commit 912f655d78c5 works in a non-clustered environment, which needs some
> other technique (e.g. MMP) to protect against corruption.

FWIW, I'm glad that we pulled commit 912f655d78c5 and I do not think
we should go back to that paradigm.

> From my viewpoint, the non-clustered mount code is based on the local mount
> code, which gives more flexibility than a local mount. Non-clustered mount
> uses a unified mount style aligned with clustered mount. I think users would
> rather use non-clustered mount than use tunefs.ocfs2 to change the mount type.

Can we get rid of the mount option, and make MMP something that users
can turn on for Ocfs2 local mounts? I don't recall if we make a slot
map for local mounts or not, but it wouldn't be difficult to add that.

Btw, thank you very much for all of these patches, and also thank you
for the very detailed test cases in your initial email. I'll try to
give the actual code a review as well.

Thanks,
  --Mark
Mark Fasheh
2022-Aug-05 04:11 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
On Thu, Aug 4, 2022 at 4:53 PM Mark Fasheh <mark at fasheh.com> wrote:
> 2) Should we allow the user to bypass our cluster checks?
>
> On this question I'm still a 'no'. I simply haven't seen enough
> evidence to warrant such a drastic change in policy. Allowing it via a
> mount option, too, just feels extremely error-prone. I think we need to
> explore alternative avenues to helping the user out here. As you noted
> in your followup, a single-node config is entirely possible in
> pacemaker (I've run that config myself). Why not provide an easy way
> for the user to drop down to that sort of config? I know that's kind of
> pushing responsibility for this to the cluster stack, but that's where
> it belongs in the first place.
>
> Another option might be an 'observer mode' mount, where the node
> participates in the cluster (and the file system locking) but purely
> in a read-only fashion.

Thinking about this some more... The only way that this works without
potential corruption is if we always write a periodic MMP sequence, even
in clustered mode (which might mean each node writes to its own sector).
That way tunefs can always check the disk for a mounted node, even
without a cluster stack up. If tunefs sees anyone writing sequences to
the disk, it can safely fail the operation.

Tunefs would also have to write an MMP sequence once it has determined
that the disk is not mounted. It could also write some flag alongside
the sequence that says 'tunefs is working on this disk'. If a cluster
mount comes up and sees a live sequence with that flag, it will know to
fail the mount request as the disk is being modified. Local mounts can
also use this to ensure that they are the only mounted node.

As it turns out, we already do pretty much all of the sequence writing
for the o2cb cluster stack - check out cluster/heartbeat.c. If memory
serves, tunefs.ocfs2 has code to write to this heartbeat area as well.
For o2cb, we use the disk heartbeat to detect node liveness, and to kill
our local node if we see disk timeouts. For pcmk, we shouldn't take any
of these actions, as it is none of our responsibility. Under pcmk, the
heartbeating would be purely for mount protection checks.

The downside to this is that all nodes would be heartbeating to the disk
on a regular interval, not just one. To be fair, this is exactly how
o2cb works, and with the correct timeout choices we were able to avoid a
measurable performance impact - though in any case this might be a small
price the user pays for cluster-aware mount protection.

Let me know what you think.

Thanks,
  --Mark
heming.zhao at suse.com
2022-Aug-06 15:44 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Hi Mark,

On 8/5/22 07:53, Mark Fasheh wrote:
> Ok. I had some time to look over the ext4 MMP patches. I feel like
> there are two questions here:
>
> 1) Is MMP a useful feature for Ocfs2?
>
> My answer is yes, absolutely; the user should have the option to enable
> this on local mounts.

Me too.

> 2) Should we allow the user to bypass our cluster checks?
>
> On this question I'm still a 'no'. I simply haven't seen enough
> evidence to warrant such a drastic change in policy. Allowing it via a
> mount option, too, just feels extremely error-prone. I think we need to
> explore alternative avenues to helping the user out here. As you noted
> in your followup, a single-node config is entirely possible in
> pacemaker (I've run that config myself). Why not provide an easy way
> for the user to drop down to that sort of config? I know that's kind of
> pushing responsibility for this to the cluster stack, but that's where
> it belongs in the first place.

The reason for creating commit 912f655d78c5d4a is that the user didn't
want to do any setup work for the cluster stack. So any HA-based
solution (e.g. automatically configuring a single node, auto-installing
the required software, creating a VM with an already set up HA stack,
etc.) does not remove the pain point. Once cluster setup is bypassed,
only non-clustered mount is left. And from the end user's viewpoint,
non-clustered mount is also the easiest way to mount an ocfs2 volume.

> Another option might be an 'observer mode' mount, where the node
> participates in the cluster (and the file system locking) but purely
> in a read-only fashion.

To my mind, 'observer mode' is just a read-only mount; we don't need to
create it. For 912f655d78c5d4a, the user takes a snapshot of the ocfs2
volume and does a health check in a different environment (not the
production environment); the two may even be on different networks.

> FWIW, I'm glad that we pulled commit 912f655d78c5 and I do not think
> we should go back to that paradigm.

If you or the other maintainers prefer to pull out non-clustered mount,
I respect the decision.

> Can we get rid of the mount option, and make MMP something that users
> can turn on for Ocfs2 local mounts? I don't recall if we make a slot
> map for local mounts or not, but it wouldn't be difficult to add that.

From the technical side, it's possible to enable MMP in local mount.
Without 912f655d78c5d4a, a local mount uses slot 0 (the first available
slot).

> Btw, thank you very much for all of these patches, and also thank you
> for the very detailed test cases in your initial email. I'll try to
> give the actual code a review as well.

Thanks,
Heming
Heming Zhao
2022-Aug-06 16:15 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Hello Mark and Joseph,

(Please ignore my previous reply mail.)

I may have hit a thunderbird bug: I clearly remember adding you (Joseph)
in CC for this mail and the previous one, but I only see Mark and
ocfs2-devel in the receiver list. The mail format was also a mess -
thunderbird replaced '\n' with '\r\n'. To fix this, I switched to
neomutt to resend these two mails.

On Thu, Aug 04, 2022 at 04:53:12PM -0700, Mark Fasheh wrote:
> Ok. I had some time to look over the ext4 MMP patches. I feel like
> there are two questions here:
>
> 1) Is MMP a useful feature for Ocfs2?
>
> My answer is yes, absolutely; the user should have the option to enable
> this on local mounts.

Me too.

> 2) Should we allow the user to bypass our cluster checks?
>
> On this question I'm still a 'no'. I simply haven't seen enough
> evidence to warrant such a drastic change in policy. Allowing it via a
> mount option, too, just feels extremely error-prone. I think we need to
> explore alternative avenues to helping the user out here. As you noted
> in your followup, a single-node config is entirely possible in
> pacemaker (I've run that config myself). Why not provide an easy way
> for the user to drop down to that sort of config? I know that's kind of
> pushing responsibility for this to the cluster stack, but that's where
> it belongs in the first place.

The reason for creating commit 912f655d78c5d4a is that the user didn't
want to do any setup work for the cluster stack. So any HA-based
solution (e.g. automatically configuring a single node, auto-installing
the required software, creating a VM with an already set up HA stack,
etc.) does not remove the pain point. Once cluster setup is bypassed,
only non-clustered mount is left. And from the end user's viewpoint,
non-clustered mount is also the easiest way to mount an ocfs2 volume.

> Another option might be an 'observer mode' mount, where the node
> participates in the cluster (and the file system locking) but purely
> in a read-only fashion.

To my mind, 'observer mode' is just a read-only mount; we don't need to
create it. For 912f655d78c5d4a, the user takes a snapshot of the ocfs2
volume and does a health check in a different environment (not the
production environment); the two may even be on different networks.

> FWIW, I'm glad that we pulled commit 912f655d78c5 and I do not think
> we should go back to that paradigm.

If you or the other maintainers prefer to pull out non-clustered mount,
I respect the decision.

> Can we get rid of the mount option, and make MMP something that users
> can turn on for Ocfs2 local mounts? I don't recall if we make a slot
> map for local mounts or not, but it wouldn't be difficult to add that.

From the technical side, it's possible to enable MMP in local mount.
Without 912f655d78c5d4a, a local mount uses slot 0 (the first available
slot).

> Btw, thank you very much for all of these patches, and also thank you
> for the very detailed test cases in your initial email. I'll try to
> give the actual code a review as well.

Thanks,
Heming