Mark Fasheh
2022-Aug-04 23:53 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Hi Heming,

On Sun, Jul 31, 2022 at 6:02 PM heming.zhao at suse.com <heming.zhao at suse.com> wrote:
> Hello Mark,
>
> On 8/1/22 01:42, Mark Fasheh wrote:
> > Hi Heming,
> >
> > On Fri, Jul 29, 2022 at 6:15 PM Heming Zhao via Ocfs2-devel
> > <ocfs2-devel at oss.oracle.com> wrote:
> > >
> > > The key difference between local mount and non-clustered mount:
> > > the local mount feature (tunefs.ocfs2 --fs-features=[no]local) can't do
> > > the conversion without an HA stack; the non-clustered mount feature can
> > > run entirely without an HA stack.
> >
> > Can you please elaborate on this? Local mounts can run without a
> > cluster stack, so I don't see the difference there. We have
>
> I am using the pacemaker cluster stack. In my environment, the trouble with
> converting between local and clustered mounts only happens on the cluster stack.
>
> The non-clustered mount feature (Gang He's commit 912f655d78c5) gave ocfs2 the
> ability to mount a volume in any environment (with or without a cluster stack).
> Commit 912f655d78c5 derived from a SUSE customer complaint: the user wanted to
> fsck a backup ocfs2 volume in a non-clustered environment. They wanted to access
> the volume quickly and didn't want to spend time/resources setting up an HA
> stack. (By the way, a pcmk stack needs at least two nodes to set up a cluster.)

Ok. I had some time to look over the ext4 MMP patches. I feel like
there are two questions here:

1) Is MMP a useful feature for Ocfs2?

My answer is yes, absolutely; the user should have the option to enable
this on local mounts.

2) Should we allow the user to bypass our cluster checks?

On this question I'm still a 'no'. I simply haven't seen enough
evidence to warrant such a drastic change in policy. Allowing it via a
mount option, too, just feels extremely error-prone. I think we need to
explore alternative avenues to helping the user out here. As you noted
in your followup, a single-node config is entirely possible in
pacemaker (I've run that config myself). Why not provide an easy way
for the user to drop down to that sort of config? I know that's kind of
pushing responsibility for this to the cluster stack, but that's where
it belongs in the first place.

Another option might be an 'observer mode' mount, where the node
participates in the cluster (and the file system locking) but purely
in a read-only fashion.

> > tunefs.ocfs2 looks for and joins the cluster so as to avoid corrupting
> > users' data - that's a feature, not a bug. So what I'm seeing here is
> > just opening us up to potential corruptions. Is there a specific use
> > case here that you're trying to account for? Are you fixing a
> > particular bug?
>
> Tunefs.ocfs2 still needs the HA/dlm stack to protect the joining action.
> Commit 912f655d78c5 works in a non-clustered environment, which needs some
> other technique (e.g. MMP) to protect against corruption.

FWIW, I'm glad that we pulled commit 912f655d78c5 and I do not think
we should go back to that paradigm.

> From my viewpoint, the non-clustered mount code is based on the local mount
> code, which gives more flexibility than a local mount. Non-clustered mount
> uses a unified mount style aligned with clustered mount. I think users would
> rather use non-clustered mount than use tunefs.ocfs2 to change the mount type.

Can we get rid of the mount option, and make MMP something that users
can turn on for Ocfs2 local mounts? I don't recall if we make a slot
map for local mounts or not, but it wouldn't be difficult to add that.

Btw, thank you very much for all of these patches, and also thank you
for the very detailed test cases in your initial email. I'll try to
give the actual code a review as well.

Thanks,
  --Mark
Mark Fasheh
2022-Aug-05 04:11 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
On Thu, Aug 4, 2022 at 4:53 PM Mark Fasheh <mark at fasheh.com> wrote:
> 2) Should we allow the user to bypass our cluster checks?
>
> On this question I'm still a 'no'. I simply haven't seen enough
> evidence to warrant such a drastic change in policy. Allowing it via a
> mount option, too, just feels extremely error-prone. I think we need to
> explore alternative avenues to helping the user out here. As you noted
> in your followup, a single-node config is entirely possible in
> pacemaker (I've run that config myself). Why not provide an easy way
> for the user to drop down to that sort of config? I know that's kind of
> pushing responsibility for this to the cluster stack, but that's where
> it belongs in the first place.
>
> Another option might be an 'observer mode' mount, where the node
> participates in the cluster (and the file system locking) but purely
> in a read-only fashion.

Thinking about this some more... The only way that this works without
potential corruption is if we always write a periodic MMP sequence, even
in clustered mode (which might mean each node writes to its own sector).
That way tunefs can always check the disk for a mounted node, even
without a cluster stack up. If tunefs sees anyone writing sequences to
the disk, it can safely fail the operation.

Tunefs would also have to write an MMP sequence once it has determined
that the disk is not mounted. It could also write some flag alongside
the sequence that says 'tunefs is working on this disk'. If a cluster
mount comes up and sees a live sequence with that flag, it will know to
fail the mount request as the disk is being modified. Local mounts can
also use this to ensure that they are the only mounted node.

As it turns out, we already do pretty much all of the sequence writing
for the o2cb cluster stack - check out cluster/heartbeat.c. If memory
serves, tunefs.ocfs2 has code to write to this heartbeat area as well.
For o2cb, we use the disk heartbeat to detect node liveness, and to kill
our local node if we see disk timeouts. For pcmk, we shouldn't take any
of these actions, as it is none of our responsibility. Under pcmk, the
heartbeating would be purely for mount protection checks.

The downside to this is that all nodes would be heartbeating to the disk
on a regular interval, not just one. To be fair, this is exactly how
o2cb works, and with the correct timeout choices we were able to avoid a
measurable performance impact - though in any case this might be a small
price the user pays for cluster-aware mount protection.

Let me know what you think.

Thanks,
  --Mark
heming.zhao at suse.com
2022-Aug-06 15:44 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Hi Mark,

On 8/5/22 07:53, Mark Fasheh wrote:
> Ok. I had some time to look over the ext4 MMP patches. I feel like
> there are two questions here:
>
> 1) Is MMP a useful feature for Ocfs2?
>
> My answer is yes, absolutely; the user should have the option to enable
> this on local mounts.

Me too.

> 2) Should we allow the user to bypass our cluster checks?
>
> On this question I'm still a 'no'. I simply haven't seen enough
> evidence to warrant such a drastic change in policy. Allowing it via a
> mount option, too, just feels extremely error-prone. I think we need to
> explore alternative avenues to helping the user out here. As you noted
> in your followup, a single-node config is entirely possible in
> pacemaker (I've run that config myself). Why not provide an easy way
> for the user to drop down to that sort of config? I know that's kind of
> pushing responsibility for this to the cluster stack, but that's where
> it belongs in the first place.

The reason for creating commit 912f655d78c5d4a is that the user didn't
want to do any setup work for the cluster stack. So any HA-based
solution (e.g. automatically configuring a single node, auto-installing
the required software, creating a VM with an already set up HA stack,
etc.) does not remove the pain point. Once cluster setup is bypassed,
only non-clustered mount is left. And from the end user's viewpoint,
non-clustered mount is also the easiest way to mount an ocfs2 volume.

> Another option might be an 'observer mode' mount, where the node
> participates in the cluster (and the file system locking) but purely
> in a read-only fashion.

To my mind, 'observer mode' is just a read-only mount; we don't need to
create it. For 912f655d78c5d4a, the user takes a snapshot of the ocfs2
volume and does a health check in a different environment (not the
production environment); the two may even be on different networks.

> FWIW, I'm glad that we pulled commit 912f655d78c5 and I do not think
> we should go back to that paradigm.

If you or the other maintainers prefer to pull out non-clustered mount,
I respect the decision.

> Can we get rid of the mount option, and make MMP something that users
> can turn on for Ocfs2 local mounts? I don't recall if we make a slot
> map for local mounts or not, but it wouldn't be difficult to add that.

From the technical side, it's possible to enable MMP in local mount.
Without 912f655d78c5d4a, a local mount uses slot 0 (the first available
slot).

> Btw, thank you very much for all of these patches, and also thank you
> for the very detailed test cases in your initial email. I'll try to
> give the actual code a review as well.

Thanks,
Heming
Heming Zhao
2022-Aug-06 16:15 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Hello Mark and Joseph,

(Please ignore my previous reply mail.)

I may have hit a thunderbird bug: I clearly remember adding you (Joseph)
in CC for this mail and the previous one, but I only see Mark and
ocfs2-devel in the receiver list. The mail format was also a mess -
thunderbird replaced '\n' with '\r\n'. To fix this, I switched to
neomutt to resend these two mails.

On Thu, Aug 04, 2022 at 04:53:12PM -0700, Mark Fasheh wrote:
> Ok. I had some time to look over the ext4 MMP patches. I feel like
> there are two questions here:
>
> 1) Is MMP a useful feature for Ocfs2?
>
> My answer is yes, absolutely; the user should have the option to enable
> this on local mounts.

Me too.

> 2) Should we allow the user to bypass our cluster checks?
>
> On this question I'm still a 'no'. I simply haven't seen enough
> evidence to warrant such a drastic change in policy. Allowing it via a
> mount option, too, just feels extremely error-prone. I think we need to
> explore alternative avenues to helping the user out here. As you noted
> in your followup, a single-node config is entirely possible in
> pacemaker (I've run that config myself). Why not provide an easy way
> for the user to drop down to that sort of config? I know that's kind of
> pushing responsibility for this to the cluster stack, but that's where
> it belongs in the first place.

The reason for creating commit 912f655d78c5d4a is that the user didn't
want to do any setup work for the cluster stack. So any HA-based
solution (e.g. automatically configuring a single node, auto-installing
the required software, creating a VM with an already set up HA stack,
etc.) does not remove the pain point. Once cluster setup is bypassed,
only non-clustered mount is left. And from the end user's viewpoint,
non-clustered mount is also the easiest way to mount an ocfs2 volume.

> Another option might be an 'observer mode' mount, where the node
> participates in the cluster (and the file system locking) but purely
> in a read-only fashion.

To my mind, 'observer mode' is just a read-only mount; we don't need to
create it. For 912f655d78c5d4a, the user takes a snapshot of the ocfs2
volume and does a health check in a different environment (not the
production environment); the two may even be on different networks.

> FWIW, I'm glad that we pulled commit 912f655d78c5 and I do not think
> we should go back to that paradigm.

If you or the other maintainers prefer to pull out non-clustered mount,
I respect the decision.

> Can we get rid of the mount option, and make MMP something that users
> can turn on for Ocfs2 local mounts? I don't recall if we make a slot
> map for local mounts or not, but it wouldn't be difficult to add that.

From the technical side, it's possible to enable MMP in local mount.
Without 912f655d78c5d4a, a local mount uses slot 0 (the first available
slot).

> Btw, thank you very much for all of these patches, and also thank you
> for the very detailed test cases in your initial email. I'll try to
> give the actual code a review as well.

Thanks,
Heming