thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue [Jun 2022]

If this information is useful, please help other people find it:
Share via:

heming.zhao at suse.com

2022-Jun-14 02:59 UTC

[Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue

On 6/13/22 23:43, Junxiao Bi wrote:> 
>> ? 2022?6?13????1:48?heming.zhao at suse.com ???
>>
>> ?On 6/13/22 16:21, Joseph Qi wrote:
>>>> On 6/13/22 3:59 PM, heming.zhao at suse.com wrote:
>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>> Hi,
>>>>>
>>>>> Why can't use local mount? I don't remember if we
discuss about this.
>>>>>
>>>> Sorry, I can't follow your question.
>>>> Do you mean why revert commit 912f655d78c5?
>>>>
>>>> or you are interest with the feature local mount?
>>>> the local mount is created by mkfs.ocfs2, which can't be
converted to clustered.
>>>> see mkfs.ocfs2(8) '-M' option.
>>>>
>>> What Junxiao's main concern is data corruption, so I'm
afraid we have to
>>> introduce an ondisk feature bit to prevent mixed nocluster and
cluster
>>> mount, similar to local mount.
>>
>> this patch defined two new variants/flags:
>> #define OCFS2_SLOTMAP_CLUSTER   1
>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>
>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode and
for compatibility,
>> anything doesn't need to be changed.
>>
>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap area.
>> this new value only take effect after a successfully nocluster mount.
>> (pls fix me), existed kernel/user space code don't do any special
handle for
>> noclustered mount mode in slotmap area. So the new value is also
compatibility.
>>
>> And the patch can also prevent mixed mount, the related code is in
ocfs2_find_slot().
>> code logic:
>> - noclustered mount condition: slotmap is empty or already mounted with
noclustered
>> - clustered mount condition: slotmap is empty or already mounted with
clustered.
>> - all other conditions will be denied.
> Finding slot required reading slot map and then  update slot map. It is not
atomic , you can?t prevent mixed mount until you have a cluster lock.
Could I say your mentioned use case is invalid.
I believe all (yes, *all*) the ocfs2 users use this fs under cluster mode in
their product env.
The nocluster mount is only used on maintained env (eg. backup, fsck).

We only concern two ways:
1. user forgets to unmount (eg crash) before using another mount mode.
2. when ocfs2 volume is working, which should deny volume is mounted by another
mode.
    this may happened user mistakenly runs command or script.

Base on above 1 & 2, Junxiao above mentioned use case only happens on one
scenario:
the volume doesn't be mounted by any node, then user mistakenly mounts
volume
with [no]cluster mode at same time. I believe this use case is invalid. And I
guess
gfs2 may also has the same issue.

With this patch, ocfs2 has more sanity check ability than other fs, eg: xfs,
ext4.
SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 with A/P
mode.
The xfs/ext4 never have detecting mount status code, Junxiao mentioned mixed
mount
can also happens on these fs. How do xfs/ext4/HA maintainers handle it? Under
these
fs mounting behavior, these fields maintainers also treat the mixed mount as
invalid use case and ignore handle it.

/Heming
>>
>>> Another scenario is journal replay after crash.
>>
>> this patch set a rule:
>> If last mount didn't do umount, (eg: crash happened), the next
mount MUST be same mount type.
>> (please also check above lines of 'code logic'.)
>>
>> In my view, this rule is enough to handle crash scenario.
>> So my patch should be polished in somewhere, but it is workable.
>>
>> Thanks,
>> Heming

Joseph Qi

2022-Jun-14 03:27 UTC

head link

[Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue

On 6/14/22 10:59 AM, heming.zhao at suse.com wrote:> On 6/13/22 23:43, Junxiao Bi wrote:
>>
>>> ? 2022?6?13????1:48?heming.zhao at suse.com ???
>>>
>>> ?On 6/13/22 16:21, Joseph Qi wrote:
>>>>> On 6/13/22 3:59 PM, heming.zhao at suse.com wrote:
>>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Why can't use local mount? I don't remember if
we discuss about this.
>>>>>>
>>>>> Sorry, I can't follow your question.
>>>>> Do you mean why revert commit 912f655d78c5?
>>>>>
>>>>> or you are interest with the feature local mount?
>>>>> the local mount is created by mkfs.ocfs2, which can't
be converted to clustered.
>>>>> see mkfs.ocfs2(8) '-M' option.
>>>>>
>>>> What Junxiao's main concern is data corruption, so I'm
afraid we have to
>>>> introduce an ondisk feature bit to prevent mixed nocluster and
cluster
>>>> mount, similar to local mount.
>>>
>>> this patch defined two new variants/flags:
>>> #define OCFS2_SLOTMAP_CLUSTER?? 1
>>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>>
>>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode
and for compatibility,
>>> anything doesn't need to be changed.
>>>
>>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap
area.
>>> this new value only take effect after a successfully nocluster
mount.
>>> (pls fix me), existed kernel/user space code don't do any
special handle for
>>> noclustered mount mode in slotmap area. So the new value is also
compatibility.
>>>
>>> And the patch can also prevent mixed mount, the related code is in
ocfs2_find_slot().
>>> code logic:
>>> - noclustered mount condition: slotmap is empty or already mounted
with noclustered
>>> - clustered mount condition: slotmap is empty or already mounted
with clustered.
>>> - all other conditions will be denied.
>> Finding slot required reading slot map and then? update slot map. It is
not atomic , you can?t prevent mixed mount until you have a cluster lock.
> 
> Could I say your mentioned use case is invalid.
> I believe all (yes, *all*) the ocfs2 users use this fs under cluster mode
in
> their product env.
> The nocluster mount is only used on maintained env (eg. backup, fsck).
> 
> We only concern two ways:
> 1. user forgets to unmount (eg crash) before using another mount mode.
> 2. when ocfs2 volume is working, which should deny volume is mounted by
another mode.
> ?? this may happened user mistakenly runs command or script.
> 
> Base on above 1 & 2, Junxiao above mentioned use case only happens on
one scenario:
> the volume doesn't be mounted by any node, then user mistakenly mounts
volume
> with [no]cluster mode at same time. I believe this use case is invalid. And
I guess
> gfs2 may also has the same issue.
> 
> With this patch, ocfs2 has more sanity check ability than other fs, eg:
xfs, ext4.
> SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 with
A/P mode.
> The xfs/ext4 never have detecting mount status code, Junxiao mentioned
mixed mount
> can also happens on these fs. How do xfs/ext4/HA maintainers handle it?
Under these
> fs mounting behavior, these fields maintainers also treat the mixed mount
as
> invalid use case and ignore handle it.
Ext4 has a feature mmp (multiple mount protection) to do this.
More details please refer to:
https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout

Thanks,
Joseph

Junxiao Bi

2022-Jun-14 06:22 UTC

head link

[Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue

On 6/13/22 7:59 PM, heming.zhao at suse.com wrote:
> On 6/13/22 23:43, Junxiao Bi wrote:
>>
>>> ? 2022?6?13????1:48?heming.zhao at suse.com ???
>>>
>>> ?On 6/13/22 16:21, Joseph Qi wrote:
>>>>> On 6/13/22 3:59 PM, heming.zhao at suse.com wrote:
>>>>> On 6/12/22 22:16, Joseph Qi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Why can't use local mount? I don't remember if
we discuss about
>>>>>> this.
>>>>>>
>>>>> Sorry, I can't follow your question.
>>>>> Do you mean why revert commit 912f655d78c5?
>>>>>
>>>>> or you are interest with the feature local mount?
>>>>> the local mount is created by mkfs.ocfs2, which can't
be converted
>>>>> to clustered.
>>>>> see mkfs.ocfs2(8) '-M' option.
>>>>>
>>>> What Junxiao's main concern is data corruption, so I'm
afraid we
>>>> have to
>>>> introduce an ondisk feature bit to prevent mixed nocluster and
cluster
>>>> mount, similar to local mount.
>>>
>>> this patch defined two new variants/flags:
>>> #define OCFS2_SLOTMAP_CLUSTER?? 1
>>> #define OCFS2_SLOTMAP_NOCLUSTER 2
>>>
>>> (I expect) Under OCFS2_SLOTMAP_CLUSTER, for clustered mount mode
and
>>> for compatibility,
>>> anything doesn't need to be changed.
>>>
>>> OCFS2_SLOTMAP_NOCLUSTER will introduce a new value for slotmap
area.
>>> this new value only take effect after a successfully nocluster
mount.
>>> (pls fix me), existed kernel/user space code don't do any
special
>>> handle for
>>> noclustered mount mode in slotmap area. So the new value is also 
>>> compatibility.
>>>
>>> And the patch can also prevent mixed mount, the related code is in 
>>> ocfs2_find_slot().
>>> code logic:
>>> - noclustered mount condition: slotmap is empty or already mounted 
>>> with noclustered
>>> - clustered mount condition: slotmap is empty or already mounted 
>>> with clustered.
>>> - all other conditions will be denied.
>> Finding slot required reading slot map and then? update slot map. It 
>> is not atomic , you can?t prevent mixed mount until you have a 
>> cluster lock.
>
> Could I say your mentioned use case is invalid.
> I believe all (yes, *all*) the ocfs2 users use this fs under cluster 
> mode in
> their product env.
> The nocluster mount is only used on maintained env (eg. backup, fsck).
>
> We only concern two ways:
> 1. user forgets to unmount (eg crash) before using another mount mode.
> 2. when ocfs2 volume is working, which should deny volume is mounted 
> by another mode.
> ?? this may happened user mistakenly runs command or script.
>
> Base on above 1 & 2, Junxiao above mentioned use case only happens on 
> one scenario:
> the volume doesn't be mounted by any node, then user mistakenly mounts 
> volume
> with [no]cluster mode at same time. I believe this use case is 
> invalid. And I guess
> gfs2 may also has the same issue.
You missed the point, i never said that's the user case. The point is 
with your feature ocfs2 could be corrupted due to mistake done by 
customer, thinking about why fsck.ocfs2/mkfs.ocfs2 check that before 
changing anything.

You'd better make sure gfs2 had that issue, if so, i think you should 
raise a bug to them.

Thanks,

Junxiao.
> With this patch, ocfs2 has more sanity check ability than other fs, 
> eg: xfs, ext4.
> SUSE HA stack with corosync+pacemaker also supports running xfs/ext4 
> with A/P mode.
> The xfs/ext4 never have detecting mount status code, Junxiao mentioned 
> mixed mount
> can also happens on these fs. How do xfs/ext4/HA maintainers handle 
> it? Under these
> fs mounting behavior, these fields maintainers also treat the mixed 
> mount as
> invalid use case and ignore handle it.
>
> /Heming
>
>>>
>>>> Another scenario is journal replay after crash.
>>>
>>> this patch set a rule:
>>> If last mount didn't do umount, (eg: crash happened), the next
mount
>>> MUST be same mount type.
>>> (please also check above lines of 'code logic'.)
>>>
>>> In my view, this rule is enough to handle crash scenario.
>>> So my patch should be polished in somewhere, but it is workable.
>>>
>>> Thanks,
>>> Heming
>

Ocfs2 devel - Jun 2022 - [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue

[Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue

[Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue

[Ocfs2-devel] [PATCH 1/1] ocfs2: fix ocfs2_find_slot repeats alloc same slot issue