thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH] Revert "ocfs2: mount shared volume without ha stack" [Jun 2022]

heming.zhao at suse.com

2022-Jun-06 02:08 UTC

[Ocfs2-devel] [PATCH] Revert "ocfs2: mount shared volume without ha stack"

Hello Junxiao,

First of all, let's get on the same page to discuss your patch.
There are two features: 'local mount' and 'nocluster mount'.
I mistakenly wrote 'local mount' in some places in previous mails.
This patch reverts commit 912f655d78c5d4, which is related to 'nocluster
mount'.


On 6/5/22 00:19, Junxiao Bi wrote:
> 
> 
>> On 2022-06-04 at 1:45, heming.zhao at suse.com wrote:
>>
>> Hello Junxiao,
>>
>>> On 6/4/22 06:28, Junxiao Bi via Ocfs2-devel wrote:
>>> This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144.
>>> This commit introduced a regression that can cause mount to hang.
>>> The changes in __ocfs2_find_empty_slot allow any node with a
>>> non-zero node number to grab the slot that was already taken by
>>> node 0, so node 1 will access the same journal as node 0. When it
>>> tries to grab the journal cluster lock, it will hang because the lock
>>> was already acquired by node 0.
>>> It's very easy to reproduce this: in one cluster, mount node 0 first,
>>> then node 1, and you will see the following call trace from node 1.
>>
>> From your description, it looks like your env mixed local-mount and
>> clustered-mount.
> No, only cluster mount.
>>
>> Would you mind sharing your test/reproduction steps?
>> And which HA stack do you use, pcmk or o2cb?
>>
>> I failed to reproduce it; my test steps (with the pcmk stack):
>> ```
>> node1:
>> mount -t ocfs2 /dev/vdd /mnt
>>
>> node2:
>> for i in {1..100}; do
>> echo "mount <$i>"; mount -t ocfs2 /dev/vdd /mnt;
>> sleep 3;
>> echo "umount"; umount /mnt;
>> done
>> ```
>>
> Try setting one node to node number 0 and mount it there first. I used the
> o2cb stack.

Could you show more test info/steps? I can't follow your meaning.
How do you set up a node with a fixed node number?
As I understand it, under a pcmk env the first mounted node automatically gets
node number 1 (or any value greater than 0), and there is no place to set the
node number by hand. It's very likely you mixed nocluster and cluster mounts.
If my suspicion is right (mixed mount), your use case is wrong.

>> This local mount feature helps SUSE customers maintain ocfs2
>> partitions; it's useful.
>> I want to find out whether there is an ideal way to fix the hang issue.
>>
>>> [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
>>> [13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
>>> [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
>>> [13148.749354] Call Trace:
>>> ...
>>> To fix it, we can just fix __ocfs2_find_empty_slot. But the original
>>> commit introduced the feature to mount ocfs2 locally even if it is
>>> cluster based. That is very dangerous: it can easily cause serious data
>>> corruption, and there is no way to stop other nodes from mounting the fs
>>> and corrupting it.
>>
>> I can't follow your meaning. When users want to use the local mount
>> feature, they MUST know what they are doing and how to use it.
> I can't agree with you. There is no mechanism to make sure customers will
> follow that; you can't expect customers to understand the tech well or even
> read the doc.

Yes, no one reads the doc by default.

Currently, mounting with the 'nocluster' option shows a special warning to the user:

```
# mount -t ocfs2 -o nocluster /dev/vdd /mnt
Warning: to mount a clustered volume without the cluster stack.
Please make sure you only mount the file system from one node.
Otherwise, the file system may be damaged.
Proceed (y/N):
```
> It's not the case that you don't have a choice. Setting up the cluster stack
> is the way to stop customers from doing something bad. I believe you have to
> educate customers to understand that this is the cost of guarding data
> security; otherwise, when something bad happens, they will lose important
> data, maybe with no way to recover.

This feature is not enabled by default, and it also shows enough info/warnings
before executing.
Let me give another (maybe awkward) example:
a nocluster mount is like executing the command 'rm -rf /'. Do you think we
should tell/educate customers not to execute it?

The nocluster mount feature was designed to resolve a real-world customer pain
point:
the SUSE HA stack uses pacemaker+corosync+fsdlm+ocfs2, which is complicated and
inconvenient to set up and requires installing dozens of related packages.

The main use case of the nocluster feature:
a customer wants to avoid setting up the HA stack but wants to check an ocfs2
volume or back up a volume.

In my opinion, we should make ocfs2 more powerful and include more useful
features for users.
If there are problems related to a new feature, we should do our best to fix
them, not revert the feature.
>>
>> The mount.ocfs2(8) man page also says to mount the fs on *only* *one* node
>> at a time.
>> It also tells the user that the fs will be damaged by a wrong action.
>>
>> ```
>> nocluster
>>
>>   This  option  allows  users  to  mount a clustered volume without configuring the cluster
>>   stack.  However, you must be aware that you can only mount the file system from one  node
>>   at the same time, otherwise, the file system may be damaged. Please use it with caution.
>> ```
>>
>>> Setting up HA or another cluster-aware stack is just the cost that we
>>> have to pay to avoid corruption; otherwise we have to do it in the kernel.
>>
>> It's a bit drastic to totally revert this commit just because it lacks a
>> sanity check. If you or the maintainer think the local mount should do more
>> to prevent the mixed local-mount and clustered-mount scenario, we could add
>> more sanity checks during local mounting.
> I don't think this should be done in the kernel. Setting up the cluster stack
> is the way forward.
> 
My mistake: every 'local mount' above should be 'nocluster mount'.

Finally, let's fully understand your use case (or reproduce your hang
issue).

Thanks,
Heming

Heming Zhao

2022-Jun-06 08:27 UTC

[Ocfs2-devel] [PATCH] Revert "ocfs2: mount shared volume without ha stack"

On Mon, Jun 06, 2022 at 10:08:53AM +0800, heming.zhao--- via Ocfs2-devel wrote:
> Hello Junxiao,
> [...]
> In my opinion, we should make ocfs2 more powerful and include more useful
> features for users.
> If there are problems related to a new feature, we should do our best to
> fix them, not revert the feature.

I am not familiar with the o2cb stack. If o2cb can assign a node the node
number ZERO, I have an idea for avoiding mixed nocluster and cluster mounting.

The slot map is managed with this struct:

struct ocfs2_extended_slot {
/*00*/	__u8	es_valid;
	__u8	es_reserved1[3];
	__le32	es_node_num;
/*08*/
};

We could use es_reserved1[0] to give nocluster mounts a special flag.
Maybe we could define:

#define OCFS2_NOCLUSTER_MOUNT 1
if (XX->es_reserved1[0] == OCFS2_NOCLUSTER_MOUNT)
	this_slot_is_mounted_by_noclustered_mode;

The code logic:
- On a nocluster mount, check es_valid for an existing clustered mount and
  check es_reserved1[0] for an existing nocluster mount.
  If no other node has mounted, do the nocluster mount and set es_reserved1[0]
  to OCFS2_NOCLUSTER_MOUNT. Then clear this value on unmount.
- When another node prepares to mount in clustered mode, it should check
  es_reserved1[0] to detect a nocluster mount. ocfs2 should block the mount
  if any slot is marked with OCFS2_NOCLUSTER_MOUNT (making the nocluster
  mount exclusive).
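
The two checks in the list above can be modelled in user space. The following
is only a minimal sketch, not kernel code: `struct slot_entry` is a simplified
stand-in for `struct ocfs2_extended_slot`, all helper names are hypothetical,
and it assumes a nocluster mount sets only es_reserved1[0] and leaves es_valid
clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value proposed for es_reserved1[0]; an assumption, not a kernel define. */
#define OCFS2_NOCLUSTER_MOUNT 1

/* Simplified user-space stand-in for struct ocfs2_extended_slot. */
struct slot_entry {
	uint8_t  es_valid;        /* non-zero: slot held by a clustered mount */
	uint8_t  es_reserved1[3]; /* [0] carries the proposed nocluster flag */
	uint32_t es_node_num;
};

/* Return true if any slot carries the nocluster flag. */
bool any_nocluster_mount(const struct slot_entry *slots, size_t n)
{
	assert(slots != NULL);
	for (size_t i = 0; i < n; i++)
		if (slots[i].es_reserved1[0] == OCFS2_NOCLUSTER_MOUNT)
			return true;
	return false;
}

/* Return true if any slot is held by a clustered mount. */
bool any_cluster_mount(const struct slot_entry *slots, size_t n)
{
	assert(slots != NULL);
	for (size_t i = 0; i < n; i++)
		if (slots[i].es_valid)
			return true;
	return false;
}

/* A nocluster mount is allowed only when no other mount exists. */
bool may_mount_nocluster(const struct slot_entry *slots, size_t n)
{
	return !any_cluster_mount(slots, n) && !any_nocluster_mount(slots, n);
}

/* A clustered mount is blocked while any slot carries the nocluster flag. */
bool may_mount_cluster(const struct slot_entry *slots, size_t n)
{
	return !any_nocluster_mount(slots, n);
}

/* Set the flag at nocluster mount time. */
void mark_nocluster(struct slot_entry *slot)
{
	slot->es_reserved1[0] = OCFS2_NOCLUSTER_MOUNT;
}

/* Clear the flag at unmount time. */
void clear_nocluster(struct slot_entry *slot)
{
	slot->es_reserved1[0] = 0;
}
```

In this model a caller would test may_mount_nocluster() or may_mount_cluster()
before taking a slot. A real implementation would also have to read and write
the slot map under the appropriate lock, which the sketch leaves out.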

Thanks,
Heming

Junxiao Bi

2022-Jun-06 16:15 UTC

[Ocfs2-devel] [PATCH] Revert "ocfs2: mount shared volume without ha stack"

On 6/5/22 7:08 PM, heming.zhao at suse.com wrote:
> Hello Junxiao,
>
> [...]
>> Try setting one node to node number 0 and mount it there first. I used the
>> o2cb stack.
> Could you show more test info/steps? I can't follow your meaning.
> How do you set up a node with a fixed node number?
> As I understand it, under a pcmk env the first mounted node automatically gets
> node number 1 (or any value greater than 0), and there is no place to set the
> node number by hand. It's very likely you mixed nocluster and cluster mounts.
> If my suspicion is right (mixed mount), your use case is wrong.

Did you check my last mail? I already said I didn't do a mixed mount, only
cluster mounts.

There is a configuration file for o2cb; you can just set the node number to 0.
Please check
https://docs.oracle.com/en/operating-systems/oracle-linux/7/fsadmin/ol7-ocfs2.html#ol7-config-file-ocfs2
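
For context, the slot collision described in the revert message can be
illustrated with a small user-space model. This is a sketch under an
assumption, not the kernel source: it assumes the changed search treats a slot
whose recorded node number is 0 as free, as the description above implies.
`struct slot` and `find_empty_slot` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified in-memory slot: who holds it, and whether it is in use. */
struct slot {
	bool         valid;    /* true: slot is taken */
	unsigned int node_num; /* node number of the holder */
};

/*
 * Return the index of the first slot considered free, or -1 if none.
 * With zero_node_is_free set, a slot whose node number is 0 is also
 * treated as free, even when it is marked valid (the assumed regression).
 */
int find_empty_slot(const struct slot *slots, int num_slots,
		    bool zero_node_is_free)
{
	assert(slots != NULL);
	for (int i = 0; i < num_slots; i++) {
		bool is_free = !slots[i].valid;

		if (zero_node_is_free && slots[i].node_num == 0)
			is_free = true;
		if (is_free)
			return i;
	}
	return -1;
}
```

With slot 0 held by node 0 and slot 1 empty, the strict search picks slot 1,
while the lenient search hands slot 0 to the second node, which would then
share node 0's journal.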
>
> [...]
> This feature is not enabled by default, and it also shows enough info/warnings
> before executing.
> Let me give another (maybe awkward) example:
> a nocluster mount is like executing the command 'rm -rf /'. Do you think we
> should tell/educate customers not to execute it?

That's totally outside the domain of ocfs2; it's not an ocfs2 developer's job
to tell customers not to do that.

Here you provided an ocfs2 feature that can easily corrupt ocfs2.

As an ocfs2 developer, you should make sure ocfs2 is not corrupted even if a
customer does something bad. That's why mkfs.ocfs2/fsck.ocfs2 check
whether the ocfs2 volume is mounted in the cluster before changing anything.
>
> The nocluster mount feature was designed to resolve a real-world customer pain
> point:
> the SUSE HA stack uses pacemaker+corosync+fsdlm+ocfs2, which is complicated and
> inconvenient to set up and requires installing dozens of related packages.
>
> The main use case of the nocluster feature:
> a customer wants to avoid setting up the HA stack but wants to check an ocfs2
> volume or back up a volume.

That doesn't mean you have to do this in the kernel. If customers find it
painful to set up the HA stack, you should develop some script/app to make it
easy.
>
> In my opinion, we should make ocfs2 more powerful and include more useful
> features for users.
> If there are problems related to a new feature, we should do our best to
> fix them, not revert the feature.

Only good/safe features; I don't think this one qualifies. Also, no one gave
a Reviewed-by for this commit, so I am not sure how it was merged.

Joseph, what's your call on this?

Thanks,

Junxiao.
