thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH] Revert "ocfs2: mount shared volume without ha stack" [Jun 2022]


heming.zhao at suse.com

2022-Jun-04 08:45 UTC

[Ocfs2-devel] [PATCH] Revert "ocfs2: mount shared volume without ha stack"

Hello Junxiao,

On 6/4/22 06:28, Junxiao Bi via Ocfs2-devel wrote:
> This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144.
> 
> This commit introduced a regression that can cause a mount hang.
> The changes in __ocfs2_find_empty_slot allow any node with a
> non-zero node number to grab the slot already taken by node 0,
> so node 1 accesses the same journal as node 0. When it tries to
> grab the journal cluster lock, it hangs because the lock was already
> acquired by node 0.
> It's very easy to reproduce: in one cluster, mount node 0 first,
> then node 1, and you will see the following call trace from node 1.
From your description, it looks like your environment mixed local mount and
clustered mount.

Would you mind sharing your test/reproduction steps?
And which HA stack do you use, pcmk or o2cb?

I failed to reproduce it. My test steps (with the pcmk stack):
```
# node1:
mount -t ocfs2 /dev/vdd /mnt

# node2:
for i in {1..100}; do
  echo "mount <$i>"
  mount -t ocfs2 /dev/vdd /mnt
  sleep 3
  echo "umount"
  umount /mnt
done
```

This local mount feature helps SUSE customers maintain OCFS2 partitions;
it's useful.
I want to find out whether there is an ideal way to fix the hang issue.
> 
> [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
> [13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
> [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
> [13148.749354] Call Trace:
> [13148.750718]  <TASK>
> [13148.752019]  ? usleep_range+0x90/0x89
> [13148.753882]  __schedule+0x210/0x567
> [13148.755684]  schedule+0x44/0xa8
> [13148.757270]  schedule_timeout+0x106/0x13c
> [13148.759273]  ? __prepare_to_swait+0x53/0x78
> [13148.761218]  __wait_for_common+0xae/0x163
> [13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
> [13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
> [13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
> [13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
> [13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
> [13148.775401]  ? iput+0x69/0xba
> [13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
> [13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
> [13148.781756]  mount_bdev+0x190/0x1b7
> [13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
> [13148.785634]  legacy_get_tree+0x27/0x48
> [13148.787466]  vfs_get_tree+0x25/0xd0
> [13148.789270]  do_new_mount+0x18c/0x2d9
> [13148.791046]  __x64_sys_mount+0x10e/0x142
> [13148.792911]  do_syscall_64+0x3b/0x89
> [13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
> [13148.797051] RIP: 0033:0x7f2309f6e26e
> [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
> [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
> [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
> [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
> [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
> [13148.816564]  </TASK>
> 
> To fix it, we could just fix __ocfs2_find_empty_slot. But the original commit
> introduced the feature to mount ocfs2 locally even if it is cluster-based,
> which is very dangerous: it can easily cause serious data corruption, and
> there is no way to stop other nodes from mounting the fs and corrupting it.
I can't follow your reasoning. When users want to use the local mount feature,
they MUST know what they are doing and how to use it.

The mount.ocfs2(8) man page also states that the fs may be mounted on *only*
*one* node at a time, and it warns users that the fs will be damaged otherwise.

```
nocluster

   This option allows users to mount a clustered volume without configuring the cluster
   stack. However, you must be aware that you can only mount the file system from one node
   at the same time, otherwise, the file system may be damaged. Please use it with caution.
```
> Setting up HA or another cluster-aware stack is just the cost we have to
> pay to avoid corruption; otherwise we would have to do it in the kernel.
It's a bit drastic to revert this commit entirely just because of a missing
sanity check. If you or the maintainer think the local mount should do more to
prevent mixing local-mount and clustered-mount scenarios, we could add more
sanity checks during local mounting.

Thanks,
Heming

Junxiao Bi

2022-Jun-04 16:19 UTC


[Ocfs2-devel] [PATCH] Revert "ocfs2: mount shared volume without ha stack"

On Jun 4, 2022, at 1:45, heming.zhao at suse.com wrote:
> 
> Hello Junxiao,
> 
>> On 6/4/22 06:28, Junxiao Bi via Ocfs2-devel wrote:
>> This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144.
>> This commit introduced a regression that can cause a mount hang.
>> The changes in __ocfs2_find_empty_slot allow any node with a
>> non-zero node number to grab the slot already taken by node 0,
>> so node 1 accesses the same journal as node 0. When it tries to
>> grab the journal cluster lock, it hangs because the lock was already
>> acquired by node 0.
>> It's very easy to reproduce: in one cluster, mount node 0 first,
>> then node 1, and you will see the following call trace from node 1.
> 
> From your description, it looks like your environment mixed local mount and clustered mount.

No, only cluster mount.
>
> Would you mind sharing your test/reproduction steps?
> And which HA stack do you use, pcmk or o2cb?
> 
> I failed to reproduce it. My test steps (with the pcmk stack):
> ```
> node1:
> mount -t ocfs2 /dev/vdd /mnt
> 
> node2:
> for i in {1..100}; do
> echo "mount <$i>"; mount -t ocfs2 /dev/vdd /mnt;
> sleep 3;
> echo "umount"; umount /mnt;
> done
> ```
>
Try giving one node node number 0 and mounting it there first. I used the o2cb stack.

> This local mount feature helps SUSE customers maintain OCFS2 partitions; it's useful.
> I want to find out whether there is an ideal way to fix the hang issue.
> 
>> [call trace snipped; identical to the trace quoted above]
>> To fix it, we could just fix __ocfs2_find_empty_slot. But the original commit
>> introduced the feature to mount ocfs2 locally even if it is cluster-based,
>> which is very dangerous: it can easily cause serious data corruption, and
>> there is no way to stop other nodes from mounting the fs and corrupting it.
> 
> I can't follow your reasoning. When users want to use the local mount
> feature, they MUST know what they are doing and how to use it.

I can't agree with you. There is no mechanism to make sure customers will follow
that; you can't expect customers to understand the technology well or even to
read the docs. It's not as if you have no choice: setting up the cluster stack
is the way to stop customers from doing something bad. I believe you have to
educate customers that this is the cost of guarding data security; otherwise,
when something bad happens, they will lose important data, possibly with no way
to recover it.
>
> The mount.ocfs2(8) man page also states that the fs may be mounted on *only*
> *one* node at a time, and it warns users that the fs will be damaged otherwise.
> 
> ```
> nocluster
> 
>  This option allows users to mount a clustered volume without configuring the cluster
>  stack. However, you must be aware that you can only mount the file system from one node
>  at the same time, otherwise, the file system may be damaged. Please use it with caution.
> ```
> 
>> Setting up HA or another cluster-aware stack is just the cost we have to
>> pay to avoid corruption; otherwise we would have to do it in the kernel.
> 
> It's a bit drastic to revert this commit entirely just because of a missing
> sanity check. If you or the maintainer think the local mount should do more to
> prevent mixing local-mount and clustered-mount scenarios, we could add more
> sanity checks during local mounting.

I don't think this should be done in the kernel. Setting up a cluster stack is
the way forward.

Thanks,
Junxiao

>
> Thanks,
> Heming
>
