Heming Zhao
2022-Jul-30 01:14 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
the key different between local mount and non-clustered mount: local mount feature (tunefs.ocfs2 --fs-features=[no]local) can't do convert job without ha stack. non-clustered mount feature can run totally without ha stack. commit 912f655d78c5 ("ocfs2: mount shared volume without ha stack") had bug, then commit c80af0c250c8f8a3c978aa5aafbe9c39b336b813 reverted it. Let's give some explain for the issue mentioned by commit c80af0c250c8. Under Junxiao's call trace, in __ocfs2_find_empty_slot(), the 'if' accessment is wrong. sl_node_num could be 0 at o2cb env. with current information, the trigger flow (base on 912f655d78c5): 1> nodeA with 'node_num = 0' for mounting. it will succeed. at this time, slotmap extent block will contains es_valid:1 & es_node_num:0 for nodeA then ocfs2_update_disk_slot() will write back slotmap info to disk. 2> then, nodeB with 'node_num = 1' for mounting this time, osb->node_num is 1 (set by config file), osb->preferred is OCFS2_INVALID_SLOT (set by ocfs2_parse_options). ocfs2_find_slot + ocfs2_update_slot_info //read slotmap info from disk | + set si->si_slots[0].es_valid = 1 & si->si_slots[0].sl_node_num = 0 | + __ocfs2_node_num_to_slot //will return -ENOENT. + __ocfs2_find_empty_slot + if ((preferred >= 0) && (preferred < si->si_num_slots)) | fails enter this 'if' for preferred value is OCFS2_INVALID_SLOT | + 'for(i = 0; i < si->si_num_slots; i++)' search slot 0 successfully. | 'si->si_slots[0].sl_node_num' is false. trigger 'break' condition. | + return slot 0. it will cause nodeB grab nodeA journal dlm lock, then trigger hung. How to do for this bug? This commit re-enabled 912f655d78c5, next commit (add MMP support) will fix the issue. Signed-off-by: Heming Zhao <heming.zhao at suse.com> --- fs/ocfs2/ocfs2.h | 4 +++- fs/ocfs2/slot_map.c | 46 ++++++++++++++++++++++++++------------------- fs/ocfs2/super.c | 21 +++++++++++++++++++++ 3 files changed, 51 insertions(+), 20 deletions(-) diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h index 740b64238312..337527571461 100644 --- a/fs/ocfs2/ocfs2.h +++ b/fs/ocfs2/ocfs2.h @@ -277,6 +277,7 @@ enum ocfs2_mount_options OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT = 1 << 15, /* Journal Async Commit */ OCFS2_MOUNT_ERRORS_CONT = 1 << 16, /* Return EIO to the calling process on error */ OCFS2_MOUNT_ERRORS_ROFS = 1 << 17, /* Change filesystem to read-only on error */ + OCFS2_MOUNT_NOCLUSTER = 1 << 18, /* No cluster aware filesystem mount */ }; #define OCFS2_OSB_SOFT_RO 0x0001 @@ -672,7 +673,8 @@ static inline int ocfs2_cluster_o2cb_global_heartbeat(struct ocfs2_super *osb) static inline int ocfs2_mount_local(struct ocfs2_super *osb) { - return (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT); + return ((osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT) + || (osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER)); } static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb) diff --git a/fs/ocfs2/slot_map.c b/fs/ocfs2/slot_map.c index da7718cef735..0b0ae3ebb0cf 100644 --- a/fs/ocfs2/slot_map.c +++ b/fs/ocfs2/slot_map.c @@ -252,14 +252,16 @@ static int __ocfs2_find_empty_slot(struct ocfs2_slot_info *si, int i, ret = -ENOSPC; if ((preferred >= 0) && (preferred < si->si_num_slots)) { - if (!si->si_slots[preferred].sl_valid) { + if (!si->si_slots[preferred].sl_valid || + !si->si_slots[preferred].sl_node_num) { ret = preferred; goto out; } } for(i = 0; i < si->si_num_slots; i++) { - if (!si->si_slots[i].sl_valid) { + if (!si->si_slots[i].sl_valid || + !si->si_slots[i].sl_node_num) { ret = i; break; } @@ -454,24 +456,30 @@ int ocfs2_find_slot(struct ocfs2_super *osb) spin_lock(&osb->osb_lock); ocfs2_update_slot_info(si); - /* search for ourselves first and take the slot if it already - * exists. Perhaps we need to mark this in a variable for our - * own journal recovery? Possibly not, though we certainly - * need to warn to the user */ - slot = __ocfs2_node_num_to_slot(si, osb->node_num); - if (slot < 0) { - /* if no slot yet, then just take 1st available - * one. */ - slot = __ocfs2_find_empty_slot(si, osb->preferred_slot); + if (ocfs2_mount_local(osb)) + /* use slot 0 directly in local mode */ + slot = 0; + else { + /* search for ourselves first and take the slot if it already + * exists. Perhaps we need to mark this in a variable for our + * own journal recovery? Possibly not, though we certainly + * need to warn to the user */ + slot = __ocfs2_node_num_to_slot(si, osb->node_num); if (slot < 0) { - spin_unlock(&osb->osb_lock); - mlog(ML_ERROR, "no free slots available!\n"); - status = -EINVAL; - goto bail; - } - } else - printk(KERN_INFO "ocfs2: Slot %d on device (%s) was already " - "allocated to this node!\n", slot, osb->dev_str); + /* if no slot yet, then just take 1st available + * one. */ + slot = __ocfs2_find_empty_slot(si, osb->preferred_slot); + if (slot < 0) { + spin_unlock(&osb->osb_lock); + mlog(ML_ERROR, "no free slots available!\n"); + status = -EINVAL; + goto bail; + } + } else + printk(KERN_INFO "ocfs2: Slot %d on device (%s) was " + "already allocated to this node!\n", + slot, osb->dev_str); + } ocfs2_set_slot(si, slot, osb->node_num); osb->slot_num = slot; diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 438be028935d..f7298816d8d9 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -172,6 +172,7 @@ enum { Opt_dir_resv_level, Opt_journal_async_commit, Opt_err_cont, + Opt_nocluster, Opt_err, }; @@ -205,6 +206,7 @@ static const match_table_t tokens = { {Opt_dir_resv_level, "dir_resv_level=%u"}, {Opt_journal_async_commit, "journal_async_commit"}, {Opt_err_cont, "errors=continue"}, + {Opt_nocluster, "nocluster"}, {Opt_err, NULL} }; @@ -616,6 +618,13 @@ static int ocfs2_remount(struct super_block *sb, int *flags, char *data) goto out; } + tmp = OCFS2_MOUNT_NOCLUSTER; + if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) { + ret = -EINVAL; + mlog(ML_ERROR, "Cannot change nocluster option on remount\n"); + goto out; + } + tmp = OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL | OCFS2_MOUNT_HB_NONE; if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) { @@ -856,6 +865,7 @@ static int ocfs2_verify_userspace_stack(struct ocfs2_super *osb, } if (ocfs2_userspace_stack(osb) && + !(osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER) && strncmp(osb->osb_cluster_stack, mopt->cluster_stack, OCFS2_STACK_LABEL_LEN)) { mlog(ML_ERROR, @@ -1127,6 +1137,11 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent) osb->s_mount_opt & OCFS2_MOUNT_DATA_WRITEBACK ? "writeback" : "ordered"); + if ((osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER) && + !(osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT)) + printk(KERN_NOTICE "ocfs2: The shared device (%s) is mounted " + "without cluster aware mode.\n", osb->dev_str); + atomic_set(&osb->vol_state, VOLUME_MOUNTED); wake_up(&osb->osb_mount_event); @@ -1437,6 +1452,9 @@ static int ocfs2_parse_options(struct super_block *sb, case Opt_journal_async_commit: mopt->mount_opt |= OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT; break; + case Opt_nocluster: + mopt->mount_opt |= OCFS2_MOUNT_NOCLUSTER; + break; default: mlog(ML_ERROR, "Unrecognized mount option \"%s\" " @@ -1548,6 +1566,9 @@ static int ocfs2_show_options(struct seq_file *s, struct dentry *root) if (opts & OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT) seq_printf(s, ",journal_async_commit"); + if (opts & OCFS2_MOUNT_NOCLUSTER) + seq_printf(s, ",nocluster"); + return 0; } -- 2.37.1
Mark Fasheh
2022-Jul-31 17:42 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Hi Heming, On Fri, Jul 29, 2022 at 6:15 PM Heming Zhao via Ocfs2-devel <ocfs2-devel at oss.oracle.com> wrote:> > the key different between local mount and non-clustered mount: > local mount feature (tunefs.ocfs2 --fs-features=[no]local) can't do > convert job without ha stack. non-clustered mount feature can run > totally without ha stack.Can you please elaborate on this? Local mounts can run without a cluster stack so I don't see the difference there. We have tunefs.ocfs2 look for and join the cluster so as to avoid corrupting users data - that's a feature, not a bug. So what I'm seeing here is just opening us to potential corruptions. Is there a specific use case here that you're trying to account for? Are you fixing a particular bug? Thanks, --Mark
heming.zhao at suse.com
2022-Aug-01 01:01 UTC
[Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Hello Mark, On 8/1/22 01:42, Mark Fasheh wrote:> Hi Heming, > > On Fri, Jul 29, 2022 at 6:15 PM Heming Zhao via Ocfs2-devel > <ocfs2-devel at oss.oracle.com> wrote: >> >> the key different between local mount and non-clustered mount: >> local mount feature (tunefs.ocfs2 --fs-features=[no]local) can't do >> convert job without ha stack. non-clustered mount feature can run >> totally without ha stack. > > Can you please elaborate on this? Local mounts can run without a > cluster stack so I don't see the difference there. We haveI am using pacemaker cluster stack. In my env, the trouble of the converting between local and clustered mounts are only happening on cluster stack. the non-clustered mount feature (Gang He commit: 912f655d78c5) gave ocfs2 the ability to mount volume at any env (with/without cluster stack). The 912f655d78c5 derived from SUSE customer complain: User wanted to fsck the backup ocfs2 volume in non-clustered env. They wanted to access the volume quickly and didn't want to take time/resource to set up HA stack. (by the way, pcmk stack at least needs two nodes to set up a cluster.)> tunefs.ocfs2 look for and join the cluster so as to avoid corrupting > users data - that's a feature, not a bug. So what I'm seeing here is > just opening us to potential corruptions. Is there a specific use case > here that you're trying to account for? Are you fixing a particular > bug? >Tunefs.ocfs2 still needs HA/dlm stack to protect joining action. commit 912f655d78c5 works on non-clustered env, which needs other tech (eg. MMP) to protect corrupting. From my viewpoint, the non-clustered mount code is based on local mount code, which gives more flexible than local mount. non-clustered mount uses unify mount style align with clustered mount. I think users will like more to use non-clustered mount than using tunefs.ocfs2 to change mount type. Thanks, Heming