Eric Ren
2016-Oct-19 05:19 UTC
[Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?
Hi all! Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") results in another deadlock as we have discussed in the recent thread: https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012454.html Before this one, a similiar deadlock has been fixed by Junxiao: commit c25a1e0671fb ("ocfs2: fix posix_acl_create deadlock") commit 5ee0fbd50fdf ("ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang") We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, as the deadlock issues have proved out, it's very hard to handle the routines that are called directly by vfs. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl = ocfs2_iop_get_acl, .set_acl = ocfs2_iop_set_acl, }; ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock(). The problem is that the call chain of ocfs2_permission() includes *_acl(). Possibly, there are three solutions I can think of. The first one is to implement the inode permission routine for ocfs2 itself, replacing the existing generic_permission(); this will bring lots of changes and involve too many trivial vfs functions into ocfs2 code. Frown on this. The second one is, what I am trying now, to keep track of the processes who lock/unlock a cluster lock by the following draft patches. But, I quickly find out that a cluster locking which has been taken by processA can be unlocked by processB. For example, systemfiles like journal:0000 is locked during mout, and unlocked during umount. The thrid one is to revert that problematic commit! It looks like get/set_acl() are always been called by other vfs callback like ocfs2_permission(). I think we can do this if it's true, right? Anyway, I'll try to work out if it's true;-) Hope for your input to solve this problem;-) Thanks, Eric
Eric Ren
2016-Oct-19 05:19 UTC
[Ocfs2-devel] [DRAFT 1/2] ocfs2/dlmglue: keep track of the processes who take/put a cluster lock
We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl = ocfs2_iop_get_acl, .set_acl = ocfs2_iop_set_acl, }; ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock(). The problem is that the call chain of ocfs2_permission() includes *_acl(). Possibly, there are two solutions I can think of. One way is to implement the inode permission routine for ocfs2 itself, replacing the existing generic_permission(); this will bring lots of changes and involve too many trivial vfs functions into ocfs2 code. Another way is to keep track of the processes who lock/unlock a cluster lock. This patch provides ocfs2_is_locked_by_me() for process to check if the cluster lock has been requested before. This is now only used for avoiding recursive locking, though it also can help debug cluster locking issue. Unfortunately, this may incur some performance lost. Signed-off-by: Eric Ren <zren at suse.com> --- fs/ocfs2/dlmglue.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/ocfs2/dlmglue.h | 13 ++++++++++++ fs/ocfs2/ocfs2.h | 1 + 3 files changed, 74 insertions(+) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 83d576f..9f91884 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -532,6 +532,7 @@ void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) init_waitqueue_head(&res->l_event); INIT_LIST_HEAD(&res->l_blocked_list); INIT_LIST_HEAD(&res->l_mask_waiters); + INIT_LIST_HEAD(&res->l_holders); } void ocfs2_inode_lock_res_init(struct ocfs2_lock_res *res, @@ -749,6 +750,48 @@ void ocfs2_lock_res_free(struct ocfs2_lock_res *res) res->l_flags = 0UL; } +static inline void ocfs2_add_holder(struct ocfs2_lock_res *lockres) +{ + struct ocfs2_holder *oh = kmalloc(sizeof(struct ocfs2_holder), GFP_KERNEL); + + INIT_LIST_HEAD(&oh->oh_list); + oh->oh_lockres = lockres; + oh->oh_owner_pid = get_pid(task_pid(current)); + + spin_lock(&lockres->l_lock); + list_add_tail(&oh->oh_list, &lockres->l_holders); + spin_unlock(&lockres->l_lock); +} + +static inline void ocfs2_remove_holder(struct ocfs2_lock_res *lockres, + struct ocfs2_holder *oh) +{ + spin_lock(&lockres->l_lock); + list_del(&oh->oh_list); + spin_unlock(&lockres->l_lock); + + put_pid(oh->oh_owner_pid); + kfree(oh); +} + +struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres) +{ + struct ocfs2_holder *oh; + struct pid *pid; + + spin_lock(&lockres->l_lock); + pid = task_pid(current); + list_for_each_entry(oh, &lockres->l_holders, oh_list) { + if (oh->oh_owner_pid == pid) + goto out; + } + oh = NULL; +out: + spin_unlock(&lockres->l_lock); + + return oh; +} + static inline void ocfs2_inc_holders(struct ocfs2_lock_res *lockres, int level) { @@ -1392,6 +1435,7 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb, int noqueue_attempted = 0; int dlm_locked = 0; int kick_dc = 0; + struct ocfs2_holder *oh; if (!(lockres->l_flags & OCFS2_LOCK_INITIALIZED)) { mlog_errno(-EINVAL); @@ -1403,6 +1447,14 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb, if (lockres->l_ops->flags & LOCK_TYPE_USES_LVB) lkm_flags |= DLM_LKF_VALBLK; + /* This block is just used to check recursive locking now */ + oh = ocfs2_is_locked_by_me(lockres); + if (unlikely(oh)) + mlog_bug_on_msg(1, "PID(%d) locks on lockres(%s) recursively\n", + pid_nr(oh->oh_owner_pid), lockres->l_name); + else + ocfs2_add_holder(lockres); + again: wait = 0; @@ -1596,6 +1648,14 @@ static void __ocfs2_cluster_unlock(struct ocfs2_super *osb, unsigned long caller_ip) { unsigned long flags; + struct ocfs2_holder *oh = ocfs2_is_locked_by_me(lockres); + + /* This block is just used to check recursive locking now */ + if (unlikely(!oh)) + mlog_bug_on_msg(1, "PID(%d) unlock lockres(%s) unnecessarily\n", + pid_nr(task_pid(current)), lockres->l_name); + else + ocfs2_remove_holder(lockres, oh); spin_lock_irqsave(&lockres->l_lock, flags); ocfs2_dec_holders(lockres, level); diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h index d293a22..3b1d4e7 100644 --- a/fs/ocfs2/dlmglue.h +++ b/fs/ocfs2/dlmglue.h @@ -70,6 +70,13 @@ struct ocfs2_orphan_scan_lvb { __be32 lvb_os_seqno; }; +struct ocfs2_holder { + struct list_head oh_list; + + struct ocfs2_lock_res *oh_lockres; + struct pid *oh_owner_pid; +}; + /* ocfs2_inode_lock_full() 'arg_flags' flags */ /* don't wait on recovery. */ #define OCFS2_META_LOCK_RECOVERY (0x01) @@ -170,4 +177,10 @@ void ocfs2_put_dlm_debug(struct ocfs2_dlm_debug *dlm_debug); /* To set the locking protocol on module initialization */ void ocfs2_set_locking_protocol(void); + +/* + * Keep a list of processes that have interest in a lockres. + * Note: this is now only uesed for check recursive cluster lock. + */ +struct ocfs2_holder *ocfs2_is_locked_by_me(struct ocfs2_lock_res *lockres); #endif /* DLMGLUE_H */ diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h index e63af7d..12933b8 100644 --- a/fs/ocfs2/ocfs2.h +++ b/fs/ocfs2/ocfs2.h @@ -172,6 +172,7 @@ struct ocfs2_lock_res { struct list_head l_blocked_list; struct list_head l_mask_waiters; + struct list_head l_holders; unsigned long l_flags; char l_name[OCFS2_LOCK_ID_MAX_LEN]; -- 2.6.6
Eric Ren
2016-Oct-19 05:19 UTC
[Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking
The deadlock issue happens when running discontiguous block group testing on multiple nodes. The easier way to reproduce is to do "chmod -R 777 /mnt/ocfs2" things like this on multiple nodes at the same time by pssh. This is indeed another deadlock caused by: commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). The reason had been explained well by Tariq Saeed in this thread: https://oss.oracle.com/pipermail/ocfs2-devel/2015-September/011085.html For this case, the ocfs2_inode_lock() is misused recursively as below: do_sys_open do_filp_open path_openat may_open inode_permission __inode_permission ocfs2_permission <====== ocfs2_inode_lock() generic_permission get_acl ocfs2_iop_get_acl <====== ocfs2_inode_lock() ocfs2_inode_lock_full_nested <===== deadlock if a remote EX request comes between two ocfs2_inode_lock() Fix by checking if the cluster lock has been acquired aready in the call-chain path. Fixes: commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Eric Ren <zren at suse.com> --- fs/ocfs2/acl.c | 39 +++++++++++++++++++++++++++------------ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c index bed1fcb..7e3544e 100644 --- a/fs/ocfs2/acl.c +++ b/fs/ocfs2/acl.c @@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle, int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type) { struct buffer_head *bh = NULL; + struct ocfs2_holder *oh; + struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres; int status = 0; - status = ocfs2_inode_lock(inode, &bh, 1); - if (status < 0) { - if (status != -ENOENT) - mlog_errno(status); - return status; + oh = ocfs2_is_locked_by_me(lockres); + if (!oh) { + status = ocfs2_inode_lock(inode, &bh, 1); + if (status < 0) { + if (status != -ENOENT) + mlog_errno(status); + return status; + } } + status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL); - ocfs2_inode_unlock(inode, 1); + + if (!oh) + ocfs2_inode_unlock(inode, 1); brelse(bh); return status; } @@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type) struct ocfs2_super *osb; struct buffer_head *di_bh = NULL; struct posix_acl *acl; + struct ocfs2_holder *oh; + struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres; int ret; osb = OCFS2_SB(inode->i_sb); if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL)) return NULL; - ret = ocfs2_inode_lock(inode, &di_bh, 0); - if (ret < 0) { - if (ret != -ENOENT) - mlog_errno(ret); - return ERR_PTR(ret); + + oh = ocfs2_is_locked_by_me(lockres); + if (!oh) { + ret = ocfs2_inode_lock(inode, &di_bh, 0); + if (ret < 0) { + if (ret != -ENOENT) + mlog_errno(ret); + return ERR_PTR(ret); + } } acl = ocfs2_get_acl_nolock(inode, type, di_bh); - ocfs2_inode_unlock(inode, 0); + if (!oh) + ocfs2_inode_unlock(inode, 0); brelse(di_bh); return acl; } -- 2.6.6
Junxiao Bi
2016-Oct-19 06:57 UTC
[Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?
Hi Eric,> ? 2016?10?19????1:19?Eric Ren <zren at suse.com> ??? > > Hi all! > > Commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") > results in another deadlock as we have discussed in the recent thread: > https://oss.oracle.com/pipermail/ocfs2-devel/2016-October/012454.html > > Before this one, a similiar deadlock has been fixed by Junxiao: > commit c25a1e0671fb ("ocfs2: fix posix_acl_create deadlock") > commit 5ee0fbd50fdf ("ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang") > > We are in the situation that we have to avoid recursive cluster locking, but > there is no way to check if a cluster lock has been taken by a precess already. > > Mostly, we can avoid recursive locking by writing code carefully. However, as > the deadlock issues have proved out, it's very hard to handle the routines > that are called directly by vfs. For instance: > > const struct inode_operations ocfs2_file_iops = { > .permission = ocfs2_permission, > .get_acl = ocfs2_iop_get_acl, > .set_acl = ocfs2_iop_set_acl, > }; > > > ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock(). > The problem is that the call chain of ocfs2_permission() includes *_acl(). > > Possibly, there are three solutions I can think of. The first one is to > implement the inode permission routine for ocfs2 itself, replacing the > existing generic_permission(); this will bring lots of changes and > involve too many trivial vfs functions into ocfs2 code. Frown on this. > > The second one is, what I am trying now, to keep track of the processes who > lock/unlock a cluster lock by the following draft patches. But, I quickly > find out that a cluster locking which has been taken by processA can be unlocked > by processB. For example, systemfiles like journal:0000 is locked during mout, and > unlocked during umount.I had ever implemented generic recursive locking support, please check the patch at https://oss.oracle.com/pipermail/ocfs2-devel/2015-December/011408.html <https://oss.oracle.com/pipermail/ocfs2-devel/2015-December/011408.html> , the issue that locking and unlocking in different processes was considered. But it was rejected by Mark as recursive locking is not allowed in ocfs2/kernel .> > The thrid one is to revert that problematic commit! It looks like get/set_acl() > are always been called by other vfs callback like ocfs2_permission(). I think > we can do this if it's true, right? Anyway, I'll try to work out if it's true;-)Not sure whether get/set_acl() will be called directly by vfs. Even not now, we can?t make sure that in the future. So revert it may be a little risky. But if refactor is complicated, then this maybe the only way we can do. Thanks, Junxiao.> > Hope for your input to solve this problem;-) > > Thanks, > Eric >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20161019/324b727e/attachment.html
Eric Ren
2016-Oct-24 09:13 UTC
[Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?
Hi all, On 10/19/2016 01:19 PM, Eric Ren wrote:> The thrid one is to revert that problematic commit! It looks like get/set_acl() > are always been called by other vfs callback like ocfs2_permission(). I think > we can do this if it's true, right? Anyway, I'll try to work out if it's true;-)After looking into more, I get to know get/set_acl() can be invoked directly from vfs, for instance: fsetxattr() setxattr() vfs_setxattr() __vfs_setxattr() handler->set(handler, dentry, inode, name, value, size) // posix_acl_access_xattr_handler.set = posix_acl_xattr_set posix_acl_xattr_set() set_posix_acl() inode->i_op->set_acl() So, this problem looks really hard to solve:-/ Eric> > Hope for your input to solve this problem;-) > > Thanks, > Eric > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel at oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >
Christoph Hellwig
2016-Oct-28 06:20 UTC
[Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?
Hi Eric, I've added linux-fsdevel to the cc list as this should get a bit broader attention. On Wed, Oct 19, 2016 at 01:19:40PM +0800, Eric Ren wrote:> Mostly, we can avoid recursive locking by writing code carefully. However, as > the deadlock issues have proved out, it's very hard to handle the routines > that are called directly by vfs. For instance: > > const struct inode_operations ocfs2_file_iops = { > .permission = ocfs2_permission, > .get_acl = ocfs2_iop_get_acl, > .set_acl = ocfs2_iop_set_acl, > }; > > > ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock(). > The problem is that the call chain of ocfs2_permission() includes *_acl().What do you actually protect in ocfs2_permission? It's a trivial wrapper around generic_permission which just looks at the VFS inode. I think the right fix is to remove ocfs2_permission entirely and use the default VFS implementation. That both solves your locking problem, and it will also get you RCU lookup instead of dropping out of RCU mode all the time.
Eric Ren
2016-Nov-09 04:47 UTC
[Ocfs2-devel] [RFC] Should we revert commit "ocfs2: take inode lock in ocfs2_iop_set/get_acl()"? or other ideas?
Hi all, On 10/19/2016 01:19 PM, Eric Ren wrote:> ocfs2_permission() and ocfs2_iop_get/set_acl() both call ocfs2_inode_lock(). > The problem is that the call chain of ocfs2_permission() includes *_acl(). > > Possibly, there are three solutions I can think of. The first one is to > implement the inode permission routine for ocfs2 itself, replacing the > existing generic_permission(); this will bring lots of changes and > involve too many trivial vfs functions into ocfs2 code. Frown on this. > > The second one is, what I am trying now, to keep track of the processes who > lock/unlock a cluster lock by the following draft patches. But, I quickly > find out that a cluster locking which has been taken by processA can be unlocked > by processB. For example, systemfiles like journal:0000 is locked during mout, and > unlocked during umount.We can avoid the problem above by: 1) not keeping track of system file inode: if (!(OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE)) { .... } 2) only keeping track of inode metadata lockres: OCFS2_I(inode)->ip_inode_lockres; because inode open lockres can also be get/release by different processes. Eric> > The thrid one is to revert that problematic commit! It looks like get/set_acl() > are always been called by other vfs callback like ocfs2_permission(). I think > we can do this if it's true, right? Anyway, I'll try to work out if it's true;-) > > Hope for your input to solve this problem;-) > > Thanks, > Eric > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel at oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel >-------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20161109/df737ced/attachment.html
Eric Ren
2016-Nov-09 04:55 UTC
[Ocfs2-devel] [DRAFT 2/2] ocfs2: fix deadlock caused by recursive cluster locking
Hi all, On 10/19/2016 01:19 PM, Eric Ren wrote:> diff --git a/fs/ocfs2/acl.c b/fs/ocfs2/acl.c > index bed1fcb..7e3544e 100644 > --- a/fs/ocfs2/acl.c > +++ b/fs/ocfs2/acl.c > @@ -283,16 +283,24 @@ int ocfs2_set_acl(handle_t *handle, > int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, int type) > { > struct buffer_head *bh = NULL; > + struct ocfs2_holder *oh; > + struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres; > int status = 0; > > - status = ocfs2_inode_lock(inode, &bh, 1); > - if (status < 0) { > - if (status != -ENOENT) > - mlog_errno(status); > - return status; > + oh = ocfs2_is_locked_by_me(lockres); > + if (!oh) { > + status = ocfs2_inode_lock(inode, &bh, 1); > + if (status < 0) { > + if (status != -ENOENT) > + mlog_errno(status); > + return status; > + } > }This is wrong. We also depend ocfs2_inode_lock() pass out "bh" for later use. So, we may need another function something like ocfs2_inode_getbh(): if (!oh) ocfs2_inode_lock(); else ocfs2_inode_getbh(); Eric> + > status = ocfs2_set_acl(NULL, inode, bh, type, acl, NULL, NULL); > - ocfs2_inode_unlock(inode, 1); > + > + if (!oh) > + ocfs2_inode_unlock(inode, 1); > brelse(bh); > return status; > } > @@ -302,21 +310,28 @@ struct posix_acl *ocfs2_iop_get_acl(struct inode *inode, int type) > struct ocfs2_super *osb; > struct buffer_head *di_bh = NULL; > struct posix_acl *acl; > + struct ocfs2_holder *oh; > + struct ocfs2_lock_res *lockres = &OCFS2_I(inode)->ip_inode_lockres; > int ret; > > osb = OCFS2_SB(inode->i_sb); > if (!(osb->s_mount_opt & OCFS2_MOUNT_POSIX_ACL)) > return NULL; > - ret = ocfs2_inode_lock(inode, &di_bh, 0); > - if (ret < 0) { > - if (ret != -ENOENT) > - mlog_errno(ret); > - return ERR_PTR(ret); > + > + oh = ocfs2_is_locked_by_me(lockres); > + if (!oh) { > + ret = ocfs2_inode_lock(inode, &di_bh, 0); > + if (ret < 0) { > + if (ret != -ENOENT) > + mlog_errno(ret); > + return ERR_PTR(ret); > + } > } > > acl = ocfs2_get_acl_nolock(inode, type, di_bh); > > - ocfs2_inode_unlock(inode, 0); > + if (!oh) > + ocfs2_inode_unlock(inode, 0); > brelse(di_bh); > return acl; > }-------------- next part -------------- An HTML attachment was scrubbed... URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20161109/ddb6d0e4/attachment.html