Jan Kara
2009-Jun-02 12:23 UTC
[Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks
Hi, I'm resending this patch series. It's rediffed against linux-next branch of Joel's git tree. The first four patches are obvious fixes of deadlocks in quota code and should go in as soon as possible. The other three patches implement lockdep support for OCFS2 cluster locks. So you can have a look whether the code make sence to you and possibly merge them. They should be NOP when lockdep is disabled and help a lot when debugging various locking problems (with these patches I found the problems fixed in the beginning of the series). BTW: Currently lockdep complains about some configfs issue and turns itself off so that's why I didn't get to finding more deadlocks ;) Honza
Jan Kara
2009-Jun-02 12:23 UTC
[Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in ocfs2_global_read_dquot()
It is not possible to get a read lock and then try to get the same write lock in one thread as that can block on downconvert being requested by other node leading to deadlock. So first drop the quota lock for reading and only after that get it for writing. Signed-off-by: Jan Kara <jack at suse.cz> --- fs/ocfs2/quota_global.c | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c index 1ed0f7c..edfa60c 100644 --- a/fs/ocfs2/quota_global.c +++ b/fs/ocfs2/quota_global.c @@ -421,6 +421,7 @@ int ocfs2_global_read_dquot(struct dquot *dquot) OCFS2_DQUOT(dquot)->dq_originodes = dquot->dq_dqb.dqb_curinodes; if (!dquot->dq_off) { /* No real quota entry? */ /* Upgrade to exclusive lock for allocation */ + ocfs2_qinfo_unlock(info, 0); err = ocfs2_qinfo_lock(info, 1); if (err < 0) goto out_qlock; @@ -435,7 +436,8 @@ int ocfs2_global_read_dquot(struct dquot *dquot) out_qlock: if (ex) ocfs2_qinfo_unlock(info, 1); - ocfs2_qinfo_unlock(info, 0); + else + ocfs2_qinfo_unlock(info, 0); out: if (err < 0) mlog_errno(err); -- 1.6.0.2
Jan Kara
2009-Jun-02 12:24 UTC
[Ocfs2-devel] [PATCH] ocfs2: Fix lock inversion in ocfs2_local_read_info()
This function is called with dqio_mutex held but it has to acquire lock from global quota file which ranks above this lock. This is not deadlockable lock inversion since this code path is take only during mount when noone else can race with us but let's clean this up to silence lockdep. We just drop the dqio_mutex in the beginning of the function and reacquire it in the end since we don't need it - noone can race with us at this moment. Signed-off-by: Jan Kara <jack at suse.cz> --- fs/ocfs2/quota_local.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/fs/ocfs2/quota_local.c b/fs/ocfs2/quota_local.c index 07deec5..71cf410 100644 --- a/fs/ocfs2/quota_local.c +++ b/fs/ocfs2/quota_local.c @@ -655,6 +655,9 @@ static int ocfs2_local_read_info(struct super_block *sb, int type) struct ocfs2_quota_recovery *rec; int locked = 0; + /* We don't need the lock and we have to acquire quota file locks + * which will later depend on this lock */ + mutex_unlock(&sb_dqopt(sb)->dqio_mutex); info->dqi_maxblimit = 0x7fffffffffffffffLL; info->dqi_maxilimit = 0x7fffffffffffffffLL; oinfo = kmalloc(sizeof(struct ocfs2_mem_dqinfo), GFP_NOFS); @@ -733,6 +736,7 @@ static int ocfs2_local_read_info(struct super_block *sb, int type) goto out_err; } + mutex_lock(&sb_dqopt(sb)->dqio_mutex); return 0; out_err: if (oinfo) { @@ -746,6 +750,7 @@ out_err: kfree(oinfo); } brelse(bh); + mutex_lock(&sb_dqopt(sb)->dqio_mutex); return -1; } -- 1.6.0.2
Jan Kara
2009-Jun-02 12:24 UTC
[Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock with quotas in ocfs2_setattr()
We called vfs_dq_transfer() with global quota file lock held. This can lead to deadlocks as if vfs_dq_transfer() has to allocate new quota structure, it calls ocfs2_dquot_acquire() which tries to get quota file lock again and this can block if another node requested the lock in the mean time. Since we have to call vfs_dq_transfer() with transaction already started and quota file lock ranks above the transaction start, we cannot just rely on ocfs2_dquot_acquire() or ocfs2_dquot_release() on getting the lock if they need it. We fix the problem by acquiring pointers to all quota structures needed by vfs_dq_transfer() already before calling the function. By this we are sure that all quota structures are properly allocated and they can be freed only after we drop references to them. Thus we don't need quota file lock anywhere inside vfs_dq_transfer(). Signed-off-by: Jan Kara <jack at suse.cz> --- fs/ocfs2/file.c | 53 ++++++++++++++++++++++++++++++----------------------- 1 files changed, 30 insertions(+), 23 deletions(-) diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index c2a87c8..1a96cac 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -894,9 +894,9 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) struct ocfs2_super *osb = OCFS2_SB(sb); struct buffer_head *bh = NULL; handle_t *handle = NULL; - int locked[MAXQUOTAS] = {0, 0}; - int credits, qtype; - struct ocfs2_mem_dqinfo *oinfo; + int qtype; + struct dquot *transfer_from[MAXQUOTAS] = { }; + struct dquot *transfer_to[MAXQUOTAS] = { }; mlog_entry("(0x%p, '%.*s')\n", dentry, dentry->d_name.len, dentry->d_name.name); @@ -969,30 +969,37 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) if ((attr->ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid) || (attr->ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid)) { - credits = OCFS2_INODE_UPDATE_CREDITS; + /* + * Gather pointers to quota structures so that allocation / + * freeing of quota structures happens here and not inside + * vfs_dq_transfer() where we have problems with lock ordering + */ if (attr->ia_valid & ATTR_UID && attr->ia_uid != inode->i_uid && OCFS2_HAS_RO_COMPAT_FEATURE(sb, OCFS2_FEATURE_RO_COMPAT_USRQUOTA)) { - oinfo = sb_dqinfo(sb, USRQUOTA)->dqi_priv; - status = ocfs2_lock_global_qf(oinfo, 1); - if (status < 0) + transfer_to[USRQUOTA] = dqget(sb, attr->ia_uid, + USRQUOTA); + transfer_from[USRQUOTA] = dqget(sb, inode->i_uid, + USRQUOTA); + if (!transfer_to[USRQUOTA] || !transfer_from[USRQUOTA]) { + status = -ESRCH; goto bail_unlock; - credits += ocfs2_calc_qinit_credits(sb, USRQUOTA) + - ocfs2_calc_qdel_credits(sb, USRQUOTA); - locked[USRQUOTA] = 1; + } } if (attr->ia_valid & ATTR_GID && attr->ia_gid != inode->i_gid && OCFS2_HAS_RO_COMPAT_FEATURE(sb, OCFS2_FEATURE_RO_COMPAT_GRPQUOTA)) { - oinfo = sb_dqinfo(sb, GRPQUOTA)->dqi_priv; - status = ocfs2_lock_global_qf(oinfo, 1); - if (status < 0) + transfer_to[GRPQUOTA] = dqget(sb, attr->ia_gid, + GRPQUOTA); + transfer_from[GRPQUOTA] = dqget(sb, inode->i_gid, + GRPQUOTA); + if (!transfer_to[GRPQUOTA] || !transfer_from[GRPQUOTA]) { + status = -ESRCH; goto bail_unlock; - credits += ocfs2_calc_qinit_credits(sb, GRPQUOTA) + - ocfs2_calc_qdel_credits(sb, GRPQUOTA); - locked[GRPQUOTA] = 1; + } } - handle = ocfs2_start_trans(osb, credits); + handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS + + 2 * ocfs2_quota_trans_credits(sb)); if (IS_ERR(handle)) { status = PTR_ERR(handle); mlog_errno(status); @@ -1030,12 +1037,6 @@ int ocfs2_setattr(struct dentry *dentry, struct iattr *attr) bail_commit: ocfs2_commit_trans(osb, handle); bail_unlock: - for (qtype = 0; qtype < MAXQUOTAS; qtype++) { - if (!locked[qtype]) - continue; - oinfo = sb_dqinfo(sb, qtype)->dqi_priv; - ocfs2_unlock_global_qf(oinfo, 1); - } ocfs2_inode_unlock(inode, 1); bail_unlock_rw: if (size_change) @@ -1043,6 +1044,12 @@ bail_unlock_rw: bail: brelse(bh); + /* Release quota pointers in case we acquired them */ + for (qtype = 0; qtype < MAXQUOTAS; qtype++) { + dqput(transfer_to[qtype]); + dqput(transfer_from[qtype]); + } + if (!status && attr->ia_valid & ATTR_MODE) { status = ocfs2_acl_chmod(inode); if (status < 0) -- 1.6.0.2
Jan Kara
2009-Jun-02 12:24 UTC
[Ocfs2-devel] [PATCH] ocfs2: Fix possible deadlock in quota recovery
In ocfs2_finish_quota_recovery() we acquired global quota file lock and started recovering local quota file. During this process we need to get quota structures, which calls ocfs2_dquot_acquire() which gets global quota file lock again. This second lock can block in case some other node has requested the quota file lock in the mean time. Fix the problem by moving quota file locking down into the function where it is really needed. Then dqget() or dqput() won't be called with the lock held. Signed-off-by: Jan Kara <jack at suse.cz> --- fs/ocfs2/quota_local.c | 16 +++++++++------- 1 files changed, 9 insertions(+), 7 deletions(-) diff --git a/fs/ocfs2/quota_local.c b/fs/ocfs2/quota_local.c index 71cf410..5a460fa 100644 --- a/fs/ocfs2/quota_local.c +++ b/fs/ocfs2/quota_local.c @@ -444,10 +444,6 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode, mlog_entry("ino=%lu type=%u", (unsigned long)lqinode->i_ino, type); - status = ocfs2_lock_global_qf(oinfo, 1); - if (status < 0) - goto out; - list_for_each_entry_safe(rchunk, next, &(rec->r_list[type]), rc_list) { chunk = rchunk->rc_chunk; hbh = NULL; @@ -480,12 +476,18 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode, type); goto out_put_bh; } + status = ocfs2_lock_global_qf(oinfo, 1); + if (status < 0) { + mlog_errno(status); + goto out_put_dquot; + } + handle = ocfs2_start_trans(OCFS2_SB(sb), OCFS2_QSYNC_CREDITS); if (IS_ERR(handle)) { status = PTR_ERR(handle); mlog_errno(status); - goto out_put_dquot; + goto out_drop_lock; } mutex_lock(&sb_dqopt(sb)->dqio_mutex); spin_lock(&dq_data_lock); @@ -523,6 +525,8 @@ static int ocfs2_recover_local_quota_file(struct inode *lqinode, out_commit: mutex_unlock(&sb_dqopt(sb)->dqio_mutex); ocfs2_commit_trans(OCFS2_SB(sb), handle); +out_drop_lock: + ocfs2_unlock_global_qf(oinfo, 1); out_put_dquot: dqput(dquot); out_put_bh: @@ -537,8 +541,6 @@ out_put_bh: if (status < 0) break; } - ocfs2_unlock_global_qf(oinfo, 1); -out: if (status < 0) free_recovery_list(&(rec->r_list[type])); mlog_exit(status); -- 1.6.0.2
Jan Kara
2009-Jun-02 12:24 UTC
[Ocfs2-devel] [PATCH] ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories
We use ordering ip_alloc_sem -> local alloc locks in ocfs2_write_begin(). So change lock ordering in ocfs2_extend_dir() and ocfs2_expand_inline_dir() to also use this lock ordering. Signed-off-by: Jan Kara <jack at suse.cz> --- fs/ocfs2/dir.c | 21 ++++++++++----------- 1 files changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c index c575230..b358f3b 100644 --- a/fs/ocfs2/dir.c +++ b/fs/ocfs2/dir.c @@ -2900,6 +2900,8 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh, alloc = ocfs2_clusters_for_bytes(sb, bytes); dx_alloc = 0; + down_write(&oi->ip_alloc_sem); + if (ocfs2_supports_indexed_dirs(osb)) { credits += ocfs2_add_dir_index_credits(sb); @@ -2940,8 +2942,6 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh, goto out; } - down_write(&oi->ip_alloc_sem); - /* * Prepare for worst case allocation scenario of two separate * extents in the unindexed tree. @@ -2953,7 +2953,7 @@ static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh, if (IS_ERR(handle)) { ret = PTR_ERR(handle); mlog_errno(ret); - goto out_sem; + goto out; } if (vfs_dq_alloc_space_nodirty(dir, @@ -3172,10 +3172,8 @@ out_commit: ocfs2_commit_trans(osb, handle); -out_sem: - up_write(&oi->ip_alloc_sem); - out: + up_write(&oi->ip_alloc_sem); if (data_ac) ocfs2_free_alloc_context(data_ac); if (meta_ac) @@ -3322,11 +3320,15 @@ static int ocfs2_extend_dir(struct ocfs2_super *osb, brelse(new_bh); new_bh = NULL; + down_write(&OCFS2_I(dir)->ip_alloc_sem); + drop_alloc_sem = 1; dir_i_size = i_size_read(dir); credits = OCFS2_SIMPLE_DIR_EXTEND_CREDITS; goto do_extend; } + down_write(&OCFS2_I(dir)->ip_alloc_sem); + drop_alloc_sem = 1; dir_i_size = i_size_read(dir); mlog(0, "extending dir %llu (i_size = %lld)\n", (unsigned long long)OCFS2_I(dir)->ip_blkno, dir_i_size); @@ -3370,9 +3372,6 @@ do_extend: credits++; /* For attaching the new dirent block to the * dx_root */ - down_write(&OCFS2_I(dir)->ip_alloc_sem); - drop_alloc_sem = 1; - handle = ocfs2_start_trans(osb, credits); if (IS_ERR(handle)) { status = PTR_ERR(handle); @@ -3435,10 +3434,10 @@ bail_bh: *new_de_bh = new_bh; get_bh(*new_de_bh); bail: - if (drop_alloc_sem) - up_write(&OCFS2_I(dir)->ip_alloc_sem); if (handle) ocfs2_commit_trans(osb, handle); + if (drop_alloc_sem) + up_write(&OCFS2_I(dir)->ip_alloc_sem); if (data_ac) ocfs2_free_alloc_context(data_ac); -- 1.6.0.2
Jan Kara
2009-Jun-02 12:24 UTC
[Ocfs2-devel] [PATCH] vfs: Set special lockdep map for dirs only if not set by fs
Some filesystems need to set lockdep map for i_mutex differently for different directories. For example OCFS2 has system directories (for orphan inode tracking and for gathering all system files like journal or quota files into a single place) which have different locking locking rules than standard directories. For a filesystem setting lockdep map is naturaly done when the inode is read but we have to modify unlock_new_inode() not to overwrite the lockdep map the filesystem has set. CC: peterz at infradead.org CC: mingo at redhat.com Signed-off-by: Jan Kara <jack at suse.cz> --- fs/inode.c | 17 +++++++++++------ include/linux/lockdep.h | 15 +++++++++++++++ 2 files changed, 26 insertions(+), 6 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 6ad14a1..44e1848 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -654,12 +654,17 @@ void unlock_new_inode(struct inode *inode) if (inode->i_mode & S_IFDIR) { struct file_system_type *type = inode->i_sb->s_type; - /* - * ensure nobody is actually holding i_mutex - */ - mutex_destroy(&inode->i_mutex); - mutex_init(&inode->i_mutex); - lockdep_set_class(&inode->i_mutex, &type->i_mutex_dir_key); + /* Set new key only if filesystem hasn't already changed it */ + if (!lockdep_match_class(&inode->i_mutex, + &type->i_mutex_key)) { + /* + * ensure nobody is actually holding i_mutex + */ + mutex_destroy(&inode->i_mutex); + mutex_init(&inode->i_mutex); + lockdep_set_class(&inode->i_mutex, + &type->i_mutex_dir_key); + } } #endif /* diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index da5a5a1..b25d1b5 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -258,6 +258,16 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name, #define lockdep_set_subclass(lock, sub) \ lockdep_init_map(&(lock)->dep_map, #lock, \ (lock)->dep_map.key, sub) +/* + * Compare locking classes + */ +#define lockdep_match_class(lock, key) lockdep_match_key(&(lock)->dep_map, key) + +static inline int lockdep_match_key(struct lockdep_map *lock, + struct lock_class_key *key) +{ + return lock->key == key; +} /* * Acquire a lock. @@ -326,6 +336,11 @@ static inline void lockdep_on(void) #define lockdep_set_class_and_subclass(lock, key, sub) \ do { (void)(key); } while (0) #define lockdep_set_subclass(lock, sub) do { } while (0) +/* + * We don't define lockdep_match_class() and lockdep_match_key() for !LOCKDEP + * case since the result is not well defined and the caller should rather + * #ifdef the call himself. + */ # define INIT_LOCKDEP # define lockdep_reset() do { debug_locks = 1; } while (0) -- 1.6.0.2
Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara <jack at suse.cz> --- fs/ocfs2/dlmglue.c | 99 ++++++++++++++++++++++++++++++++++++++++++++-------- fs/ocfs2/dlmglue.h | 12 +++++-- fs/ocfs2/inode.c | 11 ++++++ fs/ocfs2/inode.h | 8 ++++ fs/ocfs2/namei.c | 15 +++++--- fs/ocfs2/ocfs2.h | 4 ++ fs/ocfs2/sysfile.c | 17 +++++++++ 7 files changed, 142 insertions(+), 24 deletions(-) diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index e15fc7d..5cf731f 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -92,6 +92,9 @@ struct ocfs2_unblock_ctl { enum ocfs2_unblock_action unblock_action; }; +/* Lockdep class keys */ +struct lock_class_key lockdep_keys[OCFS2_NUM_LOCK_TYPES]; + static int ocfs2_check_meta_downconvert(struct ocfs2_lock_res *lockres, int new_level); static void ocfs2_set_meta_lvb(struct ocfs2_lock_res *lockres); @@ -313,9 +316,16 @@ static int ocfs2_lock_create(struct ocfs2_super *osb, u32 dlm_flags); static inline int ocfs2_may_continue_on_blocked_lock(struct ocfs2_lock_res *lockres, int wanted); -static void ocfs2_cluster_unlock(struct ocfs2_super *osb, - struct ocfs2_lock_res *lockres, - int level); +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb, + struct ocfs2_lock_res *lockres, + int level, unsigned long ip); +static inline void ocfs2_cluster_unlock(struct ocfs2_super *osb, + struct ocfs2_lock_res *lockres, + int level) +{ + __ocfs2_cluster_unlock(osb, lockres, level, _RET_IP_); +} + static inline void ocfs2_generic_handle_downconvert_action(struct ocfs2_lock_res *lockres); static inline void ocfs2_generic_handle_convert_action(struct ocfs2_lock_res *lockres); static inline void ocfs2_generic_handle_attach_action(struct ocfs2_lock_res *lockres); @@ -485,6 +495,13 @@ static void ocfs2_lock_res_init_common(struct ocfs2_super *osb, ocfs2_add_lockres_tracking(res, osb->osb_dlm_debug); ocfs2_init_lock_stats(res); +#ifdef CONFIG_DEBUG_LOCK_ALLOC + if (type != OCFS2_LOCK_TYPE_OPEN) + lockdep_init_map(&res->l_lockdep_map, ocfs2_lock_type_strings[type], + &lockdep_keys[type], 0); + else + res->l_lockdep_map.key = NULL; +#endif } void ocfs2_lock_res_init_once(struct ocfs2_lock_res *res) @@ -1239,11 +1256,21 @@ static int ocfs2_wait_for_mask_interruptible(struct ocfs2_mask_waiter *mw, return ret; } -static int ocfs2_cluster_lock(struct ocfs2_super *osb, - struct ocfs2_lock_res *lockres, - int level, - u32 lkm_flags, - int arg_flags) +#ifdef CONFIG_DEBUG_LOCK_ALLOC +static int __ocfs2_cluster_lock(struct ocfs2_super *osb, + struct ocfs2_lock_res *lockres, + int level, + u32 lkm_flags, + int arg_flags, + int l_subclass, + unsigned long ip) +#else +static int __ocfs2_cluster_lock(struct ocfs2_super *osb, + struct ocfs2_lock_res *lockres, + int level, + u32 lkm_flags, + int arg_flags) +#endif { struct ocfs2_mask_waiter mw; int wait, catch_signals = !(osb->s_mount_opt & OCFS2_MOUNT_NOINTR); @@ -1386,13 +1413,45 @@ out: } ocfs2_update_lock_stats(lockres, level, &mw, ret); +#ifdef CONFIG_DEBUG_LOCK_ALLOC + if (!ret && lockres->l_lockdep_map.key != NULL) { + if (level == DLM_LOCK_PR) + rwsem_acquire_read(&lockres->l_lockdep_map, l_subclass, + !!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip); + else + rwsem_acquire(&lockres->l_lockdep_map, l_subclass, + !!(arg_flags & OCFS2_META_LOCK_NOQUEUE), ip); + } +#endif mlog_exit(ret); return ret; } -static void ocfs2_cluster_unlock(struct ocfs2_super *osb, - struct ocfs2_lock_res *lockres, - int level) +static inline int ocfs2_cluster_lock(struct ocfs2_super *osb, + struct ocfs2_lock_res *lockres, + int level, + u32 lkm_flags, + int arg_flags) +{ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + return __ocfs2_cluster_lock(osb, lockres, level, lkm_flags, arg_flags, + 0, _RET_IP_); +#else + return __ocfs2_cluster_lock(osb, lockres, level, lkm_flags, arg_flags); +#endif +} + + +#ifdef CONFIG_DEBUG_LOCK_ALLOC +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb, + struct ocfs2_lock_res *lockres, + int level, + unsigned long ip) +#else +static void __ocfs2_cluster_unlock(struct ocfs2_super *osb, + struct ocfs2_lock_res *lockres, + int level) +#endif { unsigned long flags; @@ -1401,6 +1460,10 @@ static void ocfs2_cluster_unlock(struct ocfs2_super *osb, ocfs2_dec_holders(lockres, level); ocfs2_downconvert_on_unlock(osb, lockres); spin_unlock_irqrestore(&lockres->l_lock, flags); +#ifdef CONFIG_DEBUG_LOCK_ALLOC + if (lockres->l_lockdep_map.key != NULL) + rwsem_release(&lockres->l_lockdep_map, 1, ip); +#endif mlog_exit_void(); } @@ -2145,10 +2208,11 @@ static int ocfs2_assign_bh(struct inode *inode, * returns < 0 error if the callback will never be called, otherwise * the result of the lock will be communicated via the callback. */ -int ocfs2_inode_lock_full(struct inode *inode, - struct buffer_head **ret_bh, - int ex, - int arg_flags) +int ocfs2_inode_lock_full_nested(struct inode *inode, + struct buffer_head **ret_bh, + int ex, + int arg_flags, + int subclass) { int status, level, acquired; u32 dlm_flags; @@ -2186,7 +2250,12 @@ int ocfs2_inode_lock_full(struct inode *inode, if (arg_flags & OCFS2_META_LOCK_NOQUEUE) dlm_flags |= DLM_LKF_NOQUEUE; +#ifdef CONFIG_DEBUG_LOCK_ALLOC + status = __ocfs2_cluster_lock(osb, lockres, level, dlm_flags, + arg_flags, subclass, _RET_IP_); +#else status = ocfs2_cluster_lock(osb, lockres, level, dlm_flags, arg_flags); +#endif if (status < 0) { if (status != -EAGAIN && status != -EIOCBRETRY) mlog_errno(status); diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h index e1fd572..9bb9eb5 100644 --- a/fs/ocfs2/dlmglue.h +++ b/fs/ocfs2/dlmglue.h @@ -96,17 +96,23 @@ void ocfs2_open_unlock(struct inode *inode); int ocfs2_inode_lock_atime(struct inode *inode, struct vfsmount *vfsmnt, int *level); -int ocfs2_inode_lock_full(struct inode *inode, +int ocfs2_inode_lock_full_nested(struct inode *inode, struct buffer_head **ret_bh, int ex, - int arg_flags); + int arg_flags, + int subclass); int ocfs2_inode_lock_with_page(struct inode *inode, struct buffer_head **ret_bh, int ex, struct page *page); +/* Variants without special locking class or flags */ +#define ocfs2_inode_lock_full(i, r, e, f)\ + ocfs2_inode_lock_full_nested(i, r, e, f, 0) +#define ocfs2_inode_lock_nested(i, b, e, s)\ + ocfs2_inode_lock_full_nested(i, b, e, 0, s) /* 99% of the time we don't want to supply any additional flags -- * those are for very specific cases only. */ -#define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full(i, b, e, 0) +#define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, 0) void ocfs2_inode_unlock(struct inode *inode, int ex); int ocfs2_super_lock(struct ocfs2_super *osb, diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c index 10e1fa8..4dc8890 100644 --- a/fs/ocfs2/inode.c +++ b/fs/ocfs2/inode.c @@ -215,6 +215,8 @@ bail: static int ocfs2_init_locked_inode(struct inode *inode, void *opaque) { struct ocfs2_find_inode_args *args = opaque; + static struct lock_class_key ocfs2_quota_ip_alloc_sem_key, + ocfs2_file_ip_alloc_sem_key; mlog_entry("inode = %p, opaque = %p\n", inode, opaque); @@ -223,6 +225,15 @@ static int ocfs2_init_locked_inode(struct inode *inode, void *opaque) if (args->fi_sysfile_type != 0) lockdep_set_class(&inode->i_mutex, &ocfs2_sysfile_lock_key[args->fi_sysfile_type]); + if (args->fi_sysfile_type == USER_QUOTA_SYSTEM_INODE || + args->fi_sysfile_type == GROUP_QUOTA_SYSTEM_INODE || + args->fi_sysfile_type == LOCAL_USER_QUOTA_SYSTEM_INODE || + args->fi_sysfile_type == LOCAL_GROUP_QUOTA_SYSTEM_INODE) + lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem, + &ocfs2_quota_ip_alloc_sem_key); + else + lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem, + &ocfs2_file_ip_alloc_sem_key); mlog_exit(0); return 0; diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h index ea71525..7096516 100644 --- a/fs/ocfs2/inode.h +++ b/fs/ocfs2/inode.h @@ -109,6 +109,14 @@ struct ocfs2_inode_info /* Indicates that the metadata cache should be used as an array. */ #define OCFS2_INODE_CACHE_INLINE 0x00000080 +/* Locking subclasses of inode cluster lock */ +enum { + OI_LS_NORMAL, + OI_LS_PARENT, + OI_LS_RENAME1, + OI_LS_RENAME2, +}; + static inline struct ocfs2_inode_info *OCFS2_I(struct inode *inode) { return container_of(inode, struct ocfs2_inode_info, vfs_inode); diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c index 33464c6..8601f93 100644 --- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -118,7 +118,7 @@ static struct dentry *ocfs2_lookup(struct inode *dir, struct dentry *dentry, mlog(0, "find name %.*s in directory %llu\n", dentry->d_name.len, dentry->d_name.name, (unsigned long long)OCFS2_I(dir)->ip_blkno); - status = ocfs2_inode_lock(dir, NULL, 0); + status = ocfs2_inode_lock_nested(dir, NULL, 0, OI_LS_PARENT); if (status < 0) { if (status != -ENOENT) mlog_errno(status); @@ -636,7 +636,7 @@ static int ocfs2_link(struct dentry *old_dentry, if (S_ISDIR(inode->i_mode)) return -EPERM; - err = ocfs2_inode_lock(dir, &parent_fe_bh, 1); + err = ocfs2_inode_lock_nested(dir, &parent_fe_bh, 1, OI_LS_PARENT); if (err < 0) { if (err != -ENOENT) mlog_errno(err); @@ -800,7 +800,8 @@ static int ocfs2_unlink(struct inode *dir, return -EPERM; } - status = ocfs2_inode_lock(dir, &parent_node_bh, 1); + status = ocfs2_inode_lock_nested(dir, &parent_node_bh, 1, + OI_LS_PARENT); if (status < 0) { if (status != -ENOENT) mlog_errno(status); @@ -978,7 +979,8 @@ static int ocfs2_double_lock(struct ocfs2_super *osb, inode1 = tmpinode; } /* lock id2 */ - status = ocfs2_inode_lock(inode2, bh2, 1); + status = ocfs2_inode_lock_nested(inode2, bh2, 1, + OI_LS_RENAME1); if (status < 0) { if (status != -ENOENT) mlog_errno(status); @@ -987,7 +989,7 @@ static int ocfs2_double_lock(struct ocfs2_super *osb, } /* lock id1 */ - status = ocfs2_inode_lock(inode1, bh1, 1); + status = ocfs2_inode_lock_nested(inode1, bh1, 1, OI_LS_RENAME2); if (status < 0) { /* * An error return must mean that no cluster locks @@ -1103,7 +1105,8 @@ static int ocfs2_rename(struct inode *old_dir, * won't have to concurrently downconvert the inode and the * dentry locks. */ - status = ocfs2_inode_lock(old_inode, &old_inode_bh, 1); + status = ocfs2_inode_lock_nested(old_inode, &old_inode_bh, 1, + OI_LS_PARENT); if (status < 0) { if (status != -ENOENT) mlog_errno(status); diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h index 1386281..99e89ba 100644 --- a/fs/ocfs2/ocfs2.h +++ b/fs/ocfs2/ocfs2.h @@ -34,6 +34,7 @@ #include <linux/workqueue.h> #include <linux/kref.h> #include <linux/mutex.h> +#include <linux/lockdep.h> #ifndef CONFIG_OCFS2_COMPAT_JBD # include <linux/jbd2.h> #else @@ -149,6 +150,9 @@ struct ocfs2_lock_res { unsigned int l_lock_max_exmode; /* Max wait for EX */ unsigned int l_lock_refresh; /* Disk refreshes */ #endif +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map l_lockdep_map; +#endif }; struct ocfs2_dlm_debug { diff --git a/fs/ocfs2/sysfile.c b/fs/ocfs2/sysfile.c index ab713eb..6f53f5e 100644 --- a/fs/ocfs2/sysfile.c +++ b/fs/ocfs2/sysfile.c @@ -50,6 +50,8 @@ static inline int is_in_system_inode_array(struct ocfs2_super *osb, int type, u32 slot); +static struct lock_class_key ocfs2_sysfile_cluster_lock_key[NUM_SYSTEM_INODES]; + static inline int is_global_system_inode(int type) { return type >= OCFS2_FIRST_ONLINE_SYSTEM_INODE && @@ -118,6 +120,21 @@ static struct inode * _ocfs2_get_system_file_inode(struct ocfs2_super *osb, inode = NULL; goto bail; } +#ifdef CONFIG_DEBUG_LOCK_ALLOC + if (type == LOCAL_USER_QUOTA_SYSTEM_INODE || + type == LOCAL_GROUP_QUOTA_SYSTEM_INODE || + type == JOURNAL_SYSTEM_INODE) { + /* Ignore inode lock on these inodes as the lock does not + * really belong to any process and lockdep cannot handle + * that */ + OCFS2_I(inode)->ip_inode_lockres.l_lockdep_map.key = NULL; + } else { + lockdep_init_map(&OCFS2_I(inode)->ip_inode_lockres. + l_lockdep_map, + ocfs2_system_inodes[type].si_name, + &ocfs2_sysfile_cluster_lock_key[type], 0); + } +#endif bail: return inode; -- 1.6.0.2
Joel Becker
2009-Jun-02 18:43 UTC
[Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks
On Tue, Jun 02, 2009 at 02:23:58PM +0200, Jan Kara wrote:> BTW: Currently lockdep complains about some configfs issue and turns itself > off so that's why I didn't get to finding more deadlocks ;)That's in linux-next of the configfs tree :-) Joel -- "Hell is oneself, hell is alone, the other figures in it, merely projections." - T. S. Eliot Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127
Mark Fasheh
2009-Jun-03 21:05 UTC
[Ocfs2-devel] [PATCH] ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories
On Tue, Jun 02, 2009 at 02:24:03PM +0200, Jan Kara wrote:> We use ordering ip_alloc_sem -> local alloc locks in ocfs2_write_begin(). > So change lock ordering in ocfs2_extend_dir() and ocfs2_expand_inline_dir() > to also use this lock ordering. > > Signed-off-by: Jan Kara <jack at suse.cz>Acked-by: Mark Fasheh <mfasheh at suse.com> -- Mark Fasheh
Joel Becker
2009-Jun-04 02:26 UTC
[Ocfs2-devel] [PATCH 0/7] [RESEND] Fix some deadlocks in quota code and implement lockdep for cluster locks
On Tue, Jun 02, 2009 at 02:23:58PM +0200, Jan Kara wrote:> I'm resending this patch series. It's rediffed against linux-next branch of > Joel's git tree. The first four patches are obvious fixes of deadlocks in quota > code and should go in as soon as possible. The other three patches implement > lockdep support for OCFS2 cluster locks. So you can have a look whether the > code make sence to you and possibly merge them. They should be NOP when lockdep > is disabled and help a lot when debugging various locking problems (with these > patches I found the problems fixed in the beginning of the series). > BTW: Currently lockdep complains about some configfs issue and turns itself > off so that's why I didn't get to finding more deadlocks ;)The fixes are now in the merge-window branch of ocfs2.git. They also live on the quota-fixes topic branch. The lockdep changes still need some marinating. Joel -- Life's Little Instruction Book #274 "Leave everything a little better than you found it." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127