Wengang Wang
2010-Jun-16 06:52 UTC
[Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
When we need to take both dlm_domain_lock and dlm->spinlock, we should take
them in this order: dlm_domain_lock first, then dlm->spinlock.

There are paths that disobey this order: they call dlm_lockres_put() with
dlm->spinlock held. dlm_lockres_put() eventually calls dlm_put(), which takes
dlm_domain_lock.

The fix is to move the locking of dlm_domain_lock from dlm_put() to
dlm_ctxt_release(). dlm_ctxt_release() is only called on the release of the
last reference, and no path should be holding dlm->spinlock when dropping the
"last" reference.

Signed-off-by: Wengang Wang <wen.gang.wang at oracle.com>
---
 fs/ocfs2/dlm/dlmdomain.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index ab82add..754baf2 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -321,28 +321,24 @@ static void dlm_ctxt_release(struct kref *kref)
 
 	dlm = container_of(kref, struct dlm_ctxt, dlm_refs);
 
+	if (spin_is_locked(&dlm->spinlock))
+		BUG();
 	BUG_ON(dlm->num_joins);
 	BUG_ON(dlm->dlm_state == DLM_CTXT_JOINED);
 
+	spin_lock(&dlm_domain_lock);
 	/* we may still be in the list if we hit an error during join. */
 	list_del_init(&dlm->list);
-
 	spin_unlock(&dlm_domain_lock);
 
-	mlog(0, "freeing memory from domain %s\n", dlm->name);
-
 	wake_up(&dlm_domain_events);
-
+	mlog(0, "freeing memory from domain %s\n", dlm->name);
 	dlm_free_ctxt_mem(dlm);
-
-	spin_lock(&dlm_domain_lock);
 }
 
 void dlm_put(struct dlm_ctxt *dlm)
 {
-	spin_lock(&dlm_domain_lock);
 	kref_put(&dlm->dlm_refs, dlm_ctxt_release);
-	spin_unlock(&dlm_domain_lock);
 }
 
 static void __dlm_get(struct dlm_ctxt *dlm)
-- 
1.6.6.1
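
The idea behind the patch can be sketched in userspace C. This is an illustration only, not the kernel code: pthread mutexes stand in for the spinlocks, a C11 atomic stands in for the kref, and the names domain_lock, struct ctx, ctx_put(), ctx_release() and demo_put_ordering() are hypothetical. It shows that a put which is not the last one never touches the global lock, so it can be made while the per-context lock is held, whereas the last put must be made with that lock released. Note that the author later withdrew this patch in the same thread, so the sketch should not be read as a claim that the approach is sufficient.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for dlm_domain_lock. */
static pthread_mutex_t domain_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for struct dlm_ctxt. */
struct ctx {
	atomic_int refs;      /* role of dlm_refs */
	pthread_mutex_t lock; /* role of dlm->spinlock */
	bool on_list;         /* role of membership in the domain list */
	bool released;        /* set instead of freeing, so it can be inspected */
};

static void ctx_init(struct ctx *c, int refs)
{
	atomic_init(&c->refs, refs);
	pthread_mutex_init(&c->lock, NULL);
	c->on_list = true;
	c->released = false;
}

/* Runs only for the last reference, so only this path takes domain_lock. */
static void ctx_release(struct ctx *c)
{
	pthread_mutex_lock(&domain_lock);
	c->on_list = false;
	pthread_mutex_unlock(&domain_lock);
	c->released = true;
}

static void ctx_put(struct ctx *c)
{
	/* atomic_fetch_sub() returns the old value: 1 means this was the last ref. */
	int old = atomic_fetch_sub(&c->refs, 1);

	assert(old >= 1); /* a put without a matching reference is a bug */
	if (old == 1)
		ctx_release(c);
}

/* Returns 1 if a non-final put under c.lock left the context alone
 * and the final put, made with no locks held, released it. */
static int demo_put_ordering(void)
{
	struct ctx c;
	int ok;

	ctx_init(&c, 2);

	pthread_mutex_lock(&c.lock);
	ctx_put(&c);                 /* not the last ref: domain_lock untouched */
	pthread_mutex_unlock(&c.lock);
	ok = !c.released && c.on_list;

	ctx_put(&c);                 /* last ref, dropped with c.lock not held */
	ok = ok && c.released && !c.on_list;

	pthread_mutex_destroy(&c.lock);
	return ok;
}
```

Calling demo_put_ordering() from a main() exercises both cases. The sketch relies on callers knowing they do not hold the last reference while the per-context lock is held, which is the same assumption the patch description makes.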
Wengang Wang
2010-Jun-21 05:31 UTC
[Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
Why do the atomic operations on dlm_refs need a spinlock's protection?

	/* NOTE: Next three are protected by dlm_domain_lock */
	struct kref dlm_refs;
	enum dlm_ctxt_state dlm_state;
	unsigned int num_joins;

regards,
wengang.
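
One general reason a reference count that is itself atomic can still be documented as protected by a lock: if objects are found by searching a shared list, the lookup plus the "get" must be atomic with respect to the final "put" plus the unlink, otherwise a lookup could take a reference on an object that is already being torn down. Whether that is the reason in the dlm code is an assumption here; the thread does not say. The userspace sketch below uses hypothetical names (registry_lock, struct obj, obj_grab(), obj_put(), demo_grab_put()) and a single-slot registry in place of the domain list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lock guarding the list of contexts. */
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

struct obj {
	atomic_int refs;
	bool freed;           /* set instead of freeing, so it can be inspected */
};

/* A single slot stands in for the list that lookups walk. */
static struct obj *registry;

/* Lookup and get happen in one lock hold, so the object cannot be
 * released between being found and being referenced. */
static struct obj *obj_grab(void)
{
	struct obj *o;

	pthread_mutex_lock(&registry_lock);
	o = registry;
	if (o)
		atomic_fetch_add(&o->refs, 1);
	pthread_mutex_unlock(&registry_lock);
	return o;
}

/* The decrement and, for the last reference, the unlink happen in one
 * lock hold, so obj_grab() never sees an object whose count reached zero. */
static void obj_put(struct obj *o)
{
	pthread_mutex_lock(&registry_lock);
	int old = atomic_fetch_sub(&o->refs, 1);

	assert(old >= 1); /* a put without a matching reference is a bug */
	if (old == 1) {
		registry = NULL;
		o->freed = true;
	}
	pthread_mutex_unlock(&registry_lock);
}

/* Returns 1 if grab/put behave as described above. */
static int demo_grab_put(void)
{
	struct obj o = { .freed = false };
	struct obj *g;
	int ok;

	atomic_init(&o.refs, 1);
	registry = &o;

	g = obj_grab();               /* count goes from 1 to 2 */
	ok = (g == &o) && atomic_load(&o.refs) == 2;

	obj_put(g);                   /* not the last ref: still registered */
	ok = ok && !o.freed && registry == &o;

	obj_put(&o);                  /* last ref: unlinked and released */
	ok = ok && o.freed && registry == NULL;

	return ok && obj_grab() == NULL;
}
```

The trade-off is the one the original patch description points at: because obj_put() always takes the registry lock, it must never be called while holding a lock that ranks after it in the lock order.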
Wengang Wang
2010-Jun-21 13:20 UTC
[Ocfs2-devel] [PATCH] ocfs2/dlm: remove potential deadlock
This patch is not good; please ignore it. I will post a revised one.

regards,
wengang.