Srinivas Eeda
2010-Mar-22 23:50 UTC
[Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery
In dlm_assert_master_handler, spinlock is released little sooner which creates a window for two nodes to race during mastery. In a scenario where node A had a head start during lock mastery and dlm spinlock is just released on node B in dlm_assert_master_handler. Right then a process on node B started to master the resource. It finds the mle but doesn't find lockres. Since mle *master* is not set it creates a lockres and waits for the mastery and doesn't send a master request. dlm_assert_master_handler doesn't know about this and doesn't set have_lockres_ref(doesn't set DLM_ASSERT_RESPONSE_MASTERY_REF) which creates a hole that results in loss of refmap bit on the master node. Signed-off-by: Srinivas Eeda <srinivas.eeda at oracle.com> --- fs/ocfs2/dlm/dlmmaster.c | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index 83bcaf2..6e694b6 100644 --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -1875,7 +1875,6 @@ int dlm_assert_master_handler(struct o2net_msg *msg, u32 len, void *data, ok: spin_unlock(&res->spinlock); } - spin_unlock(&dlm->spinlock); // mlog(0, "woo! got an assert_master from node %u!\n", // assert->node_idx); @@ -1926,7 +1925,6 @@ ok: /* master is known, detach if not already detached. * ensures that only one assert_master call will happen * on this mle. */ - spin_lock(&dlm->spinlock); spin_lock(&dlm->master_lock); rr = atomic_read(&mle->mle_refs.refcount); @@ -1959,7 +1957,6 @@ ok: __dlm_put_mle(mle); } spin_unlock(&dlm->master_lock); - spin_unlock(&dlm->spinlock); } else if (res) { if (res->owner != assert->node_idx) { mlog(0, "assert_master from %u, but current " @@ -1967,6 +1964,7 @@ ok: res->owner, namelen, name); } } + spin_unlock(&dlm->spinlock); done: ret = 0; -- 1.5.6.5
Sunil Mushran
2010-Mar-23 00:41 UTC
[Ocfs2-devel] [PATCH 1/1] dlm: fix a race in lockres mastery
================================================================In o2dlm, the resource master maintains a (ref)map of all nodes that are informed of the fact that that node masters that resource. This is done to prevent the master from purging the resource before the other node can create a lock. This patch plugs a race between the mastery handler thread and the mastery thread that allows the node to discover the master of the resource without informing the master node. Fixes ossbz#1012 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012 ================================================================ Add my sob and change the comment to the above. Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com> Srinivas Eeda wrote:> In dlm_assert_master_handler, spinlock is released little sooner which creates > a window for two nodes to race during mastery. > > In a scenario where node A had a head start during lock mastery and dlm > spinlock is just released on node B in dlm_assert_master_handler. Right then > a process on node B started to master the resource. It finds the mle but > doesn't find lockres. Since mle *master* is not set it creates a lockres and > waits for the mastery and doesn't send a master request. > > dlm_assert_master_handler doesn't know about this and doesn't set > have_lockres_ref(doesn't set DLM_ASSERT_RESPONSE_MASTERY_REF) which creates a > hole that results in loss of refmap bit on the master node. > > Signed-off-by: Srinivas Eeda <srinivas.eeda at oracle.com> > --- > fs/ocfs2/dlm/dlmmaster.c | 4 +--- > 1 files changed, 1 insertions(+), 3 deletions(-) > > diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c > index 83bcaf2..6e694b6 100644 > --- a/fs/ocfs2/dlm/dlmmaster.c > +++ b/fs/ocfs2/dlm/dlmmaster.c > @@ -1875,7 +1875,6 @@ int dlm_assert_master_handler(struct o2net_msg *msg, u32 len, void *data, > ok: > spin_unlock(&res->spinlock); > } > - spin_unlock(&dlm->spinlock); > > // mlog(0, "woo! got an assert_master from node %u!\n", > // assert->node_idx); > @@ -1926,7 +1925,6 @@ ok: > /* master is known, detach if not already detached. > * ensures that only one assert_master call will happen > * on this mle. */ > - spin_lock(&dlm->spinlock); > spin_lock(&dlm->master_lock); > > rr = atomic_read(&mle->mle_refs.refcount); > @@ -1959,7 +1957,6 @@ ok: > __dlm_put_mle(mle); > } > spin_unlock(&dlm->master_lock); > - spin_unlock(&dlm->spinlock); > } else if (res) { > if (res->owner != assert->node_idx) { > mlog(0, "assert_master from %u, but current " > @@ -1967,6 +1964,7 @@ ok: > res->owner, namelen, name); > } > } > + spin_unlock(&dlm->spinlock); > > done: > ret = 0;