piaojun
2017-Nov-01 07:56 UTC
[Ocfs2-devel] [PATCH] ocfs2/dlm: wait for dlm recovery done when migrating all lockres
Hi Changwei, thanks for reviewing, and I think waiting for recoverying done before migrating seems another solution, but I wonder if new problems will be invoked as following comments. thanks, Jun On 2017/11/1 15:13, Changwei Ge wrote:> Hi Jun, > > I probably get your point. > > You mean that dlm finds no lock resource to be migrated and no more lock > resource is managed by its hash table. After that a node dies all of a > sudden and the dead node is put into dlm's recovery map, right?that is it.> Furthermore, a lock resource is migrated from other nodes and local node > has already asserted master to them. > > If so, I want to suggest a easier way to solve it. > We don't have to add a new flag to dlm structure, just leverage existed > dlm status and bitmap. > It will bring a bonus - no lock resource will be marked with RECOVERING, > it's a safer way, I suppose. > > Please take a review. > > Thanks, > Changwei > > > Subject: [PATCH] ocfs2/dlm: a node can't be involved in recovery if it > is being shutdown > > Signed-off-by: Changwei Ge <ge.changwei at h3c.com> > --- > fs/ocfs2/dlm/dlmdomain.c | 4 ++++ > fs/ocfs2/dlm/dlmrecovery.c | 3 +++ > 2 files changed, 7 insertions(+) > > diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c > index a2b19fbdcf46..5e9283e509a4 100644 > --- a/fs/ocfs2/dlm/dlmdomain.c > +++ b/fs/ocfs2/dlm/dlmdomain.c > @@ -707,11 +707,15 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm) > * want new domain joins to communicate with us at > * least until we've completed migration of our > * resources. */ > + spin_lock(&dlm->spinlock); > dlm->dlm_state = DLM_CTXT_IN_SHUTDOWN; > + spin_unlock(&dlm->spinlock);I guess there will be misuse of 'dlm->spinlock' and dlm_domain_lock.> leave = 1; > } > spin_unlock(&dlm_domain_lock); > > + dlm_wait_for_recovery(dlm); > + > if (leave) { > mlog(0, "shutting down domain %s\n", dlm->name); > dlm_begin_exit_domain(dlm); > diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c > index 74407c6dd592..764c95b2b35c 100644 > --- a/fs/ocfs2/dlm/dlmrecovery.c > +++ b/fs/ocfs2/dlm/dlmrecovery.c > @@ -2441,6 +2441,9 @@ static void __dlm_hb_node_down(struct dlm_ctxt > *dlm, int idx) > { > assert_spin_locked(&dlm->spinlock); > > + if (dlm->dlm_state == DLM_CTXT_IN_SHUTDOWN) > + return; > +'dlm->dlm_state' probably need be to protected by 'dlm_domain_lock'. and I wander if there is more work to be done when in 'DLM_CTXT_IN_SHUTDOWN'?> if (dlm->reco.new_master == idx) { > mlog(0, "%s: recovery master %d just died\n", > dlm->name, idx); >
Changwei Ge
2017-Nov-01 08:11 UTC
[Ocfs2-devel] [PATCH] ocfs2/dlm: wait for dlm recovery done when migrating all lockres
Hi Jun, Thanks for reviewing. I don't think we have to worry about misusing *dlm_domain_lock* and *dlm::spinlock*. I admit my change may look a little different from most of other code snippets where using these two spin locks. But our purpose is to close the race between __dlm_hb_node_down and dlm_unregister_domain, right? And my change meets that. :-) I suppose we can do it in a flexible way. Thanks, Changwei On 2017/11/1 15:57, piaojun wrote:> Hi Changwei, > > thanks for reviewing, and I think waiting for recoverying done before > migrating seems another solution, but I wonder if new problems will be > invoked as following comments. > > thanks, > Jun > > On 2017/11/1 15:13, Changwei Ge wrote: >> Hi Jun, >> >> I probably get your point. >> >> You mean that dlm finds no lock resource to be migrated and no more lock >> resource is managed by its hash table. After that a node dies all of a >> sudden and the dead node is put into dlm's recovery map, right? > that is it. >> Furthermore, a lock resource is migrated from other nodes and local node >> has already asserted master to them. >> >> If so, I want to suggest a easier way to solve it. >> We don't have to add a new flag to dlm structure, just leverage existed >> dlm status and bitmap. >> It will bring a bonus - no lock resource will be marked with RECOVERING, >> it's a safer way, I suppose. >> >> Please take a review. >> >> Thanks, >> Changwei >> >> >> Subject: [PATCH] ocfs2/dlm: a node can't be involved in recovery if it >> is being shutdown >> >> Signed-off-by: Changwei Ge <ge.changwei at h3c.com> >> --- >> fs/ocfs2/dlm/dlmdomain.c | 4 ++++ >> fs/ocfs2/dlm/dlmrecovery.c | 3 +++ >> 2 files changed, 7 insertions(+) >> >> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c >> index a2b19fbdcf46..5e9283e509a4 100644 >> --- a/fs/ocfs2/dlm/dlmdomain.c >> +++ b/fs/ocfs2/dlm/dlmdomain.c >> @@ -707,11 +707,15 @@ void dlm_unregister_domain(struct dlm_ctxt *dlm) >> * want new domain joins to communicate with us at >> * least until we've completed migration of our >> * resources. */ >> + spin_lock(&dlm->spinlock); >> dlm->dlm_state = DLM_CTXT_IN_SHUTDOWN; >> + spin_unlock(&dlm->spinlock); > I guess there will be misuse of 'dlm->spinlock' and dlm_domain_lock. >> leave = 1; >> } >> spin_unlock(&dlm_domain_lock); >> >> + dlm_wait_for_recovery(dlm); >> + >> if (leave) { >> mlog(0, "shutting down domain %s\n", dlm->name); >> dlm_begin_exit_domain(dlm); >> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c >> index 74407c6dd592..764c95b2b35c 100644 >> --- a/fs/ocfs2/dlm/dlmrecovery.c >> +++ b/fs/ocfs2/dlm/dlmrecovery.c >> @@ -2441,6 +2441,9 @@ static void __dlm_hb_node_down(struct dlm_ctxt >> *dlm, int idx) >> { >> assert_spin_locked(&dlm->spinlock); >> >> + if (dlm->dlm_state == DLM_CTXT_IN_SHUTDOWN) >> + return; >> + > 'dlm->dlm_state' probably need be to protected by 'dlm_domain_lock'. > and I wander if there is more work to be done when in > 'DLM_CTXT_IN_SHUTDOWN'? >> if (dlm->reco.new_master == idx) { >> mlog(0, "%s: recovery master %d just died\n", >> dlm->name, idx); >> >