Coly Li
2008-Dec-02 12:06 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
When flock(2) is called by multiple processes concurrently, it is very easy
to trigger a DLM_BADARGS error. Lines similar to the following show up in dmesg:

Sep 2 11:17:32 acct20 kernel: (6935,1):ocfs2_lock_create:901 ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource F00000000000000005d1cfe83ab8dad: bad api args
Sep 2 11:17:32 acct20 kernel: (6935,1):ocfs2_file_lock:1486 ERROR: status = -22

Here is the analysis of how this error arises, assuming 2 processes call
flock(2) concurrently on the same file:

1) Call sequence of process 1 (first listed, first called):
   [ocfs2_do_flock] [ocfs2_file_lock] [ocfs2_lock_create] [dlmlock]
   [dlm_get_lock_resource] [dlm_do_master_request] [o2net_send_message_vec]

   In o2net_send_message_vec(), at line 1049:
       wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &nsw));
   Here process 1 is scheduled out and process 2 is scheduled in to run. At
   this moment, res->owner (the ownership of the lock resource for the file
   being locked) is still set to DLM_LOCK_RES_OWNER_UNKNOWN.

2) Call sequence of process 2 (first listed, first called):
   [ocfs2_do_flock] [ocfs2_file_lock] [ocfs2_lock_create] [dlmlock]
   [dlmlock_remote] [o2net_send_message_vec]

   In dlmlock(), at line 725:
       if (res->owner == dlm->node_num)
               status = dlmlock_master(dlm, res, lock, flags);
       else
               status = dlmlock_remote(dlm, res, lock, flags);
   Since res->owner is still DLM_LOCK_RES_OWNER_UNKNOWN (process 1 has not
   decided it yet), process 2 calls dlmlock_remote() here. Inside
   dlmlock_remote(), at line 247:
       __dlm_wait_on_lockres(res);
   After this function returns, res->owner is set correctly to 0 (node 0).
   Then at line 257:
       status = dlm_send_remote_lock_request(dlm, res, lock, flags);
   dlm_send_remote_lock_request() calls o2net_send_message() at line 330,
   as below:
       tmpret = o2net_send_message(DLM_CREATE_LOCK_MSG, dlm->key, &create,
                                   sizeof(create), res->owner, &status);
   o2net_send_message() calls o2net_send_message_vec(), and at line 987 the
   code is:
       if (target_node == o2nm_this_node()) {
               ret = -ELOOP;
               goto out;
       }
   Here target_node is res->owner, which is by now set to 0. The problem is
   that res->owner was not yet set properly (it was still
   DLM_LOCK_RES_OWNER_UNKNOWN) when dlmlock() chose the remote path. By the
   time __dlm_wait_on_lockres() returns inside dlmlock_remote(), res->owner
   is correct, but execution has already taken the wrong path: the node
   tries to send a "remote" lock request to itself, which o2net rejects
   with -ELOOP.

3) The result is that process 1 acquires the flock while process 2 gets the
   BADARGS error. A 20-process case was also tested; only the first process
   succeeded in taking the flock, and all the other processes got BADARGS.

With Sunil's and Mark's suggestions, the solution is:
1) If a tmpres is returned from the hashed resources, check its owner.
2) If tmpres->owner is DLM_LOCK_RES_OWNER_UNKNOWN, another process owns the
   resource but has not finished initializing it. We should wait on
   DLM_LOCK_RES_IN_PROGRESS until the initialization is done.
3) Once the initialization is done, re-lookup the resource.

After basic testing on 2 nodes (each node forking 200,000 processes calling
flock(2)) and on 8 nodes (each node forking 1,000 processes calling
flock(2)), it works fine so far.
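For reference, below is a minimal userspace reproducer sketch along the lines
of the test described above. It is only an illustration of the scenario, not
part of the patch; the mount point, file name, and process count are
placeholders.

/*
 * Reproducer sketch: fork several processes that all flock(2) the same
 * file on an ocfs2 mount at nearly the same time. The path and process
 * count below are illustrative placeholders.
 */
#include <sys/file.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NPROC    20
#define TESTFILE "/mnt/ocfs2/flock_test"   /* assumed ocfs2 mount point */

int main(void)
{
	int i;

	for (i = 0; i < NPROC; i++) {
		pid_t pid = fork();

		if (pid == 0) {
			int fd = open(TESTFILE, O_RDWR | O_CREAT, 0644);

			if (fd < 0) {
				perror("open");
				_exit(1);
			}
			/* all children race to take and drop the same flock */
			if (flock(fd, LOCK_EX) < 0) {
				perror("flock");  /* EINVAL is seen when the bug hits */
				_exit(1);
			}
			flock(fd, LOCK_UN);
			close(fd);
			_exit(0);
		}
	}
	for (i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}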
Signed-off-by: Coly Li <coyli at suse.de>
Cc: Sunil Mushran <sunil.mushran at oracle.com>
Cc: Mark Fasheh <mfasheh at suse.com>
Cc: Jeff Mahoney <jeffm at suse.com>
---
 fs/ocfs2/dlm/dlmmaster.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 44f87ca..c777844 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -742,6 +742,19 @@ lookup:
 		goto lookup;
 	}

+	/* tmpres might be being initialized by another process; in this
+	 * case we should wait until the initialization is done */
+	spin_lock(&tmpres->spinlock);
+	if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN ||
+	    (tmpres->state & DLM_LOCK_RES_IN_PROGRESS)) {
+		__dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_IN_PROGRESS);
+		spin_unlock(&tmpres->spinlock);
+		dlm_lockres_put(tmpres);
+		tmpres = NULL;
+		goto lookup;
+	}
+	spin_unlock(&tmpres->spinlock);
+
 	mlog(0, "found in hash!\n");
 	if (res)
 		dlm_lockres_put(res);

-- 
Coly Li
SuSE PRC Labs
Sunil Mushran
2008-Dec-02 22:09 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
Coly Li wrote:
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 44f87ca..c777844 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -742,6 +742,19 @@ lookup:
>  		goto lookup;
>  	}
>
> +	/* tmpres might be being initialized by another process; in this
> +	 * case we should wait until the initialization is done */
> +	spin_lock(&tmpres->spinlock);
> +	if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN ||
> +	    (tmpres->state & DLM_LOCK_RES_IN_PROGRESS)) {
> +		__dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_IN_PROGRESS);
> +		spin_unlock(&tmpres->spinlock);
> +		dlm_lockres_put(tmpres);
> +		tmpres = NULL;
> +		goto lookup;
> +	}
> +	spin_unlock(&tmpres->spinlock);
> +
>  	mlog(0, "found in hash!\n");
>  	if (res)
>  		dlm_lockres_put(res);

This will work, but we are wasting cycles by looking up the same lockres
again. How about the following? The mlog is for testing.

--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -742,6 +742,17 @@ lookup:
 		goto lookup;
 	}

+	/* wait if lock resource is being mastered by another thread */
+	spin_lock(&tmpres->spinlock);
+	if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
+		mlog(ML_NOTICE, "%s:%.*s Another thread is mastering "
+		     "this resource\n", dlm->name, namelen, lockid);
+		__dlm_wait_on_lockres_flags(tmpres,
+					    DLM_LOCK_RES_IN_PROGRESS);
+		BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
+	}
+	spin_unlock(&tmpres->spinlock);
+
 	mlog(0, "found in hash!\n");
 	if (res)
 		dlm_lockres_put(res);

Secondly, the patch description is longer than required. We should prune it
down a bit. Like:

========================================================================
ocfs2/dlm: Fix lockres mastery race

dlm_get_lock_resource() is supposed to return a lock resource with a
proper master. If multiple concurrent threads attempt to look up the
lockres for the same lockid, one or more threads are likely to return a
lockres without a proper master if the lock mastery is underway.

This patch makes the threads wait in dlm_get_lock_resource() while the
mastery is underway, ensuring all threads return the lockres with a
proper master.

This issue is limited to users using the flock() syscall. For all other
fs operations, dlmglue ensures that only one thread is performing dlm
operations for a given lockid at any one time.

Signed-off-by: Coly Li <...>
========================================================================

Corrections welcome.

Lastly, test the patch not only with the flock() test but also otherwise.
Say, with recovery. Part of fixing bugs is ensuring the fix does not break
existing functionality. Run all multinode tests in ocfs2-test.

Thanks
Sunil
Coly Li
2008-Dec-03 19:48 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
Sunil Mushran wrote:
[snip]
> Corrections welcome.
>
> Lastly, test the patch not only with the flock() test but also otherwise.
> Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> existing functionality. Run all multinode tests in ocfs2-test.

On an 8-node cluster set up with xen, I tested these test cases:
- run_buildkernel.py
- cross_delete.py
- dlmstress1.sh
- run_extend_and_write.py
- run_lvb_torture.py
- mmap_test
- test_netfail.py

The untested cases are run_create_racer.py and open_delete.py, because I
encountered some errors when setting them up.

When I ran dlmstress1.sh and test_netfail.py,
- Kernel BUG at fs/ocfs2/dlm/dlmast.c:334 was triggered 3 times.
- Kernel BUG at fs/ocfs2/dlm/dlmmaster.c:1694 was triggered once.
Right now there is no obvious clue that these are directly related to this
patch; they might already have been there. I will do more testing to
confirm.

Thanks.
-- 
Coly Li
SuSE PRC Labs
tristan.ye
2008-Dec-04 01:31 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
On Thu, 2008-12-04 at 03:48 +0800, Coly Li wrote:
>
> Sunil Mushran wrote:
> [snip]
>
> > Corrections welcome.
> >
> > Lastly, test the patch not only with the flock() test but also otherwise.
> > Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> > existing functionality. Run all multinode tests in ocfs2-test.
> On an 8-node cluster set up with xen, I tested these test cases:
> - run_buildkernel.py
> - cross_delete.py
> - dlmstress1.sh
> - run_extend_and_write.py
> - run_lvb_torture.py
> - mmap_test
> - test_netfail.py
>
> The untested cases are run_create_racer.py and open_delete.py, because I
> encountered some errors when setting them up.

Hi Coly,

How did you set up the environment for testing? Are you using the latest
testsuite from ocfs2-test.git? For the cluster testing, we need to install
openmpi first and set up passwordless ssh & rsh connections among all
nodes. Besides, all testing binaries need to be run from an installed
BINDIR after a successful build and installation.

Regards,
Tristan

> When I ran dlmstress1.sh and test_netfail.py,
> - Kernel BUG at fs/ocfs2/dlm/dlmast.c:334 was triggered 3 times.
> - Kernel BUG at fs/ocfs2/dlm/dlmmaster.c:1694 was triggered once.
> Right now there is no obvious clue that these are directly related to this
> patch; they might already have been there. I will do more testing to
> confirm.
>
> Thanks.
tristan.ye
2008-Dec-11 08:26 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
On Thu, 2008-12-11 at 16:15 +0800, Coly Li wrote:
>
> tristan.ye wrote:
> > On Thu, 2008-12-04 at 03:48 +0800, Coly Li wrote:
> >> Sunil Mushran wrote:
> >> [snip]
> >>
> >>> Corrections welcome.
> >>>
> >>> Lastly, test the patch not only with the flock() test but also otherwise.
> >>> Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> >>> existing functionality. Run all multinode tests in ocfs2-test.
> >> On an 8-node cluster set up with xen, I tested these test cases:
> >> - run_buildkernel.py
> >> - cross_delete.py
> >> - dlmstress1.sh
> >> - run_extend_and_write.py
> >> - run_lvb_torture.py
> >> - mmap_test
> >> - test_netfail.py
> >>
> >> The untested cases are run_create_racer.py and open_delete.py, because I
> >> encountered some errors when setting them up.
> > Hi Coly,
> >
> > How did you set up the environment for testing? Are you using the latest
> > testsuite from ocfs2-test.git? For the cluster testing, we need to
> I downloaded the code with svn; here is my command:
>     svn co http://oss.oracle.com/projects/ocfs2-test/src ocfs2-test
> Is it out of date?

Actually we've moved to a git repository for ocfs2-test source management,
so you'd better turn to our git repo to pull the latest test sources. Such
as:

1. git clone git://oss.oracle.com/git/ocfs2-test ocfs2-test
2. ./autogen.sh
3. make && make install DESTDIR=/your/dest/path

All testing binaries should be fairly ready for testing if you've already
configured your openmpi and ssh & rsh connections well.

Just feel free to let Marcos and me know if you're being hindered by any
problem during the testing :)

Regards,
Tristan

> > install openmpi first and set up passwordless ssh & rsh connections among
> > all nodes.
>
> Yeah, I did that. There is a very helpful document on the ocfs2 wiki;
> thanks to the author :)
>
> > Besides, all testing binaries need to be run from an installed BINDIR
> > after a successful build and installation.
> When I tried run_create_racer.py and open_delete.py, maybe I didn't input
> correct/proper parameters to the scripts. I will start another thread to
> ask for help :)
> [snip]
>
> Thanks for your feedback. Since I switched my email address, my reply
> comes late ...