Coly Li
2008-Dec-02 12:06 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
When flock(2) is called by multiple processes concurrently, it is very easy
to trigger a DLM_BADARGS error. Lines similar to the following show up in dmesg:

Sep 2 11:17:32 acct20 kernel: (6935,1):ocfs2_lock_create:901 ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource F00000000000000005d1cfe83ab8dad: bad api args
Sep 2 11:17:32 acct20 kernel: (6935,1):ocfs2_file_lock:1486 ERROR: status = -22

Here is the analysis of how this error arises, assuming 2 processes call
flock(2) concurrently on the same file:

1) Call sequence of process 1 (first listed, first called):
   [ocfs2_do_flock] [ocfs2_file_lock] [ocfs2_lock_create] [dlmlock]
   [dlm_get_lock_resource] [dlm_do_master_request] [o2net_send_message_vec]

   In o2net_send_message_vec(), at line 1049:
       wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &nsw));
   Here process 1 is scheduled out and process 2 is scheduled in to run. At
   this moment, res->owner (the ownership of the lock resource for the file
   being locked) is still set to DLM_LOCK_RES_OWNER_UNKNOWN.

2) Call sequence of process 2 (first listed, first called):
   [ocfs2_do_flock] [ocfs2_file_lock] [ocfs2_lock_create] [dlmlock]
   [dlmlock_remote] [o2net_send_message_vec]

   In dlmlock(), at line 725:
       if (res->owner == dlm->node_num)
               status = dlmlock_master(dlm, res, lock, flags);
       else
               status = dlmlock_remote(dlm, res, lock, flags);
   Since res->owner is still DLM_LOCK_RES_OWNER_UNKNOWN (process 1 has not
   decided it yet), process 2 calls dlmlock_remote() here. Inside
   dlmlock_remote(), at line 247:
       __dlm_wait_on_lockres(res);
   After this function returns, res->owner is set correctly to 0 (node 0).
   Then at line 257:
       status = dlm_send_remote_lock_request(dlm, res, lock, flags);
   dlm_send_remote_lock_request() calls o2net_send_message() at line 330,
   as below:
       tmpret = o2net_send_message(DLM_CREATE_LOCK_MSG, dlm->key, &create,
                                   sizeof(create), res->owner, &status);
   o2net_send_message() calls o2net_send_message_vec(), and at line 987 the
   code is:
       if (target_node == o2nm_this_node()) {
               ret = -ELOOP;
               goto out;
       }
   Here target_node is res->owner, which is by now set to 0. The problem is
   that res->owner was not yet set properly (it was still
   DLM_LOCK_RES_OWNER_UNKNOWN) when dlmlock() chose the remote path. By the
   time __dlm_wait_on_lockres() returns inside dlmlock_remote(), res->owner
   is correct, but execution has already taken the wrong path: the node
   tries to send a "remote" lock request to itself, which o2net rejects
   with -ELOOP.

3) The result is that process 1 acquires the flock while process 2 gets the
   BADARGS error. A 20-process case was also tested; only the first process
   succeeded in taking the flock, and all the other processes got BADARGS.

With Sunil's and Mark's suggestions, the solution is:
1) If a tmpres is returned from the hashed resources, check its owner.
2) If tmpres->owner is DLM_LOCK_RES_OWNER_UNKNOWN, another process owns the
   resource but has not finished initializing it. We should wait on
   DLM_LOCK_RES_IN_PROGRESS until the initialization is done.
3) Once the initialization is done, re-lookup the resource.

After basic testing on 2 nodes (each node forking 200,000 processes calling
flock(2)) and on 8 nodes (each node forking 1,000 processes calling
flock(2)), it works fine so far.
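For reference, below is a minimal userspace reproducer sketch along the lines
of the test described above. It is only an illustration of the scenario, not
part of the patch; the mount point, file name, and process count are
placeholders.

/*
 * Reproducer sketch: fork several processes that all flock(2) the same
 * file on an ocfs2 mount at nearly the same time. The path and process
 * count below are illustrative placeholders.
 */
#include <sys/file.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NPROC    20
#define TESTFILE "/mnt/ocfs2/flock_test"   /* assumed ocfs2 mount point */

int main(void)
{
	int i;

	for (i = 0; i < NPROC; i++) {
		pid_t pid = fork();

		if (pid == 0) {
			int fd = open(TESTFILE, O_RDWR | O_CREAT, 0644);

			if (fd < 0) {
				perror("open");
				_exit(1);
			}
			/* all children race to take and drop the same flock */
			if (flock(fd, LOCK_EX) < 0) {
				perror("flock");  /* EINVAL is seen when the bug hits */
				_exit(1);
			}
			flock(fd, LOCK_UN);
			close(fd);
			_exit(0);
		}
	}
	for (i = 0; i < NPROC; i++)
		wait(NULL);
	return 0;
}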
Signed-off-by: Coly Li <coyli at suse.de>
Cc: Sunil Mushran <sunil.mushran at oracle.com>
Cc: Mark Fasheh <mfasheh at suse.com>
Cc: Jeff Mahoney <jeffm at suse.com>
---
 fs/ocfs2/dlm/dlmmaster.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 44f87ca..c777844 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -742,6 +742,19 @@ lookup:
 		goto lookup;
 	}

+	/* tmpres might be being initialized by another process; in this
+	 * case we should wait until the initialization is done */
+	spin_lock(&tmpres->spinlock);
+	if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN ||
+	    (tmpres->state & DLM_LOCK_RES_IN_PROGRESS)) {
+		__dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_IN_PROGRESS);
+		spin_unlock(&tmpres->spinlock);
+		dlm_lockres_put(tmpres);
+		tmpres = NULL;
+		goto lookup;
+	}
+	spin_unlock(&tmpres->spinlock);
+
 	mlog(0, "found in hash!\n");
 	if (res)
 		dlm_lockres_put(res);

-- 
Coly Li
SuSE PRC Labs
Sunil Mushran
2008-Dec-02 22:09 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
Coly Li wrote:
> diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
> index 44f87ca..c777844 100644
> --- a/fs/ocfs2/dlm/dlmmaster.c
> +++ b/fs/ocfs2/dlm/dlmmaster.c
> @@ -742,6 +742,19 @@ lookup:
>  		goto lookup;
>  	}
>
> +	/* tmpres might be being initialized by another process; in this
> +	 * case we should wait until the initialization is done */
> +	spin_lock(&tmpres->spinlock);
> +	if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN ||
> +	    (tmpres->state & DLM_LOCK_RES_IN_PROGRESS)) {
> +		__dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_IN_PROGRESS);
> +		spin_unlock(&tmpres->spinlock);
> +		dlm_lockres_put(tmpres);
> +		tmpres = NULL;
> +		goto lookup;
> +	}
> +	spin_unlock(&tmpres->spinlock);
> +
>  	mlog(0, "found in hash!\n");
>  	if (res)
>  		dlm_lockres_put(res);

This will work, but we are wasting cycles by looking up the same lockres
again. How about the following? The mlog is for testing.

--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -742,6 +742,17 @@ lookup:
 		goto lookup;
 	}

+	/* wait if lock resource is being mastered by another thread */
+	spin_lock(&tmpres->spinlock);
+	if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
+		mlog(ML_NOTICE, "%s:%.*s Another thread is mastering "
+		     "this resource\n", dlm->name, namelen, lockid);
+		__dlm_wait_on_lockres_flags(tmpres,
+					    DLM_LOCK_RES_IN_PROGRESS);
+		BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
+	}
+	spin_unlock(&tmpres->spinlock);
+
 	mlog(0, "found in hash!\n");
 	if (res)
 		dlm_lockres_put(res);

Secondly, the patch description is longer than required. We should prune it
down a bit. Like:

========================================================================
ocfs2/dlm: Fix lockres mastery race

dlm_get_lock_resource() is supposed to return a lock resource with a
proper master. If multiple concurrent threads attempt to look up the
lockres for the same lockid, one or more threads are likely to return a
lockres without a proper master if the lock mastery is underway.

This patch makes the threads wait in dlm_get_lock_resource() while the
mastery is underway, ensuring all threads return the lockres with a
proper master.

This issue is limited to users using the flock() syscall. For all other
fs operations, dlmglue ensures that only one thread is performing dlm
operations for a given lockid at any one time.

Signed-off-by: Coly Li <...>
========================================================================

Corrections welcome.

Lastly, test the patch not only with the flock() test but also otherwise.
Say, with recovery. Part of fixing bugs is ensuring the fix does not break
existing functionality. Run all multinode tests in ocfs2-test.

Thanks
Sunil
Coly Li
2008-Dec-03 19:48 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
Sunil Mushran wrote:
[snip]
> Corrections welcome.
>
> Lastly, test the patch not only with the flock() test but also otherwise.
> Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> existing functionality. Run all multinode tests in ocfs2-test.

On an 8-node cluster set up with xen, I tested these test cases:
- run_buildkernel.py
- cross_delete.py
- dlmstress1.sh
- run_extend_and_write.py
- run_lvb_torture.py
- mmap_test
- test_netfail.py

The untested cases are run_create_racer.py and open_delete.py, because I
encountered some errors when setting them up.

When I ran dlmstress1.sh and test_netfail.py,
- Kernel BUG at fs/ocfs2/dlm/dlmast.c:334 was triggered 3 times.
- Kernel BUG at fs/ocfs2/dlm/dlmmaster.c:1694 was triggered once.
Right now there is no obvious clue that these are directly related to this
patch; they might already have been there. I will do more testing to
confirm.

Thanks.
-- 
Coly Li
SuSE PRC Labs
tristan.ye
2008-Dec-04 01:31 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
On Thu, 2008-12-04 at 03:48 +0800, Coly Li wrote:
>
> Sunil Mushran wrote:
> [snip]
>
> > Corrections welcome.
> >
> > Lastly, test the patch not only with the flock() test but also otherwise.
> > Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> > existing functionality. Run all multinode tests in ocfs2-test.
> On an 8-node cluster set up with xen, I tested these test cases:
> - run_buildkernel.py
> - cross_delete.py
> - dlmstress1.sh
> - run_extend_and_write.py
> - run_lvb_torture.py
> - mmap_test
> - test_netfail.py
>
> The untested cases are run_create_racer.py and open_delete.py, because I
> encountered some errors when setting them up.

Hi Coly,

How did you set up the environment for testing? Are you using the latest
testsuite from ocfs2-test.git? For the cluster testing, we need to install
openmpi first and set up passwordless ssh & rsh connections among all
nodes. Besides, all testing binaries need to be run from an installed
BINDIR after a successful build and installation.

Regards,
Tristan

> When I ran dlmstress1.sh and test_netfail.py,
> - Kernel BUG at fs/ocfs2/dlm/dlmast.c:334 was triggered 3 times.
> - Kernel BUG at fs/ocfs2/dlm/dlmmaster.c:1694 was triggered once.
> Right now there is no obvious clue that these are directly related to this
> patch; they might already have been there. I will do more testing to
> confirm.
>
> Thanks.
tristan.ye
2008-Dec-11 08:26 UTC
[Ocfs2-devel] [PATCH] ocfs2: fix DLM_BADARGS error in concurrent file locking
On Thu, 2008-12-11 at 16:15 +0800, Coly Li wrote:
>
> tristan.ye wrote:
> > On Thu, 2008-12-04 at 03:48 +0800, Coly Li wrote:
> >> Sunil Mushran wrote:
> >> [snip]
> >>
> >>> Corrections welcome.
> >>>
> >>> Lastly, test the patch not only with the flock() test but also otherwise.
> >>> Say, with recovery. Part of fixing bugs is ensuring the fix does not break
> >>> existing functionality. Run all multinode tests in ocfs2-test.
> >> On an 8-node cluster set up with xen, I tested these test cases:
> >> - run_buildkernel.py
> >> - cross_delete.py
> >> - dlmstress1.sh
> >> - run_extend_and_write.py
> >> - run_lvb_torture.py
> >> - mmap_test
> >> - test_netfail.py
> >>
> >> The untested cases are run_create_racer.py and open_delete.py, because I
> >> encountered some errors when setting them up.
> > Hi Coly,
> >
> > How did you set up the environment for testing? Are you using the latest
> > testsuite from ocfs2-test.git? For the cluster testing, we need to
> I downloaded the code with svn; here is my command:
>     svn co http://oss.oracle.com/projects/ocfs2-test/src ocfs2-test
> Is it out of date?

Actually we've moved to a git repository for ocfs2-test source management,
so you'd better turn to our git repo to pull the latest test sources. Such
as:

1. git clone git://oss.oracle.com/git/ocfs2-test ocfs2-test
2. ./autogen.sh
3. make && make install DESTDIR=/your/dest/path

All testing binaries should be fairly ready for testing if you've already
configured your openmpi and ssh & rsh connections well.

Just feel free to let Marcos and me know if you're being hindered by any
problem during the testing :)

Regards,
Tristan

> > install openmpi first and set up passwordless ssh & rsh connections among
> > all nodes.
>
> Yeah, I did that. There is a very helpful document on the ocfs2 wiki;
> thanks to the author :)
>
> > Besides, all testing binaries need to be run from an installed BINDIR
> > after a successful build and installation.
> When I tried run_create_racer.py and open_delete.py, maybe I didn't input
> correct/proper parameters to the scripts. I will start another thread to
> ask for help :)
> [snip]
>
> Thanks for your feedback. Since I switched my email address, my reply
> comes late ...