I've been working on creating a mail cluster using ocfs2. Dovecot was configured to use flock, since the kernel we're running is a Debian-based 2.6.26, which supports cluster-aware flock. Userspace is 1.4.1. During testing everything seemed fine, but when we got a real load on things we got a whole bunch of these messages in dmesg on the node that was hosting imap. Note that it's maildir and only one node is hosting imap, so we don't actually need flock.

I think we're going to switch back to dotlocking, but I was hoping someone could interpret these error messages for me. Are they dangerous?

Thanks,
Brian

[257387.675734] (21573,0):ocfs2_file_lock:1587 ERROR: status = -22
[257387.675734] (21573,0):ocfs2_do_flock:79 ERROR: status = -22
[257392.121692] (21360,0):dlm_send_remote_lock_request:333 ERROR: status = -40
[257392.121938] (21360,0):dlmlock_remote:269 ERROR: dlm status DLM_BADARGS
[257392.122023] (21360,0):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257392.122079] (21360,0):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F0000000000000008fca8f0e7e038e3
[257392.122220] (21360,0):ocfs2_file_lock:1587 ERROR: status = -22
[257392.122326] (21360,0):ocfs2_do_flock:79 ERROR: status = -22
[257479.277941] (21950,0):dlm_send_remote_lock_request:333 ERROR: status = -40
[257479.277941] (21950,0):dlmlock_remote:269 ERROR: dlm status DLM_BADARGS
[257479.277941] (21950,0):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257479.277941] (21950,0):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F00000000000000085d6ff2e7e0a106
[257479.277941] (21950,0):ocfs2_file_lock:1587 ERROR: status = -22
[257479.277941] (21950,0):ocfs2_do_flock:79 ERROR: status = -22
[257480.407024] (21947,0):dlm_send_remote_lock_request:333 ERROR: status = -40
[257480.407024] (21947,0):dlmlock_remote:269 ERROR: dlm status DLM_BADARGS
[257480.407024] (21947,0):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257480.407024] (21947,0):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F000000000000000955ae83e7e0a13d
[257480.407024] (21947,0):ocfs2_file_lock:1587 ERROR: status = -22
[257480.407024] (21947,0):ocfs2_do_flock:79 ERROR: status = -22
[257483.221066] (21972,1):dlm_send_remote_lock_request:333 ERROR: status = -40
[257483.221066] (21972,1):dlmlock_remote:269 ERROR: dlm status DLM_BADARGS
[257483.221066] (21972,1):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257483.221066] (21972,1):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F000000000000000955ae84e7e0a23c
[257483.221066] (21972,1):ocfs2_file_lock:1587 ERROR: status = -22
[257483.221066] (21972,1):ocfs2_do_flock:79 ERROR: status = -22
[257725.200695] (12536,0):dlm_send_remote_lock_request:333 ERROR: status = -40
[257725.200695] (12536,0):dlmlock_remote:269 ERROR: dlm status DLM_BADARGS
[257725.200695] (12536,0):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257725.200695] (12536,0):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F000000000000000758938de7e0de1f
[257725.200695] (12536,0):ocfs2_file_lock:1587 ERROR: status = -22
[257725.200695] (12536,0):ocfs2_do_flock:79 ERROR: status = -22
[257959.288124] (18619,1):dlm_send_remote_lock_request:333 ERROR: status = -40
[257959.288124] (18619,1):dlmlock_remote:269 ERROR: dlm status DLM_BADARGS
[257959.288124] (18619,1):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257959.288124] (18619,1):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F000000000000000585c3e9e7e0e40d
[257959.288124] (18619,1):ocfs2_file_lock:1587 ERROR: status = -22
[257959.288124] (18619,1):ocfs2_do_flock:79 ERROR: status = -22
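For context, status = -22 is -EINVAL, so the application-level symptom behind these kernel messages is flock() failing with EINVAL (errno 22) on files on the shared OCFS2 volume. A minimal sketch of the failing call from the application's point of view, assuming a hypothetical maildir path on the mount (illustration only, not Dovecot's actual code):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical file on the shared OCFS2 mount. */
    int fd = open("/mnt/ocfs2/mail/user/dovecot-uidlist", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (flock(fd, LOCK_EX) < 0) {
        /* When the bug triggers, this reports EINVAL (22),
         * matching the "status = -22" lines in dmesg. */
        fprintf(stderr, "flock failed: %s (errno %d)\n",
                strerror(errno), errno);
        close(fd);
        return 1;
    }

    /* ... work on the locked file ... */
    flock(fd, LOCK_UN);
    close(fd);
    return 0;
}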
We just pushed the fix upstream:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7b791d68562e4ce5ab57cbacb10a1ad4ee33956e

Brian Kroth wrote:
> I've been working on creating a mail cluster using ocfs2. Dovecot was
> configured to use flock, since the kernel we're running is a Debian-based
> 2.6.26, which supports cluster-aware flock. Userspace is 1.4.1. During
> testing everything seemed fine, but when we got a real load on things we
> got a whole bunch of these messages in dmesg on the node that was hosting
> imap. Note that it's maildir and only one node is hosting imap, so we
> don't actually need flock.
>
> I think we're going to switch back to dotlocking, but I was hoping
> someone could interpret these error messages for me. Are they dangerous?
>
> Thanks,
> Brian
>
> [dmesg output snipped]
Brian Kroth wrote:
> I've been working on creating a mail cluster using ocfs2. Dovecot was
> configured to use flock, since the kernel we're running is a Debian-based
> 2.6.26, which supports cluster-aware flock. Userspace is 1.4.1. During
> testing everything seemed fine, but when we got a real load on things we
> got a whole bunch of these messages in dmesg on the node that was hosting
> imap. Note that it's maildir and only one node is hosting imap, so we
> don't actually need flock.
>
> I think we're going to switch back to dotlocking, but I was hoping
> someone could interpret these error messages for me. Are they dangerous?
>
> Thanks,
> Brian
[snip]

This is a known issue, and the patch was merged in 2.6.29-rc1. Here is the patch for your reference.

Author: Sunil Mushran <sunil.mushran at oracle.com>

    ocfs2/dlm: Fix race during lockres mastery

    dlm_get_lock_resource() is supposed to return a lock resource with a
    proper master. If multiple concurrent threads attempt to look up the
    lockres for the same lockid while the lock mastery is underway, one or
    more threads are likely to return a lockres without a proper master.

    This patch makes the threads wait in dlm_get_lock_resource() while the
    mastery is underway, ensuring all threads return the lockres with a
    proper master.

    This issue is known to be limited to users using the flock() syscall.
    For all other fs operations, the ocfs2 dlmglue layer serializes the
    dlm op for each lockid.

    Users encountering this bug will see flock() return EINVAL and dmesg
    have the following error:

    ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource
    <LOCKID>: bad api args

    Reported-by: Coly Li <coyli at suse.de>
    Signed-off-by: Sunil Mushran <sunil.mushran at oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh at suse.com>
---
7b791d68562e4ce5ab57cbacb10a1ad4ee33956e
 fs/ocfs2/dlm/dlmmaster.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index cbf3abe..54e182a 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -732,14 +732,21 @@ lookup:
 	if (tmpres) {
 		int dropping_ref = 0;

+		spin_unlock(&dlm->spinlock);
+
 		spin_lock(&tmpres->spinlock);
+		/* We wait for the other thread that is mastering the resource */
+		if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
+			__dlm_wait_on_lockres(tmpres);
+			BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
+		}
+
 		if (tmpres->owner == dlm->node_num) {
 			BUG_ON(tmpres->state & DLM_LOCK_RES_DROPPING_REF);
 			dlm_lockres_grab_inflight_ref(dlm, tmpres);
 		} else if (tmpres->state & DLM_LOCK_RES_DROPPING_REF)
 			dropping_ref = 1;
 		spin_unlock(&tmpres->spinlock);
-		spin_unlock(&dlm->spinlock);

 		/* wait until done messaging the master, drop our ref to allow
 		 * the lockres to be purged, start over. */

--
Coly Li
SuSE Labs
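For readers without the kernel context, here is a small userspace analogy of what the patch does (a sketch I'm adding, not part of the patch; all names are hypothetical): lookup "threads" that find a resource whose master is still unknown now wait until mastery completes, instead of proceeding with a resource that has no valid owner.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define OWNER_UNKNOWN (-1)

/* Hypothetical userspace stand-in for a dlm lock resource. */
struct lockres {
    pthread_mutex_t lock;
    pthread_cond_t  mastered;   /* broadcast once the owner is known */
    int             owner;      /* node number, or OWNER_UNKNOWN */
};

/* Every lookup blocks here until mastery has completed, mirroring the
 * __dlm_wait_on_lockres() call that the patch adds to the lookup path. */
static void lockres_wait_for_master(struct lockres *res)
{
    pthread_mutex_lock(&res->lock);
    while (res->owner == OWNER_UNKNOWN)
        pthread_cond_wait(&res->mastered, &res->lock);
    pthread_mutex_unlock(&res->lock);
}

/* The thread that finishes mastery publishes the owner and wakes waiters. */
static void lockres_set_owner(struct lockres *res, int node)
{
    pthread_mutex_lock(&res->lock);
    res->owner = node;
    pthread_cond_broadcast(&res->mastered);
    pthread_mutex_unlock(&res->lock);
}

static void *lookup_thread(void *arg)
{
    struct lockres *res = arg;

    lockres_wait_for_master(res);
    printf("lookup sees owner = node %d\n", res->owner);
    return NULL;
}

int main(void)
{
    struct lockres res = {
        .lock     = PTHREAD_MUTEX_INITIALIZER,
        .mastered = PTHREAD_COND_INITIALIZER,
        .owner    = OWNER_UNKNOWN,
    };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, lookup_thread, &res);
    pthread_create(&t2, NULL, lookup_thread, &res);

    sleep(1);                    /* mastery "in progress" */
    lockres_set_owner(&res, 0);  /* mastery completes on node 0 */

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Before the fix, the equivalent of the lookup path could return while the owner was still unknown, which is what surfaced as DLM_BADARGS / EINVAL under load.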