thr3ads.net - Ocfs2 users - [Ocfs2-users] fcntl exclusive lock implementation in ocfs2 [Apr 2007]


Jeff Fookson

2007-Apr-04 17:03 UTC

[Ocfs2-users] fcntl exclusive lock implementation in ocfs2

I am currently testing ocfs2 for use in a two-node cluster that will run
the Cyrus imapd, and I am having problems that seem to be caused by the
software occasionally blocking for a long time while waiting to get a
write lock via the 'fcntl' system call. I am aware that the current ocfs2
supports neither a writable mmap nor a cluster-aware flock, so in my tests
all writing is done on only one node of the cluster, and the Cyrus
configuration is such that none of the requisite databases require a
writable 'mmap' (i.e. all databases are skiplist, not Berkeley DB).
I am using drbd to make the disks on the two nodes behave as a shared
resource; as permitted by drbd version 8, the disks on both nodes are
drbd primaries and mounted on their respective machines. I am testing by
having modest-sized mail messages delivered to just one of the machines at
a rate of 1/sec. The system will run fine in this mode, sometimes for
days, but then gets hopelessly wedged with many 'lmtpd' processes
waiting to get exclusive locks on the various Cyrus databases. As the
system approaches this deadlock condition, 'strace' shows many seconds
being spent in 'fcntl' waiting for the lock, and the load average
skyrockets because of all the 'lmtpd' processes.
Since mail is being delivered at an essentially constant rate and there
is no other activity on the systems, I'm confused as to how the machines
can run for extended periods before suddenly getting into this
pathological state.

I realize that my setup uses several complex layers (the full storage
stack is md->drbd->lvm->ocfs2->Cyrus imapd), so I will also consult the
drbd and Cyrus mailing lists, but I'm hoping that someone on this list
might have some insight into how fcntl-based locking is implemented under
ocfs2 that may help point the way to what is causing the deadlock after
many days of running well.

The machines are both running CentOS 4.4 with a 2.6.19 kernel; the ocfs2 
code is that included with the kernel
sources; drbd is version 8.0 and the Cyrus version is 2.3.8.

Thank you for any thoughts on this matter.

Jeff Fookson

-- 
Jeffrey E. Fookson, PhD			Phone: (520) 621 3091
Support Systems Analyst, Principal	jfookson@as.arizona.edu
Steward Observatory
University of Arizona

Sunil Mushran

2007-Apr-05 17:20 UTC


[Ocfs2-users] fcntl exclusive lock implementation in ocfs2

ocfs2 currently lets the VFS handle fcntl locking.
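
For anyone who wants to reproduce the measurement described in the
question, here is a minimal Python sketch of taking an exclusive
fcntl-style lock and timing how long the call blocks. It is only an
illustration: the helper names (timed_exclusive_lock, try_exclusive_lock,
unlock) are made up for this example and are not taken from Cyrus or
ocfs2. Python's fcntl.lockf is a wrapper around the fcntl locking calls,
so the blocking form corresponds to the wait that 'strace' shows.

```python
import fcntl
import os
import tempfile
import time


def timed_exclusive_lock(fd):
    """Block until an exclusive fcntl (POSIX) lock on fd is granted.

    Returns the seconds spent waiting, which is roughly the time that
    strace would attribute to the blocking fcntl call.
    """
    start = time.monotonic()
    fcntl.lockf(fd, fcntl.LOCK_EX)  # blocking write lock on the whole file
    return time.monotonic() - start


def try_exclusive_lock(fd):
    """Non-blocking attempt; returns True if the lock was obtained."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # Another process holds a conflicting lock.
        return False


def unlock(fd):
    """Release the lock held by this process on fd."""
    fcntl.lockf(fd, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical scratch file; point this at a file on the filesystem
    # under test to measure lock wait times there.
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)  # write access is needed for LOCK_EX
        waited = timed_exclusive_lock(fd)
        print(f"waited {waited:.6f}s for the exclusive lock")
        unlock(fd)
        os.close(fd)
```

If the VFS alone handles these locks, a lock taken this way would
presumably only arbitrate between processes on the same node. That
reading is an assumption and is not confirmed anywhere in this thread.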

Jeff Fookson wrote:
> I am currently testing ocfs2 for use in a two-node cluster that will 
> run the Cyrus imapd and am having issues
> that seem to be related to occasionally long times being needed while 
> the software blocks waiting to get a writelock
> via the 'fcntl' system call. I am aware that the current ocfs2 
> supports neither a writable mmap nor a cluster-aware
> flock, so my tests are done doing all writing to only one node of the 
> cluster and the Cyrus configuration
> is such that none of the requisite databases require a writable 'mmap'
> (i.e. all databases are skiplist, not Berkeley DB).
> I am using drbd to provide the appropriate
> support for having the disks on the two nodes to behave as a shared 
> resource; as permitted by drbd, version 8,
> the disks on both nodes are drbd primaries and mounted on their 
> respective machines. I am testing by having modest
> size mail messages delivered to just one of the machines at the rate 
> of 1/sec. The system will run fine in this mode, sometimes for
> days but then will get hopelessly wedged with many 'lmtpd' processes
> waiting to get exclusive locks on the various Cyrus
> databases. As the system approaches this deadlock condition, 'strace'
> shows times of many seconds being spent in 'fcntl'
> waiting for the lock and the load average skyrockets because of all 
> the 'lmtpd' processes.
> Since mail is being delivered at essentially a constant rate and there 
> is no other activity on the systems, I'm confused
> as to how the machines will often run for extended times before 
> suddenly getting into this pathological state.
>
> I realize that because my setup is using several complex layers 
> (actually the full storage design has
>
> md->drbd->lvm->ocfs2->Cyrus imapd)  I will also consult the drbd and
> Cyrus mailing lists, but I'm hoping
> that someone on this list might have some insight into how fcntl-based 
> locking is implemented under ocfs2
> that may help point the way to what is causing the deadlock after many 
> days of running well.
>
> The machines are both running CentOS 4.4 with a 2.6.19 kernel; the 
> ocfs2 code is that included with the kernel
> sources; drbd is version 8.0 and the Cyrus version is 2.3.8.
>
> Thank you for any thoughts on this matter.
>
> Jeff Fookson
>
