thr3ads.net - Lustre discuss - [Lustre-discuss] Lustre File Locking not locking [Aug 2008]

Darren George

2008-Aug-21 08:52 UTC

[Lustre-discuss] Lustre File Locking not locking

Hi,
I am having problems with file locking or the lack of it.
I'm setting up a Lustre environment consisting of a co-located MGS/MDS and 3
OSS servers. I am using CTDB and Samba to get Windows to co-operate.
I have configured the OSS servers to also be clients in order to get
CTDB/Samba working.
All seems to be working apart from file locking, which is not working
from either a Linux or a Windows perspective.

I have setup the following

MGS/MDS = 192.168.3.171
OSS1 = 192.168.3.172
OSS2 = 192.168.2.2
OSS3 = 192.168.1.2

I have Lustre installed on all servers, which run CentOS 4, using the
following packages:

kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.i686.rpm
lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5smp.i686.rpm
lustre-modules-1.6.5-2.6.9_67.0.7.EL_lustre.1.6.5smp.i686.rpm
lustre-1.6.5-2.6.9_67.0.7.EL_lustre.1.6.5smp.i686.rpm

I have rebuilt e2fsprogs

rpmbuild --rebuild e2fsprogs-1.40.7.sun3-0redhat.src.rpm

Installed the resulting packages:

e2fsprogs-1.40.7.sun3-0redhat.i386.rpm
e2fsprogs-debuginfo-1.40.7.sun3-0redhat.i386.rpm
e2fsprogs-devel-1.40.7.sun3-0redhat.i386.rpm
uuidd-1.40.7.sun3-0redhat.i386.rpm

The following has been added to modprobe.conf on all lustre servers

../lnet/parameters dir
options lnet networks=tcp

I have configured Lustre using the following commands:

****************** MDT/MGS ****************************************
mkfs.lustre --fsname lustre --mdt --mgs /dev/sdb
mkdir -p /mnt/mdt
mount -t lustre /dev/sdb /mnt/mdt

***************** OSS1 *********************************************
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.3.171@tcp0 /dev/sdb
mkdir -p /mnt/ost1
mount -t lustre /dev/sdb /mnt/ost1

****************** OSS2 ********************************************
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.3.171@tcp0 /dev/sdb
mkdir -p /mnt/ost2
mount -t lustre /dev/sdb /mnt/ost2

****************** OSS3 ********************************************
mkfs.lustre --fsname lustre --ost --mgsnode=192.168.3.171@tcp0 /dev/sdb
mkdir -p /mnt/ost3
mount -t lustre /dev/sdb /mnt/ost3


I have mounted the clients on all OSS servers using the following commands:
****************** Client *********************************************
mkdir -p /mnt/lustre
mount -t lustre -o flock 192.168.3.171@tcp0:/lustre /mnt/lustre

As you can see, I have used -o flock on the client mount command.

Can you please advise?
Have I missed something or configured it wrongly?
What are the best tools I can use to check why file locking is not working?
If you need more info please let me know.

Regards
Darren George

Bernd Schubert

2008-Aug-21 09:05 UTC

[Lustre-discuss] Lustre File Locking not locking

On Thursday 21 August 2008 10:52:12 Darren George wrote:
> Hi,
> I am having problems with file locking or the lack of it.
> I'm setting up a Lustre environment consisting of a co-located MGS/MDS and 3
> OSS servers. I am using CTDB and Samba to get Windows to co-operate.
> I have configured the OSS servers to also be clients in order to get
> CTDB/Samba working.
> All seems to be working apart from file locking, which is not working
> from either a Linux or a Windows perspective.
You provided quite a lot of information, but not a single proof that the
locking doesn't work. How did you figure that out?
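One way to produce such a proof is a small two-process test. The sketch below is our own illustration (the function name `check_fcntl_locking` is hypothetical, not part of Lustre, CTDB or Samba): it takes a POSIX write lock with `fcntl(F_SETLK)`, forks, and checks whether the child is refused the same lock with `EACCES` or `EAGAIN`, as POSIX requires. It only exercises two processes on one node; checking coherence between two Lustre clients would need the same idea run from separate nodes, which this sketch does not attempt.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a second process is refused our
 * write lock (locking enforced), 0 if it was granted the lock (locking
 * NOT enforced), and -1 if the test itself failed. */
static int check_fcntl_locking(const char *path)
{
    struct flock fl;
    int status = 0;
    pid_t pid;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0) {
        perror("open");
        return -1;
    }

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;          /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                 /* 0 means "through end of file" */
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        perror("fcntl(F_SETLK) in parent");
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        perror("fork");
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: POSIX record locks belong to a process and are not
         * inherited across fork(), so this request must conflict. */
        struct flock cl;
        memset(&cl, 0, sizeof(cl));
        cl.l_type = F_WRLCK;
        cl.l_whence = SEEK_SET;
        if (fcntl(fd, F_SETLK, &cl) == 0)
            _exit(0);             /* granted: locking is not enforced */
        _exit(errno == EACCES || errno == EAGAIN ? 1 : 2);
    }

    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        close(fd);
        return -1;
    }
    close(fd);                    /* also releases the parent's lock */
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

Called on a file under /mnt/lustre, a return value of 1 means the lock was enforced between the two processes, 0 means the second process was wrongly granted it, and -1 means the test could not run.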

[...]
>
> I have mounted the clients on all OSS servers using the following commands

What does "on all OSS servers" mean? Are you using your servers also as
Lustre clients? This might/will deadlock.
> ****************** Client *********************************************
> mkdir -p /mnt/lustre
> mount -t lustre -o flock 192.168.3.171@tcp0:/lustre /mnt/lustre
>
> As you can see, I have used -o flock on the client mount command.
Did you already try "-o localflock"?


Cheers,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH

Bernd Schubert

2008-Aug-21 10:23 UTC

[Lustre-discuss] Lustre File Locking not locking

Hello Darren,

On Thursday 21 August 2008 11:32:45 Darren George wrote:
> Hi Bernd,
>
> Thanks for your quick response.
> I am indeed using the servers as Lustre clients as well.
> I see the following in the messages log on the server the Windows client
> is connected to:
>
> localhost kernel: LustreError: 32074:0:(file.c:2570:ll_file_flock()) LBUG
> Aug 21 17:06:12 localhost kernel: LustreError: 32074:0:(file.c:2569:ll_file_flock()) unknown fcntl lock type: 96
D'oh, what is type 96? From include/asm-generic/fcntl.h:

/* for posix fcntl() and lockf() */
#ifndef F_RDLCK
#define F_RDLCK		0
#define F_WRLCK		1
#define F_UNLCK		2
#endif

Now you run into a rather ugly programming technique of the Lustre
developers: rather often they simply call LBUG(), although the
problem is not grave:

In lustre/llite/file.c:

        switch (file_lock->fl_type) {
        case F_RDLCK:
                einfo.ei_mode = LCK_PR;
                break;
        case F_UNLCK:
                /* An unlock request may or may not have any relation to
                 * existing locks so we may not be able to pass a lock handle
                 * via a normal ldlm_lock_cancel() request. The request may even
                 * unlock a byte range in the middle of an existing lock. In
                 * order to process an unlock request we need all of the same
                 * information that is given with a normal read or write record
                 * lock request. To avoid creating another ldlm unlock (cancel)
                 * message we'll treat a LCK_NL flock request as an unlock. */
                einfo.ei_mode = LCK_NL;
                break;
        case F_WRLCK:
                einfo.ei_mode = LCK_PW;
                break;
        default:
                CERROR("unknown fcntl lock type: %d\n", file_lock->fl_type);
                LBUG();
        }

IMHO, instead of calling LBUG() here, it should simply "return EINVAL".
So with the present code it seems that whenever userspace sets a
wrong struct flock l_type, it will trigger an LBUG(). I'm going to check
this and then will file a bugzilla entry.
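A minimal userspace sketch of the proposed change: map the POSIX lock type and hand an error back instead of calling LBUG(). The enum is a hypothetical stand-in for Lustre's LCK_NL/LCK_PR/LCK_PW modes (its values are not Lustre's), and `flock_type_to_mode` is our own name, not a function in lustre/llite/file.c. Kernel code conventionally returns the negated errno, so the sketch returns `-EINVAL`.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical stand-ins for the Lustre DLM modes; values are illustrative. */
enum sketch_ldlm_mode { SKETCH_LCK_NL = 1, SKETCH_LCK_PR, SKETCH_LCK_PW };

/* Map a struct flock l_type to a lock mode; report unknown types
 * with -EINVAL instead of crashing the node. */
static int flock_type_to_mode(int fl_type, enum sketch_ldlm_mode *mode)
{
    switch (fl_type) {
    case F_RDLCK:
        *mode = SKETCH_LCK_PR;   /* shared read lock */
        return 0;
    case F_UNLCK:
        *mode = SKETCH_LCK_NL;   /* treated as an unlock, as in file.c */
        return 0;
    case F_WRLCK:
        *mode = SKETCH_LCK_PW;   /* exclusive write lock */
        return 0;
    default:
        fprintf(stderr, "unknown fcntl lock type: %d\n", fl_type);
        return -EINVAL;          /* caller passes this back to userspace */
    }
}
```

With this shape, a request with type 96 fails with an error in the calling process rather than taking down the client.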


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH

Andreas Dilger

2008-Aug-21 10:32 UTC

[Lustre-discuss] Lustre File Locking not locking

On Aug 21, 2008  12:23 +0200, Bernd Schubert wrote:
> IMHO, instead of calling LBUG() here, it should simply "return EINVAL".
> So with the present code it seems that whenever userspace sets a
> wrong struct flock l_type, it will trigger an LBUG(). I'm going to check
> this and then will file a bugzilla entry.
Yes, this is very old code, and it should be fixed.  Ideally it would
be fixed to handle this lock type properly, but EINVAL is definitely
better than the LBUG.  I believe there is already a bug open for this.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

Bernd Schubert

2008-Aug-21 11:15 UTC

[Lustre-discuss] Lustre File Locking not locking

On Thursday 21 August 2008 12:32:32 Andreas Dilger wrote:
> On Aug 21, 2008  12:23 +0200, Bernd Schubert wrote:
> > IMHO, instead of calling LBUG() here, it should simply "return EINVAL".
> > So with the present code it seems that whenever userspace sets a
> > wrong struct flock l_type, it will trigger an LBUG(). I'm going to check
> > this and then will file a bugzilla entry.
>
> Yes, this is very old code, and it should be fixed.  Ideally it would
> be fixed to handle this lock type properly, but EINVAL is definitely
> better than the LBUG.  I believe there is already a bug open for this.
Hmm, but what is type 96? In binary it is "1100000", so maybe an endian
problem and it should be "11"? But then what is "11"? If it corresponds to
anything at all, it is 1 | 2, thus F_WRLCK | F_UNLCK. But does it make
sense to set both at once?
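The arithmetic can be checked mechanically: 96 is 64 + 32, i.e. binary 1100000. A minimal sketch, assuming the value came from flock(2) rather than fcntl(): on Linux the extended flock() flags LOCK_MAND (32) and LOCK_READ (64) add up to exactly 96, and as far as we understand the 2.6 kernel's fs/locks.c, requests carrying LOCK_MAND are handed to the filesystem with those bits as fl_type instead of being translated to F_RDLCK/F_WRLCK. We believe Samba's "kernel share modes" feature is one user of these flags. The helper names are ours, and the `#define` fallbacks are the Linux values for builds where glibc only exposes them with _GNU_SOURCE.

```c
#include <stdio.h>
#include <string.h>
#include <sys/file.h>

/* Linux-specific flock(2) flags; fall back to the kernel's values
 * if the C library does not expose them. */
#ifndef LOCK_MAND
#define LOCK_MAND  32
#endif
#ifndef LOCK_READ
#define LOCK_READ  64
#endif
#ifndef LOCK_WRITE
#define LOCK_WRITE 128
#endif

/* Write the binary digits of v (no leading zeros) into buf (>= 33 bytes). */
static void to_binary(unsigned v, char *buf)
{
    char tmp[32];
    int n = 0;
    do {
        tmp[n++] = (char)('0' + (v & 1u));
        v >>= 1;
    } while (v != 0);
    for (int i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
}

/* Name the flock(2) flags set in v, joined with '|'. */
static void decode_flock_flags(unsigned v, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } flags[] = {
        { LOCK_SH, "LOCK_SH" },     { LOCK_EX, "LOCK_EX" },
        { LOCK_NB, "LOCK_NB" },     { LOCK_UN, "LOCK_UN" },
        { LOCK_MAND, "LOCK_MAND" }, { LOCK_READ, "LOCK_READ" },
        { LOCK_WRITE, "LOCK_WRITE" },
    };
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (v & flags[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, flags[i].name, len - strlen(buf) - 1);
        }
    }
}
```

Here `to_binary(96, buf)` yields `1100000` and `decode_flock_flags(96, buf, sizeof buf)` yields `LOCK_MAND|LOCK_READ`.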


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH

Bernd Schubert

2008-Aug-21 15:13 UTC

[Lustre-discuss] Lustre File Locking not locking

On Thursday 21 August 2008 13:15:00 Bernd Schubert wrote:
> On Thursday 21 August 2008 12:32:32 Andreas Dilger wrote:
> > On Aug 21, 2008  12:23 +0200, Bernd Schubert wrote:
> > > IMHO, instead of calling LBUG() here, it should simply "return EINVAL".
> > > So with the present code it seems that whenever userspace sets a
> > > wrong struct flock l_type, it will trigger an LBUG(). I'm going to check
> > > this and then will file a bugzilla entry.
> >
> > Yes, this is very old code, and it should be fixed.  Ideally it would
> > be fixed to handle this lock type properly, but EINVAL is definitely
> > better than the LBUG.  I believe there is already a bug open for this.
>
> Hmm, but what is type 96? In binary it is "1100000", so maybe an endian
> problem and it should be "11"? But then what is "11"? If it corresponds to
> anything at all, it is 1 | 2, thus F_WRLCK | F_UNLCK. But does it make
> sense to set both at once?
Ah, it is probably the flock call and not lockf/fcntl; I always get confused
by the two different locking methods.
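Since the two methods are easy to mix up, here is a side-by-side sketch (the function names are ours). flock(2) takes a whole-file lock owned by the open file description, with modes LOCK_SH/LOCK_EX/LOCK_UN; fcntl(2) with F_SETLK takes a byte-range lock owned by the process, with types F_RDLCK/F_WRLCK/F_UNLCK. In glibc, lockf(3) is implemented on top of fcntl(). On a local Linux filesystem the two kinds of lock do not interact, so both calls below succeed on the same file.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

/* BSD-style flock(): whole-file lock, owned by the open file
 * description; LOCK_NB makes it fail instead of blocking. */
static int lock_whole_file_bsd(int fd)
{
    return flock(fd, LOCK_EX | LOCK_NB);
}

/* POSIX fcntl() record lock: byte range [start, start + len),
 * owned by the calling process; F_SETLK does not block. */
static int lock_range_posix(int fd, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}
```

Only the POSIX path uses the F_RDLCK/F_WRLCK/F_UNLCK values listed earlier from include/asm-generic/fcntl.h; flock() uses its own LOCK_* constants.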

This is bug #5135 (https://bugzilla.lustre.org/show_bug.cgi?id=5135).


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH

Andreas Dilger

2008-Aug-21 18:43 UTC

[Lustre-discuss] Lustre File Locking not locking

On Aug 21, 2008  13:15 +0200, Bernd Schubert wrote:
> On Thursday 21 August 2008 12:32:32 Andreas Dilger wrote:
> > On Aug 21, 2008  12:23 +0200, Bernd Schubert wrote:
> > > IMHO, instead of calling LBUG() here, it should simply "return EINVAL".
> > > So with the present code it seems that whenever userspace sets a
> > > wrong struct flock l_type, it will trigger an LBUG(). I'm going to check
> > > this and then will file a bugzilla entry.
> >
> > Yes, this is very old code, and it should be fixed.  Ideally it would
> > be fixed to handle this lock type properly, but EINVAL is definitely
> > better than the LBUG.  I believe there is already a bug open for this.
>
> Hmm, but what is type 96? In binary it is "1100000", so maybe an endian
> problem and it should be "11"? But then what is "11"? If it corresponds to
> anything at all, it is 1 | 2, thus F_WRLCK | F_UNLCK. But does it make
> sense to set both at once?
My recollection is that it has something to do with BSD locking modes or
similar.

See https://bugzilla.lustre.org/show_bug.cgi?id=15920

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
