CFS gurus,
An example of the interesting problem that I just discovered...
-------------
Step 1: Lustre client creates a file.
Step 2: An NFS client does an ls -l and gets stale meta-data. (very stale)
Configuration:
--------------------------------------------------------
nfsg2 is a Lustre client.
nfsg2 Lustre version: 1.4.6.2
nfsg2 Linux kernel: 2.4.21-40.EL plus Lustre patches
fc1 is a Fedora Core 4 NFS client, and is mounting Lustre
over NFS from nfsg1 with actimeo=0.
nfsg1 is the NFS server and a Lustre client.
nfsg1 Lustre version: 1.4.6.2
nfsg1 Linux kernel: 2.4.21-40.EL plus Lustre patches
mds: Lustre version: 1.4.6.2
mds: Linux kernel: 2.6.9-34.0.1.EL
osts: Lustre version: 1.4.6.2
osts: Linux kernel 2.6.9-34.0.1.EL
---------------------------------------------------------
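For readers reproducing the setup, here is a dry-run sketch of the NFS mount on fc1. It only prints the command. The export path and mount point (/mnt/lustre) and the helper name are assumptions, since the original post does not give them.

```shell
#!/bin/sh
# Dry-run sketch: print the NFS mount command instead of running it.
# nfs_mount_cmd is a hypothetical helper; the paths used below are
# assumptions, not taken from the original post.
nfs_mount_cmd() {
    server=$1
    export_path=$2
    mount_point=$3
    # actimeo=0 sets acregmin, acregmax, acdirmin and acdirmax to 0,
    # asking the NFS client not to cache file or directory attributes.
    echo "mount -t nfs -o actimeo=0 ${server}:${export_path} ${mount_point}"
}

nfs_mount_cmd nfsg1 /mnt/lustre /mnt/lustre
```

With actimeo=0 the NFS client should revalidate attributes on every access, which is why a stale size on fc1 points at the server side rather than at client-side attribute caching.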
The example:
[root@nfsg2 lustre]# date
Thu Jul 13 10:02:26 CDT 2006
[root@nfsg2 lustre]# date > stamp
[root@fc1 lustre]# date
Thu Jul 13 10:02:29 CDT 2006
.... wait 7 hours ....
[root@fc1 lustre]# date
Thu Jul 13 16:57:06 CDT 2006
[root@fc1 lustre]# ls -l
total 48
-rwxr-xr-x 1 root root 46 Jul 12 10:01 dols
-rw-r--r-- 1 root root 26324 Jul 11 14:02 fsx.c
-rwxr-xr-x 1 root root 99 Jul 11 16:24 rock
-rw-r--r-- 1 root root 0 Jul 13 10:04 stamp
---------- size = 0, after 7 hours ? -----------
then 3 seconds later....
[root@fc1 lustre]# date
Thu Jul 13 16:57:09 CDT 2006
[root@fc1 lustre]# ls -l
total 56
-rwxr-xr-x 1 root root 46 Jul 12 10:01 dols
-rw-r--r-- 1 root root 26324 Jul 11 14:02 fsx.c
-rwxr-xr-x 1 root root 99 Jul 11 16:24 rock
-rw-r--r-- 1 root root 29 Jul 13 10:04 stamp
---------- now the size is correct... just 7 hours late .....
In the example above, the meta-data is 7 hours stale.
If one does another ls -l, the correct size is returned, but
the first ls -l returns a size of zero, and appears to do so for an
unlimited amount of time (minutes, hours, even days or weeks).
The problem appears to be that Lustre clients can create files, but
the NFS server (which is another Lustre client) gets very stale data
and ships it off to the NFS client, even though the NFS mount uses
actimeo=0.
---------
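The symptom above (first lookup returns size 0, a second lookup a few seconds later returns the right size) can be checked with a small script. This is a minimal sketch, assuming GNU stat (the -c %s format) is available on the NFS client. The function name check_stale_size is hypothetical and not part of Lustre or NFS.

```shell
#!/bin/sh
# Hypothetical helper: read a file's size twice, a few seconds apart,
# and report whether the two attribute lookups disagree.
# Assumes GNU coreutils stat (-c %s prints the size in bytes).
check_stale_size() {
    file=$1
    delay=${2:-3}    # seconds between lookups; the post used about 3

    first=$(stat -c %s "$file") || return 2
    sleep "$delay"
    second=$(stat -c %s "$file") || return 2

    if [ "$first" -eq "$second" ]; then
        echo "consistent: $file size=$first"
        return 0
    fi
    echo "STALE: $file first=$first second=$second"
    return 1
}

# Example use on the NFS client, after another Lustre client wrote the file:
#   check_stale_size /path/to/nfs/mount/stamp
```

A "STALE" result only shows that two back-to-back lookups disagreed. It does not by itself say whether the NFS server or the NFS client returned the old attributes.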
Any suggestions on how to work around this, or a patch ?
Thank you,
Don
Don,
CFS really doesn't want to support Linux 2.4 NFS exports from Lustre; it
depends on a patch to the NFS server that has been difficult to
stabilize to say the least.
The issue is important if it is still present with Linux 2.6 in the
1.4.7 release. We should check that.
Thanks!
- Peter -
> -----Original Message-----
> From: lustre-discuss-bounces@clusterfs.com
> [mailto:lustre-discuss-bounces@clusterfs.com] On Behalf Of Iozone
> Sent: Friday, July 14, 2006 4:13 PM
> To: lustre-discuss@clusterfs.com
> Subject: [Lustre-discuss] A curious meta-data issue
----- Original Message -----
From: "Peter J. Braam" <braam@clusterfs.com>
To: "Iozone" <capps@iozone.org>;
<lustre-discuss@clusterfs.com>
Sent: Friday, July 14, 2006 5:36 PM
Subject: RE: [Lustre-discuss] A curious meta-data issue
Don,
CFS really doesn't want to support Linux 2.4 NFS exports from Lustre; it
depends on a patch to the NFS server that has been difficult to
stabilize to say the least.
The issue is important if it is still present with Linux 2.6 in the
1.4.7 release. We should check that.
Thanks!
- Peter -
----------
Peter,
It is also interesting that the stale meta-data happens
if the NFS client is running a 2.6 based kernel. The older
Linux 2.4 NFS clients get correct meta-data...... So, in this
case, the newer the NFS client, the worse the situation....:-(
Would it be possible to have someone take a peek and
see if this can be fixed for the 2.4 based NFS server/Lustre
client? This is what is currently available and in use by
customers, and NFS server support in Lustre 1.4.7 is not
available yet. I'll volunteer my cluster for the debug :-)
I've got a very reproducible test case :-)
Enjoy,
Don