CFS gurus,

An example of the interesting problem that I just discovered...

-------------
Step 1: A Lustre client creates a file.
Step 2: An NFS client does an ls -l and gets stale
        meta-data. (very stale)

Configuration:
--------------------------------------------------------
nfsg2 is a Lustre client.
nfsg2 Lustre version: 1.4.6.2
nfsg2 Linux kernel:   2.4.21-40.EL plus Lustre patches

fc1 is a Fedora Core 4 NFS client, and is mounting Lustre
over NFS from nfsg1 with actimeo=0.

nfsg1 is the NFS server and a Lustre client.
nfsg1 Lustre version: 1.4.6.2
nfsg1 Linux kernel:   2.4.21-40.EL plus Lustre patches

mds:  Lustre version: 1.4.6.2
mds:  Linux kernel:   2.6.9-34.0.1.EL
osts: Lustre version: 1.4.6.2
osts: Linux kernel:   2.6.9-34.0.1.EL
---------------------------------------------------------

The example:

[root@nfsg2 lustre]# date
Thu Jul 13 10:02:26 CDT 2006
[root@nfsg2 lustre]# date > stamp

[root@fc1 lustre]# date
Thu Jul 13 10:02:29 CDT 2006
.... wait 7 hours ....
[root@fc1 lustre]# date
Thu Jul 13 16:57:06 CDT 2006
[root@fc1 lustre]# ls -l
total 48
-rwxr-xr-x  1 root root    46 Jul 12 10:01 dols
-rw-r--r--  1 root root 26324 Jul 11 14:02 fsx.c
-rwxr-xr-x  1 root root    99 Jul 11 16:24 rock
-rw-r--r--  1 root root     0 Jul 13 10:04 stamp

---------- size = 0, after 7 hours ? -----------

then 3 seconds later....

[root@fc1 lustre]# date
Thu Jul 13 16:57:09 CDT 2006
[root@fc1 lustre]# ls -l
total 56
-rwxr-xr-x  1 root root    46 Jul 12 10:01 dols
-rw-r--r--  1 root root 26324 Jul 11 14:02 fsx.c
-rwxr-xr-x  1 root root    99 Jul 11 16:24 rock
-rw-r--r--  1 root root    29 Jul 13 10:04 stamp

---------- now the size is correct... just 7 hours late .....

In the example above, the meta-data is 7 hours stale.
If one does another ls -l then the correct size is returned,
but the first ls -l returns a size of zero, and appears to
keep doing so no matter how long one waits before that first
ls -l (minutes, hours, even days or weeks).

The problem here appears to be that Lustre clients can
create files, but the NFS server (which is another Lustre
client) is getting very stale meta-data and shipping it off to
the NFS client, even though the NFS client mounts with actimeo=0.
---------

Any suggestions on how to work around this, or a patch?

Thank you,
Don
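The symptom above (a size of zero on the first ls -l, the right size a few seconds later) can also be checked from the NFS client with two back-to-back stat calls. What follows is only a minimal sketch and not part of the original report: the names `stat_twice` and `looks_stale` and the 3-second default delay are assumptions chosen to mirror the two ls -l calls in the example.

```python
import os
import time


def stat_twice(path, delay=3.0):
    """Stat `path` twice, `delay` seconds apart, and return both sizes.

    Hypothetical helper: it stands in for the two `ls -l` calls
    shown in the report, run on the NFS client.
    """
    first = os.stat(path).st_size
    time.sleep(delay)
    second = os.stat(path).st_size
    return first, second


def looks_stale(first, second):
    """True if the first stat saw an empty file and the second did not."""
    return first == 0 and second > 0
```

On the NFS client one would call `stat_twice()` on the stamp file under the NFS mount point and pass the two sizes to `looks_stale()`. The check only matches the pattern in this report (zero, then non-zero); it says nothing about which layer is caching the old size.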
Don,

CFS really doesn't want to support Linux 2.4 NFS exports from Lustre; it
depends on a patch to the NFS server that has been difficult to stabilize,
to say the least.

The issue is important if it is still present with Linux 2.6 in the 1.4.7
release. We should check that.

Thanks!

- Peter -

> -----Original Message-----
> From: lustre-discuss-bounces@clusterfs.com
> [mailto:lustre-discuss-bounces@clusterfs.com] On Behalf Of Iozone
> Sent: Friday, July 14, 2006 4:13 PM
> To: lustre-discuss@clusterfs.com
> Subject: [Lustre-discuss] A curious meta-data issue
----- Original Message -----
From: "Peter J. Braam" <braam@clusterfs.com>
To: "Iozone" <capps@iozone.org>; <lustre-discuss@clusterfs.com>
Sent: Friday, July 14, 2006 5:36 PM
Subject: RE: [Lustre-discuss] A curious meta-data issue

----------

Peter,

It is also interesting that the stale meta-data happens only if
the NFS client is running a 2.6-based kernel. The older
Linux 2.4 NFS clients get correct meta-data......

So, in this case, the newer the NFS client, the
worse the situation.... :-(

Would it be possible to have someone take a peek and
see if this can be fixed for the 2.4-based NFS server/Lustre
client? That is what is currently available and in use
by customers, and NFS server support in Lustre 1.4.7 is not
available yet.

I'll volunteer my cluster for the debug :-) I've got
a very reproducible test case :-)

Enjoy,
Don
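The test case itself is not shown in this thread. As a rough sketch of what the writer side of such a test might record, the snippet below stands in for `date > stamp` on the Lustre client and keeps the expected size and write time, so that a size seen later over NFS can be compared against it. The names `write_stamp` and `staleness` and the record layout are assumptions for illustration, not anything taken from Lustre or from the actual test.

```python
import os
import time


def write_stamp(path):
    """Write the current time to `path` and return what readers should see.

    Hypothetical helper: a stand-in for `date > stamp` on the Lustre client.
    """
    data = (time.strftime("%a %b %d %H:%M:%S %Z %Y") + "\n").encode()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to flush the file before we report its size
    return {"path": path, "expected_size": len(data), "written_at": time.time()}


def staleness(record, observed_size, now=None):
    """Seconds since the write if `observed_size` is wrong, else None."""
    if observed_size == record["expected_size"]:
        return None
    now = time.time() if now is None else now
    return now - record["written_at"]
```

The record returned by `write_stamp()` would have to reach the NFS client by some path other than the file system under test (printed and copied by hand, for instance), and comparing `written_at` across machines assumes their clocks are reasonably in sync.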