Phil Schwan
2006-May-19 07:36 UTC
[Lustre-discuss] Multiple Lustre clients, record locking -> POSIX ?
On Fri, 2004-08-20 at 11:42, Capps@iozone.org wrote:
> It appears that record locking does
> not work across multiple Lustre clients. In fact, it appears
> that exclusive write locks are granted to everyone. This
> results in the obvious, file corruption. (That's not good, right?)
> This is happening on Lustre 1.2.1.

When you say "record locking", are you referring to lockf/flock? If so, then it was not our intent to support this in Lustre 1.2 at all -- we leave the file system method NULL.

It looks like we need to define a method which returns -EOPNOTSUPP or similar instead.

> I was hoping that Lustre (being POSIX compliant) would handle
> record locking across multiple clients. Did I miss something
> here?
> If this is a known constraint of the version of Lustre that I am
> using, does anyone know what version of Lustre might
> have working record locks? I can't really test the new
> functionality
> in Iozone, on top of Lustre, until the filesystem supports record
> locking. Any estimate on when this functionality might arrive,
> would be appreciated.

We tried to be quite clear that record locking is not yet supported, but apparently not loudly enough. We have some patches which implement the feature, but they need review and a better regression test suite. Not very many people ask for record locking yet, so this tends to keep slipping down our priority queue.

-Phil
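Phil's proposal (return -EOPNOTSUPP instead of leaving the method NULL) would let an application tell "locking is unavailable" apart from "the range is busy". The following is a minimal user-space sketch of that distinction, not Lustre code. The names lock_probe, classify_lock_errno and probe_write_lock are hypothetical, and grouping ENOLCK and ENOSYS with EOPNOTSUPP is an assumption about how a file system without lock support might report itself.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Outcome of a single non-blocking lock attempt. */
enum lock_probe {
    LOCK_GRANTED,      /* fcntl() reported success */
    LOCK_BUSY,         /* another process holds a conflicting lock */
    LOCK_UNSUPPORTED,  /* the file system reported that it cannot lock */
    LOCK_ERROR         /* any other failure; inspect errno */
};

/* Classify an errno value left behind by a failed fcntl(F_SETLK). */
static enum lock_probe classify_lock_errno(int err)
{
    switch (err) {
    case EAGAIN:
    case EACCES:
        return LOCK_BUSY;         /* POSIX allows either for a conflict */
    case EOPNOTSUPP:
    case ENOLCK:                  /* assumption: treated as "no lock service" */
    case ENOSYS:
        return LOCK_UNSUPPORTED;
    default:
        return LOCK_ERROR;
    }
}

/*
 * Try to take an exclusive lock on bytes [start, start+len) without
 * blocking. On LOCK_GRANTED the lock stays held until it is released
 * or the process closes the file.
 */
static enum lock_probe probe_write_lock(int fd, off_t start, off_t len)
{
    struct flock fl = {0};

    assert(len > 0);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return LOCK_GRANTED;
    return classify_lock_errno(errno);
}
```

Note that on a file system that grants the lock to everyone, as reported for Lustre 1.2.1 in this thread, LOCK_GRANTED proves nothing; the check only becomes useful once the file system reports an error.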
Kumaran Rajaram
2006-May-19 07:36 UTC
[Lustre-discuss] Multiple Lustre clients, record locking -> POSIX ?
Paul,

One of the primary optimizations that the MPI-IO layer (distributed I/O middleware) performs is to coalesce multiple small, non-contiguous file requests (prominent in parallel apps) into a single large I/O chunk, with holes, internally in the middleware. This optimization ensures that you generate minimal network traffic (a single big network packet instead of multiple network packets carrying small requests). The disk I/O subsystem can also service one large request more efficiently than numerous small requests (seek + read/write).

For writes we perform a read-modify-write on the internal buffer. After we get the required data from the disks, we memcpy the required data to the user buffers. So to implement this optimization, the processes lock the specific file subregion to guarantee file consistency. This optimization therefore requires file-locking support from the underlying file system.

We could instead perform separate seeks + reads/writes, as you mentioned, but that is going to reduce performance drastically. We initially used this model for file systems that do not support file locking, to guarantee file consistency, but later invented file locking based on message passing to both optimize and guarantee file consistency.

-Kums

On Tue, 24 Aug 2004, PAulN wrote:
> Hi Kums,
> If the components of a distributed app are aware of what offsets they're
> permitted to write is this really a problem? Or does this break if the
> write I/O's
> are not aligned on page boundaries?
> paul
>
> Kumaran Rajaram wrote:
>
> >Phil,
> >
> > We faced similar issues when we tried to access/modify a single file
> >concurrently from multiple processes (across multiple clients) using the
> >MPI-IO interfaces. We faced similar issues with other file systems as
> >well, so we resorted to implementing our own file/record-locking in the
> >MPI-IO middleware (on top of file-systems).
> >_______________________________________________
> >Lustre-discuss mailing list
> >Lustre-discuss@lists.clusterfs.com
> >https://lists.clusterfs.com/mailman/listinfo/lustre-discuss
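The read-modify-write that Kums describes can be sketched roughly as follows. This is an illustrative sketch only, not the MPI-IO middleware's actual code: struct piece, range_lock and sieved_write are hypothetical names, and the sketch assumes the underlying file system honours fcntl() byte-range locks, which is exactly what the thread says Lustre 1.2.1 does not do.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One small piece of user data destined for a given file offset. */
struct piece {
    off_t offset;      /* absolute file offset */
    size_t len;        /* number of bytes */
    const void *data;  /* source bytes in the user buffer */
};

/* Set or clear a blocking byte-range lock; type is F_WRLCK or F_UNLCK. */
static int range_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl);
}

/*
 * Write n non-contiguous pieces, all inside [start, start+span), as one
 * large read-modify-write. Returns 0 on success, -1 on failure.
 */
static int sieved_write(int fd, off_t start, size_t span,
                        const struct piece *pieces, size_t n)
{
    int rc = -1;
    char *buf;

    assert(span > 0);
    buf = calloc(1, span);          /* bytes past EOF stay zero */
    if (buf == NULL)
        return -1;
    if (range_lock(fd, F_WRLCK, start, (off_t)span) != 0)
        goto out_free;

    /* Read the whole region, holes included, in one request. */
    if (pread(fd, buf, span, start) < 0)
        goto out_unlock;

    /* Patch the small pieces into the in-memory copy. */
    for (size_t i = 0; i < n; i++) {
        assert(pieces[i].offset >= start);
        assert(pieces[i].offset - start + (off_t)pieces[i].len <= (off_t)span);
        memcpy(buf + (pieces[i].offset - start), pieces[i].data, pieces[i].len);
    }

    /* Write the region back in one request. */
    if (pwrite(fd, buf, span, start) == (ssize_t)span)
        rc = 0;

out_unlock:
    range_lock(fd, F_UNLCK, start, (off_t)span);
out_free:
    free(buf);
    return rc;
}
```

Without a lock that really excludes other writers, a second process could modify a hole between the pread() and the pwrite(), and the write-back would then silently overwrite its data. That is why this optimization depends on working file locking.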
HP
2006-May-19 07:36 UTC
[Lustre-discuss] Multiple Lustre clients, record locking ->POSIX ?
Phil,

Thanks for the quick reply. Actually I was thinking of something more like:

    myflock.l_type=F_WRLCK;            /* Apply write lock */
    myflock.l_whence=SEEK_SET;
    myflock.l_start=0;
    myflock.l_len=size;
    myflock.l_pid=getpid();
    ret=fcntl(fd,F_SETLKW, &myflock);  /* Lock range */
    if(ret < 0)
    {
        printf("T1: Lock failed\n");
        exit(0);
    }

I would be happy to try out your patches and provide some testing and feedback. Do you have any estimate on when this functionality will be released?

Thanks,
Don Capps

----- Original Message -----
From: "Phil Schwan" <phil@clusterfs.com>
To: "Capps@iozone.org" <capps@iozone.org>
Cc: <lustre-discuss@lists.clusterfs.com>
Sent: Friday, August 20, 2004 11:57 AM
Subject: Re: [Lustre-discuss] Multiple Lustre clients, record locking ->POSIX ?

> On Fri, 2004-08-20 at 11:42, Capps@iozone.org wrote:
> > It appears that record locking does
> > not work across multiple Lustre clients. In fact, it appears
> > that exclusive write locks are granted to everyone. This
> > results in the obvious, file corruption. (That's not good, right?)
> > This is happening on Lustre 1.2.1.
>
> When you say "record locking", are you referring to lockf/flock? If so,
> then it was not our intent to support this in Lustre 1.2 at all -- we
> leave the file system method NULL.
>
> It looks like we need to define a method which returns -EOPNOTSUPP or
> similar instead.
>
> > I was hoping that Lustre (being POSIX compliant) would handle
> > record locking across multiple clients. Did I miss something
> > here?
> > If this is a known constraint of the version of Lustre that I am
> > using, does anyone know what version of Lustre might
> > have working record locks? I can't really test the new
> > functionality
> > in Iozone, on top of Lustre, until the filesystem supports record
> > locking. Any estimate on when this functionality might arrive,
> > would be appreciated.
>
> We tried to be quite clear that record locking is not yet supported, but
> apparently not loudly enough.
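For reference, Don's fragment can be rounded out into a small self-contained helper along the following lines. This is a sketch under assumptions: the function names are hypothetical, the record is written at offset 0 purely for illustration, and a failure is reported to the caller rather than ending the process with exit(0), which would signal success to the shell.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Block until an exclusive lock on bytes [0, size) is granted. */
static int lock_whole_range(int fd, off_t size)
{
    struct flock myflock;

    memset(&myflock, 0, sizeof(myflock));
    myflock.l_type = F_WRLCK;      /* apply write lock */
    myflock.l_whence = SEEK_SET;
    myflock.l_start = 0;
    myflock.l_len = size;
    /* l_pid is an output of F_GETLK; it need not be set for F_SETLKW. */
    return fcntl(fd, F_SETLKW, &myflock);
}

/* Release the lock taken by lock_whole_range(). */
static int unlock_whole_range(int fd, off_t size)
{
    struct flock myflock;

    memset(&myflock, 0, sizeof(myflock));
    myflock.l_type = F_UNLCK;
    myflock.l_whence = SEEK_SET;
    myflock.l_start = 0;
    myflock.l_len = size;
    return fcntl(fd, F_SETLK, &myflock);
}

/* Write one record under the lock. Returns 0 on success, -1 on failure. */
static int locked_write(int fd, const void *rec, size_t len, off_t size)
{
    ssize_t n;

    assert(size > 0 && len <= (size_t)size);
    if (lock_whole_range(fd, size) < 0) {
        perror("T1: Lock failed");
        return -1;
    }
    n = pwrite(fd, rec, len, 0);
    if (unlock_whole_range(fd, size) < 0)
        return -1;
    return n == (ssize_t)len ? 0 : -1;
}
```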
Kumaran Rajaram
2006-May-19 07:36 UTC
[Lustre-discuss] Multiple Lustre clients, record locking -> POSIX ?
Phil,

We faced similar issues when we tried to access/modify a single file concurrently from multiple processes (across multiple clients) using the MPI-IO interfaces. We faced similar issues with other file systems as well, so we resorted to implementing our own file/record-locking in the MPI-IO middleware (on top of file-systems).

-Kums

On Fri, 20 Aug 2004, Phil Schwan wrote:
> On Fri, 2004-08-20 at 11:42, Capps@iozone.org wrote:
> > It appears that record locking does
> > not work across multiple Lustre clients. In fact, it appears
> > that exclusive write locks are granted to everyone. This
> > results in the obvious, file corruption. (That's not good, right?)
> > This is happening on Lustre 1.2.1.
>
> When you say "record locking", are you referring to lockf/flock? If so,
> then it was not our intent to support this in Lustre 1.2 at all -- we
> leave the file system method NULL.
>
> It looks like we need to define a method which returns -EOPNOTSUPP or
> similar instead.
>
> > I was hoping that Lustre (being POSIX compliant) would handle
> > record locking across multiple clients. Did I miss something
> > here?
> > If this is a known constraint of the version of Lustre that I am
> > using, does anyone know what version of Lustre might
> > have working record locks? I can't really test the new
> > functionality
> > in Iozone, on top of Lustre, until the filesystem supports record
> > locking. Any estimate on when this functionality might arrive,
> > would be appreciated.
>
> We tried to be quite clear that record locking is not yet supported, but
> apparently not loudly enough. We have some patches which implement the
> feature, but they need review and a better regression test suite. Not
> very many people ask for record locking yet, so this tends to keep
> slipping down our priority queue.
PAulN
2006-May-19 07:36 UTC
[Lustre-discuss] Multiple Lustre clients, record locking -> POSIX ?
Hi Kums,
If the components of a distributed app are aware of what offsets they're permitted to write is this really a problem? Or does this break if the write I/O's are not aligned on page boundaries?
paul

Kumaran Rajaram wrote:
>Phil,
>
> We faced similar issues when we tried to access/modify a single file
>concurrently from multiple processes (across multiple clients) using the
>MPI-IO interfaces. We faced similar issues with other file systems as
>well, so we resorted to implementing our own file/record-locking in the
>MPI-IO middleware (on top of file-systems).
>
>-Kums
>
>On Fri, 20 Aug 2004, Phil Schwan wrote:
>
>>On Fri, 2004-08-20 at 11:42, Capps@iozone.org wrote:
>>
>>> It appears that record locking does
>>> not work across multiple Lustre clients. In fact, it appears
>>> that exclusive write locks are granted to everyone. This
>>> results in the obvious, file corruption. (That's not good, right?)
>>> This is happening on Lustre 1.2.1.
>>
>>When you say "record locking", are you referring to lockf/flock? If so,
>>then it was not our intent to support this in Lustre 1.2 at all -- we
>>leave the file system method NULL.
>>
>>It looks like we need to define a method which returns -EOPNOTSUPP or
>>similar instead.
>>
>>> I was hoping that Lustre (being POSIX compliant) would handle
>>> record locking across multiple clients. Did I miss something
>>> here?
>>> If this is a known constraint of the version of Lustre that I am
>>> using, does anyone know what version of Lustre might
>>> have working record locks? I can't really test the new
>>>functionality
>>> in Iozone, on top of Lustre, until the filesystem supports record
>>> locking. Any estimate on when this functionality might arrive,
>>> would be appreciated.
>>
>>We tried to be quite clear that record locking is not yet supported, but
>>apparently not loudly enough. We have some patches which implement the
>>feature, but they need review and a better regression test suite.
Capps@iozone.org
2006-May-19 07:36 UTC
[Lustre-discuss] Multiple Lustre clients, record locking -> POSIX ?
Lustre team,

I've been busy trying to add some new functionality to Iozone (benchmark), and ran into something. (ouch)

The new functionality is to have Iozone go multi-nodal, and then have all of the instances share a single file that is being presented from Lustre. Each transfer, from both clients, is being protected with record locking. Oops, that's where things get funky.

It appears that record locking does not work across multiple Lustre clients. In fact, it appears that exclusive write locks are granted to everyone. This results in the obvious, file corruption. (That's not good, right?) This is happening on Lustre 1.2.1.

I was hoping that Lustre (being POSIX compliant) would handle record locking across multiple clients. Did I miss something here?

If this is a known constraint of the version of Lustre that I am using, does anyone know what version of Lustre might have working record locks? I can't really test the new functionality in Iozone, on top of Lustre, until the filesystem supports record locking. Any estimate on when this functionality might arrive, would be appreciated.

Thanks,
Don Capps
capps@iozone.org

P.S. Someday I hope to have Lustre on my Ipaq. Now that would be cooler than a talking frog :-)
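The symptom Don reports (an exclusive write lock granted to more than one holder) can be checked with a small two-process test. The sketch below is a hedged illustration and not Iozone's code: try_write_lock and exclusion_is_enforced are hypothetical names, and it runs both processes on one node. Reproducing the report itself would require the second process to run on a different Lustre client, with some out-of-band coordination that is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-blocking request for an exclusive lock on [start, start+len). */
static int try_write_lock(int fd, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Returns 1 if a second process is refused a conflicting lock (exclusion
 * works), 0 if both processes are granted the lock (the failure described
 * above), and -1 on any setup error.
 */
static int exclusion_is_enforced(const char *path, off_t len)
{
    int status = 0;
    pid_t pid;
    int fd = open(path, O_RDWR);

    assert(len > 0);
    if (fd < 0)
        return -1;
    if (try_write_lock(fd, 0, len) != 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: locks are per process, so this request must conflict. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(2);
        if (try_write_lock(cfd, 0, len) == 0)
            _exit(0);                       /* granted: exclusion broken */
        _exit((errno == EAGAIN || errno == EACCES) ? 1 : 2);
    }

    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }
    close(fd);                              /* releases the parent's lock */
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

On a file system with working record locks the function is expected to return 1; a return value of 0 would correspond to the behaviour described in this message.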