Sébastien Buisson
2008-Oct-23 14:06 UTC
[Lustre-discuss] Shared pointers, file locking and MPI-IO
Hi all,

We are using MPI-IO on Lustre, so that we can achieve good performance
when writing or reading the same file from multiple processes and
clients. In the shared pointer read/write routines, MPI-IO makes
intensive use of file locks (for instance, see the file
src/mpi/romio/adio/common/ad_get_sh_fp.c from mpich2-1.0.7). The
problem is that file locking support in Lustre is very limited today:
- there is the localflock option for the local mode. In this mode,
locks are coherent on a node (a single-node flock), but not across all
clients;
- there is the flock option for the consistent mode. In this mode,
locks are coherent across all clients. But the Lustre Operations
Manual includes a big warning about this mode, saying that it can
affect performance and stability.

That being said, does anybody in the Lustre community know about
current alternatives to the MPI-IO shared pointer read/write routines,
which do not use the file locking mechanism but provide the same
functionality?

By the way, do we have an idea of the Lustre version that will provide
the reliable version of the flock consistent mode?

Best regards,
Sebastien.
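For reference, the two modes above are selected per client at mount
time; an illustrative example (the MGS node and filesystem names are
placeholders):

    # locks coherent only within one client node:
    mount -t lustre -o localflock mgsnode@tcp0:/fsname /mnt/lustre
    # locks coherent across all clients (the mode the manual warns about):
    mount -t lustre -o flock mgsnode@tcp0:/fsname /mnt/lustre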
Andreas Dilger
2008-Oct-27 19:20 UTC
[Lustre-discuss] Shared pointers, file locking and MPI-IO
On Oct 23, 2008 16:06 +0200, Sébastien Buisson wrote:
> We are using MPI-IO on Lustre, so that we can achieve good performance
> when writing or reading the same file from multiple processes and
> clients. In the shared pointer read/write routines, MPI-IO makes
> intensive use of file locks (for instance, see the file
> src/mpi/romio/adio/common/ad_get_sh_fp.c from mpich2-1.0.7). The
> problem is that file locking support in Lustre is very limited today:
> - there is the localflock option for the local mode. In this mode,
> locks are coherent on a node (a single-node flock), but not across all
> clients;
> - there is the flock option for the consistent mode. In this mode,
> locks are coherent across all clients. But the Lustre Operations
> Manual includes a big warning about this mode, saying that it can
> affect performance and stability.
>
> That being said, does anybody in the Lustre community know about
> current alternatives to the MPI-IO shared pointer read/write routines,
> which do not use the file locking mechanism but provide the same
> functionality?

Tom Wang is our MPI-IO expert, and has done a lot of work to improve
the performance of MPI-IO with Lustre, so he can probably answer this
question better than I can.

> By the way, do we have an idea of the Lustre version that will provide
> the reliable version of the flock consistent mode?

There are in fact a number of users that enable the flock support in
conjunction with MPI-IO already. The reason we haven't made it enabled
for general usage is that there may still be some bad interactions with
flock under less well-behaved applications.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Andreas Dilger wrote:
> On Oct 23, 2008 16:06 +0200, Sébastien Buisson wrote:
>> We are using MPI-IO on Lustre, so that we can achieve good performance
>> when writing or reading the same file from multiple processes and
>> clients. [...]
>>
>> That being said, does anybody in the Lustre community know about
>> current alternatives to the MPI-IO shared pointer read/write routines,
>> which do not use the file locking mechanism but provide the same
>> functionality?
>
> Tom Wang is our MPI-IO expert, and has done a lot of work to improve
> the performance of MPI-IO with Lustre, so he can probably answer this
> question better than I can.

Hello,

As far as I know, there is no other way to implement shared file
pointers in MPI-IO; currently you must rely on flock if you want to use
the shared file pointer. Since each client maintains the file offset
itself for each file, flock seems to be the only way for MPI to achieve
a shared (and atomic) file pointer across multiple clients.

I also enable flock on my own cluster sometimes, and it works fine in
most cases. But I only ran IOR or Flash I/O, and those benchmarks do
not touch shared file pointers. Could you please describe how you use
the shared file pointer in your application?

Thanks
WangDi
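To make concrete what "rely on flock" means here: ROMIO's shared
pointer routines keep the pointer value in a small auxiliary file and
serialize updates to it with file locks. Below is a minimal sketch of
that locked read-modify-write cycle, not ROMIO's actual code (the
function and variable names are illustrative); the fetch-and-increment
is exactly the step that needs cluster-coherent locking (the flock
consistent mode) on Lustre.

    #define _XOPEN_SOURCE 700   /* for pread()/pwrite() */
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    /* Atomically fetch the shared offset and advance it by 'incr'.
     * 'fp_fd' is an open descriptor on the shared-pointer file.
     * Returns the offset this caller should use, or -1 on error. */
    static off_t fetch_and_add_shared_fp(int fp_fd, off_t incr)
    {
        struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = sizeof(off_t) };
        off_t cur = 0, next;

        if (fcntl(fp_fd, F_SETLKW, &lk) == -1)  /* block until lock held */
            return (off_t)-1;

        pread(fp_fd, &cur, sizeof(cur), 0);     /* empty file: offset is 0 */
        next = cur + incr;
        pwrite(fp_fd, &next, sizeof(next), 0);  /* publish advanced pointer */

        lk.l_type = F_UNLCK;                    /* release for next client */
        fcntl(fp_fd, F_SETLK, &lk);
        return cur;
    }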
Robert Latham
2008-Nov-12 17:46 UTC
[Lustre-discuss] Shared pointers, file locking and MPI-IO
On Thu, Oct 23, 2008 at 04:06:33PM +0200, Sébastien Buisson wrote:
> That being said, does anybody in the Lustre community know about
> current alternatives to the MPI-IO shared pointer read/write routines,
> which do not use the file locking mechanism but provide the same
> functionality?
>
> By the way, do we have an idea of the Lustre version that will provide
> the reliable version of the flock consistent mode?

Hi Sebastien

We implemented shared file pointers in ROMIO with MPI one-sided
operations a few years back. It's possibly more work than you had in
mind, but you can get really good performance if you coordinate with
these memory regions instead of through the file system.

http://www.mcs.anl.gov/~robl/papers/latham_rmaops.pdf

That paper also discusses an approach to the ordered-mode operations
that uses MPI_SCAN to share file offset information instead of locking
a file.

We could not make these part of our MPI-IO implementation because it is
common for an MPI implementation to not make "background progress" in
this passive-target RMA mode, and if a process went off and did
something computationally intensive for a very long time it could hang
up all the other processes. But if you as the application know such an
"out to lunch" situation won't happen, then you'll have no problems
with this approach.

Hope these help you out
==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
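To sketch the RMA idea: the shared offset lives in a window hosted on
one rank, and each process advances it with an atomic fetch-and-add,
never taking a file lock. The paper above predates MPI-3 and built
fetch-and-add from MPI-2 passive-target operations; for brevity this
sketch uses MPI-3's MPI_Fetch_and_op instead, and all names in it are
illustrative, not ROMIO's.

    #include <mpi.h>

    /* Rank 0 exposes the shared offset in an RMA window (created once,
     * elsewhere, with MPI_Win_create over a long long initialized to 0).
     * Any rank can then atomically fetch and advance it by 'incr'. */
    static MPI_Offset shared_fp_fetch_add(MPI_Win win, long long incr)
    {
        long long old = 0;
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);   /* passive target */
        MPI_Fetch_and_op(&incr, &old, MPI_LONG_LONG,
                         0 /* target rank */, 0 /* displacement */,
                         MPI_SUM, win);
        MPI_Win_unlock(0, win);
        return (MPI_Offset)old;  /* this caller writes at 'old' */
    }

And a sketch of the MPI_SCAN trick for ordered mode: each rank
contributes its byte count to a prefix sum, derives its own offset from
the exclusive prefix, and writes with an explicit-offset call, so no
file lock is ever needed. Again, illustrative names only:

    #include <mpi.h>

    /* Each rank writes 'count' bytes from 'buf', in rank order,
     * starting at 'base' in 'fh'. Returns the offset just past the
     * last rank's data. */
    static MPI_Offset write_ordered_scan(MPI_File fh, MPI_Offset base,
                                         const void *buf, int count)
    {
        long long mine = count, incl = 0, total;
        int nprocs;

        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Inclusive prefix sum of byte counts; subtracting our own
         * count yields the exclusive prefix, i.e. where our data begins. */
        MPI_Scan(&mine, &incl, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
        MPI_File_write_at(fh, base + (MPI_Offset)(incl - mine),
                          buf, count, MPI_BYTE, MPI_STATUS_IGNORE);

        /* The last rank holds the grand total; share it so every rank
         * can advance its notion of the shared pointer consistently. */
        total = incl;
        MPI_Bcast(&total, 1, MPI_LONG_LONG, nprocs - 1, MPI_COMM_WORLD);
        return base + (MPI_Offset)total;
    }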