When running an earlier version of Lustre (1.4.6) we had partial MPI-IO support. Since upgrading to 1.4.11, that seems to have disappeared. The version that we're running is a local compile; 1.4.6 was provided by Sun when we deployed the machine. After searching the list archives I see references to the ADIO driver in the contrib tree, and I suspect that the person who built the version that we're running didn't include it. Is that the only missing component, or do I need to do anything else to get MPI-IO to work? Thanks in advance.

--
Jim Williams, CISSP CCNA            PGP Key: http://pgp.mit.edu/
HPC Systems Analyst                 ID 0x4C4D9F19
Arctic Region Supercomputing Center, University of Alaska Fairbanks
909 Koyukuk Dr, Suite 105, PO Box 756020
Fairbanks, AK 99775-6020            Voice: 907-450-8623  Fax: 907-450-8605

# find / -iname base -exec chown -R us {} \;
Jim Williams wrote:
> When running an earlier version of Lustre (1.4.6) we had partial MPI-IO support. Since upgrading to 1.4.11, that seems to have disappeared. The version that we're running is a local compile; 1.4.6 was provided by Sun when we deployed the machine. After searching the list archives I see references to the ADIO driver in the contrib tree, and I suspect that the person who built the version that we're running didn't include it. Is that the only missing component, or do I need to do anything else to get MPI-IO to work? Thanks in advance.

I am not sure what kind of MPI-IO support you want. Actually, even without the Lustre ADIO driver you can still build and install MPI-IO; the UFS ADIO driver will be used in that case. We are working on a Lustre ADIO driver, which you can track in bug 12521. The new version of the Lustre ADIO driver should be available in one or two months. If you want to try the original Lustre ADIO driver, it is in lustre/lustre/contrib in 1.4.11 and 1.6.3.

Thanks
WangDi
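For reference, the contrib ADIO driver has to be compiled into ROMIO when the MPI library itself is built. A minimal sketch, assuming MPICH2 and ROMIO's usual configure convention; the install prefix is a placeholder, and the exact file-system list accepted varies by release, so check ./configure --help for your version:

    ./configure --prefix=/opt/mpich2 --with-file-system=lustre+ufs+nfs
    make
    make install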
Hi, Jim,

The MPI-IO patch that was deposited in the Lustre contrib tree has been continuously maintained and developed at ORNL. You may refer to this webpage for the latest version, which is compatible with the latest MPICH2-1.0.6p1. Users have reported success with it.

# http://ft.ornl.gov/projects/io/ --> Code Download and Readme

Some adventurous features are forthcoming too :)

--
Weikuan
http://ft.ornl.gov/~wyu/

On Jan 11, 2008 8:13 PM, Jim Williams <jwilliam at arsc.edu> wrote:
> When running an earlier version of Lustre (1.4.6) we had partial MPI-IO support. Since upgrading to 1.4.11, that seems to have disappeared. The version that we're running is a local compile; 1.4.6 was provided by Sun when we deployed the machine. After searching the list archives I see references to the ADIO driver in the contrib tree, and I suspect that the person who built the version that we're running didn't include it. Is that the only missing component, or do I need to do anything else to get MPI-IO to work? Thanks in advance.
On Jan 12, 2008 11:17 -0500, Weikuan Yu wrote:
> The MPI-IO patch that was deposited in the Lustre contrib tree has been continuously maintained and developed at ORNL. You may refer to this webpage for the latest version, which is compatible with the latest MPICH2-1.0.6p1. Users have reported success with it.
>
> # http://ft.ornl.gov/projects/io/ --> Code Download and Readme

Tom, it probably makes sense to add this new patch to the CVS lustre/contrib so it is more likely to be in the hands of Lustre users. Please also add the above URL to lustre/contrib/README so users can check for updates.

Weikuan, has this updated version been submitted to MPICH2 yet? Is there any reason NOT to submit it to the upstream MPICH2 maintainers so all users get the benefit of this work?

> On Jan 11, 2008 8:13 PM, Jim Williams <jwilliam at arsc.edu> wrote:
> > When running an earlier version of Lustre (1.4.6) we had partial MPI-IO support. Since upgrading to 1.4.11, that seems to have disappeared. The version that we're running is a local compile; 1.4.6 was provided by Sun when we deployed the machine. After searching the list archives I see references to the ADIO driver in the contrib tree, and I suspect that the person who built the version that we're running didn't include it. Is that the only missing component, or do I need to do anything else to get MPI-IO to work?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Andreas Dilger wrote:
> Weikuan,
> has this updated version been submitted to MPICH2 yet? Is there any reason NOT to submit it to the upstream MPICH2 maintainers so all users get the benefit of this work?

Yes, indeed it has been submitted to the MPICH2 maintainers. It is my hope that it will be incorporated into the upcoming release of MPICH2 soon.

Robert L., could you comment here? I will try to provide info as needed.

Thanks,
--Weikuan
http://ft.ornl.gov/~wyu/
Hi WangDi,

Bug 12521 is a private bugzilla. Would it be possible to get access to it?

Generally speaking, are you aware of performance problems with the UFS ADIO driver for small I/Os from multiple clients, and if so, should that be addressed by the Lustre ADIO driver?

Does a bugzilla exist for this problem?

Regards,
Cédric Lambert

Tom.Wang a écrit :
> I am not sure what kind of MPI-IO support you want. Actually, even without the Lustre ADIO driver you can still build and install MPI-IO; the UFS ADIO driver will be used in that case. We are working on a Lustre ADIO driver, which you can track in bug 12521. The new version of the Lustre ADIO driver should be available in one or two months. If you want to try the original Lustre ADIO driver, it is in lustre/lustre/contrib in 1.4.11 and 1.6.3.
>
> Thanks
> WangDi
Cédric Lambert wrote:
> Hi WangDi,
>
> Bug 12521 is a private bugzilla. Would it be possible to get access to it?

Oh, this ADIO driver is being developed by Oak Ridge National Lab and Sun together.

> Generally speaking, are you aware of performance problems with the UFS ADIO driver for small I/Os from multiple clients, and if so, should that be addressed by the Lustre ADIO driver?

Yes, most file systems do not like this kind of small I/O from multiple clients. The UFS driver will not reorganize the data from multiple clients if it is not interleaved. The Lustre ADIO driver will reorganize these small pieces of data whether or not they are interleaved. It will also reorganize the data according to where it is located (which OST), instead of splitting the data evenly when redistributing it to the clients.

> Does a bugzilla exist for this problem?

Hmm, it seems no other bug addresses small I/O from multiple clients effectively, apart from the ADIO driver work. Only the ADIO driver can currently regroup data from different clients, and it may also use data sieving on writes to avoid small I/O.

Thanks
WangDi
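To make the access pattern under discussion concrete, here is a minimal sketch (mine, not from the thread) of many small interleaved writes issued as one collective call; the mount point and sizes are placeholders. Issued independently, each chunk would reach the file system as a tiny request; with MPI_File_write_all, ROMIO's two-phase collective buffering can regroup the pieces into large contiguous writes, which a Lustre-aware driver can further align to OST stripe boundaries.

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK   1024   /* bytes per small piece (placeholder) */
    #define NCHUNKS 64     /* pieces written by each rank         */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_File fh;
        MPI_Datatype filetype;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        buf = malloc((size_t)CHUNK * NCHUNKS);
        memset(buf, 'a' + rank % 26, (size_t)CHUNK * NCHUNKS);

        /* Interleaved layout: rank r owns every nprocs-th CHUNK-sized slot. */
        MPI_Type_vector(NCHUNKS, CHUNK, CHUNK * nprocs, MPI_BYTE, &filetype);
        MPI_Type_commit(&filetype);

        /* /mnt/lustre is a placeholder mount point. */
        MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/strided.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, (MPI_Offset)rank * CHUNK, MPI_BYTE,
                          filetype, "native", MPI_INFO_NULL);

        /* Collective: all ranks participate, letting ROMIO aggregate. */
        MPI_File_write_all(fh, buf, CHUNK * NCHUNKS, MPI_BYTE,
                           MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
        free(buf);
        MPI_Finalize();
        return 0;
    }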
On Mon, Jan 14, 2008 at 02:43:49PM -0500, Weikuan Yu wrote:
> Andreas Dilger wrote:
> > Weikuan,
> > has this updated version been submitted to MPICH2 yet? Is there any reason NOT to submit it to the upstream MPICH2 maintainers so all users get the benefit of this work?
>
> Yes, indeed it has been submitted to the MPICH2 maintainers. It is my hope that it will be incorporated into the upcoming release of MPICH2 soon.
>
> Robert L., could you comment here? I will try to provide info as needed.

I added the bulk of Weikuan's ad_lustre driver to ROMIO a few weeks ago. It will be in the upcoming MPICH2-1.0.7 release.

Thanks for the contribution!

==rob

--
Rob Latham
Mathematics and Computer Science Division        A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                     B29D F333 664A 4280 315B
Robert Latham wrote:
> I added the bulk of Weikuan's ad_lustre driver to ROMIO a few weeks ago. It will be in the upcoming MPICH2-1.0.7 release.
>
> Thanks for the contribution!

Great! Thanks for making this happen, Rob!

--
Weikuan Yu <+> 1-865-574-7990
http://ft.ornl.gov/~wyu/
I'm not actually sure which ROMIO abstract device the multiple CFS deployments I use were built with; probably just UFS, or maybe NFS. Do you have a recommended option yourself?

Besides the fact that most of the ADIO drivers created over the years are completely obsolete and could be cleaned out of ROMIO, what will the new one for Lustre offer? Particularly with respect to controls that I can already get via the lfs utility?

Marty Barnaby

Weikuan Yu wrote:
> Robert Latham wrote:
> > I added the bulk of Weikuan's ad_lustre driver to ROMIO a few weeks ago. It will be in the upcoming MPICH2-1.0.7 release.
>
> Great! Thanks for making this happen, Rob!
On Mar 11, 2008 16:10 -0600, Marty Barnaby wrote:
> I'm not actually sure which ROMIO abstract device the multiple CFS deployments I use were built with; probably just UFS, or maybe NFS. Do you have a recommended option yourself?

The UFS driver is the one used for Lustre if no other one exists.

> Besides the fact that most of the ADIO drivers created over the years are completely obsolete and could be cleaned out of ROMIO, what will the new one for Lustre offer? Particularly with respect to controls that I can already get via the lfs utility?

There is improved collective IO that aligns the IO on Lustre stripe boundaries. Also, the hints given to the MPI-IO layer (before open, not after) result in Lustre picking a better stripe count/size.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
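As an illustration of the before-open hint mechanism Andreas describes, a minimal sketch: "striping_factor" and "striping_unit" are the conventional ROMIO hint names for Lustre striping, the values and the path are placeholders, and striping hints only take effect when the file is created.

    MPI_Info info;
    MPI_File fh;

    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "16");      /* stripe count (example)  */
    MPI_Info_set(info, "striping_unit", "1048576");   /* 1 MiB stripe size       */

    /* The info must be passed at open time for the layout to apply. */
    MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Info_free(&info);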
Andreas Dilger wrote:
> On Mar 11, 2008 16:10 -0600, Marty Barnaby wrote:
> > I'm not actually sure which ROMIO abstract device the multiple CFS deployments I use were built with; probably just UFS, or maybe NFS. Do you have a recommended option yourself?
>
> The UFS driver is the one used for Lustre if no other one exists.
>
> > Besides the fact that most of the ADIO drivers created over the years are completely obsolete and could be cleaned out of ROMIO, what will the new one for Lustre offer? Particularly with respect to controls that I can already get via the lfs utility?
>
> There is improved collective IO that aligns the IO on Lustre stripe boundaries. Also, the hints given to the MPI-IO layer (before open, not after) result in Lustre picking a better stripe count/size.

In addition, the one integrated into MPICH2-1.0.7 contains direct I/O support. Lockless I/O support was left out due to my lack of confidence in low-level file system support, but it can be revived when possible.

--
Weikuan Yu <+> 1-865-574-7990
http://ft.ornl.gov/~wyu/
To return to this discussion: in recent testing, I have found that writing to a Lustre FS via a higher-level library, like PnetCDF, fails because the default value for romio_ds_write is not "disable". This default is set in the MPICH code in src/mpi/romio/adio/common/ad_hints.c.

I believe it has something to do with locking issues. I'm not sure how best to handle this; I'd prefer that the data sieving default be "disable", though I don't know all the implications there. Maybe ad_lustre_open would be the place where the _ds_ hints are set to disable.

Marty Barnaby

Weikuan Yu wrote:
> In addition, the one integrated into MPICH2-1.0.7 contains direct I/O support. Lockless I/O support was left out due to my lack of confidence in low-level file system support, but it can be revived when possible.
On Thu, May 08, 2008 at 12:35:11PM -0600, Marty Barnaby wrote:
> To return to this discussion: in recent testing, I have found that writing to a Lustre FS via a higher-level library, like PnetCDF, fails because the default value for romio_ds_write is not "disable". This default is set in the MPICH code in src/mpi/romio/adio/common/ad_hints.c.
>
> I believe it has something to do with locking issues. I'm not sure how best to handle this; I'd prefer that the data sieving default be "disable", though I don't know all the implications there. Maybe ad_lustre_open would be the place where the _ds_ hints are set to disable.

I'm not sure why you want to turn off the data sieving optimization for libraries such as PnetCDF and HDF5, which are almost certainly going to exercise said optimization. However, if you really want to, just set up whatever you want in your own MPI_Info structure and pass it to ncmpi_open, ncmpi_create, or H5Pset_fapl_mpio.

It sounds like you have a lot more information about what does and does not work for your site. I'm not sure it's appropriate to unilaterally turn off this hint for all or even most codes running against all Lustre deployments. Don't get me wrong... there is undoubtedly a lot of Lustre-specific hint-tuning that can be done in ad_lustre.

==rob

--
Rob Latham
Mathematics and Computer Science Division        A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                     B29D F333 664A 4280 315B
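A sketch of Rob's suggestion at the PnetCDF level, assuming MPI is already initialized; the file name is a placeholder. ncmpi_create accepts the MPI_Info directly and passes it down to MPI_File_open underneath:

    #include <mpi.h>
    #include <pnetcdf.h>

    MPI_Info info;
    int ncid, err;

    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_ds_write", "disable");
    MPI_Info_set(info, "romio_ds_read", "disable");

    /* PnetCDF forwards the info to the MPI-IO layer at create/open time. */
    err = ncmpi_create(MPI_COMM_WORLD, "/mnt/lustre/flash.nc",
                       NC_CLOBBER, info, &ncid);
    MPI_Info_free(&info);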
Hi,

Marty Barnaby wrote:
> To return to this discussion: in recent testing, I have found that writing to a Lustre FS via a higher-level library, like PnetCDF, fails because the default value for romio_ds_write is not "disable". This default is set in the MPICH code in src/mpi/romio/adio/common/ad_hints.c.

You can use MPI_Info_set to disable romio_ds_write. What is the failure? flock? Data sieving needs flock.

> I believe it has something to do with locking issues. I'm not sure how best to handle this; I'd prefer that the data sieving default be "disable", though I don't know all the implications there.

I agree data sieving should be disabled. It also checks for a contiguous buftype or filetype only via the fileview, which is sometimes not enough, and triggers unnecessary read-modify-write even for contiguous writes (especially with the higher-level libraries, if you choose collective write). Since Lustre has a client cache, plus the overhead of flock and read-modify-write, I doubt the performance improvement we could get from data sieving on Lustre, although I do not have performance data to prove that.

> Maybe ad_lustre_open would be the place where the _ds_ hints are set to disable.

Yes, we should disable this for strided writes in Lustre, and ad_lustre_open seems the right place to do it.

Thanks
WangDi

--
Regards,
Tom Wangdi
Sun Lustre Group
System Software Engineer
http://www.sun.com
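For anyone wanting to experiment before an official fix, a heavily hedged sketch of where such a default could live. ROMIO gives each file system a SetInfo hook, and the merged Lustre driver carries one; the function name and fields below follow ROMIO's convention but vary across versions, so treat this purely as an illustration, not the actual driver code:

    /* Illustrative only: force the write data-sieving default off for
     * Lustre inside ROMIO's per-file-system hint hook. */
    void ADIOI_LUSTRE_SetInfo(ADIO_File fd, MPI_Info users_info, int *error_code)
    {
        /* ... existing generic hint processing would run here ... */
        MPI_Info_set(fd->info, "romio_ds_write", "disable");
        *error_code = MPI_SUCCESS;
    }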
Marty,

If my understanding is right: when multiple clients issue non-collective I/O and their data buffer is a vector of small non-overlapping file regions, then instead of performing 'n' seeks + read/write, ROMIO uses the data sieving algorithm. For a data-sieving write, the extent of the request is first read into a big buffer, the respective write vectors are memcpy'd into that buffer, and then a single BIG write is performed. Prior to performing the data-sieving write, ROMIO locks the portion of the file pertaining to the data-sieving buffer size, does seek + write, and then unlocks the file range. This ensures file integrity. ROMIO relies on ADIO-FS-specific locking (in this case Lustre). So if the underlying file system does not support fcntl() locks, you see errors when the extents of the non-collective writes from multiple clients overlap.

The easy solution would be to replace non-collective MPI-IO calls with collective MPI-IO calls. The two-phase collective I/O algorithm ensures file integrity and does not rely on file locking, since each process writes to a big non-overlapping region during the second phase.

Or, if you have to use non-collective I/O, you could implement the ad_lustre fcntl exclusive lock using:

i)  fcntl(EXCL_LOCK) --> open(lock_file, O_CREAT | O_EXCL) + close
    fcntl(UNLOCK)    --> unlink(lock_file)

ii) fcntl(EXCL_LOCK) --> MPI_Win_lock()
    fcntl(UNLOCK)    --> MPI_Win_unlock()

Of course, for (ii) you need to create a one-sided shared buffer on rank 0 when the file is opened with MPI_File_open, with the buffer destroyed during MPI_File_close().

HTH,
-Kums

________________________________
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Marty Barnaby
Sent: Thursday, May 08, 2008 12:35 PM
Cc: lustre-discuss at clusterfs.com
Subject: Re: [Lustre-discuss] mpi-io support

> To return to this discussion: in recent testing, I have found that writing to a Lustre FS via a higher-level library, like PnetCDF, fails because the default value for romio_ds_write is not "disable". I believe it has something to do with locking issues.
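A sketch of option (i) above: emulating an exclusive lock with an O_CREAT|O_EXCL lock file where fcntl() locking is unavailable. The lock-file path is up to the caller, and a real implementation would also need to cope with stale lock files left behind by crashed processes:

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>

    static int excl_lock(const char *lockpath)
    {
        int fd;
        /* O_EXCL makes creation atomic: exactly one process wins. */
        while ((fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0600)) < 0) {
            if (errno != EEXIST)
                return -1;      /* real error, give up          */
            usleep(1000);       /* held by someone else: retry  */
        }
        close(fd);
        return 0;
    }

    static void excl_unlock(const char *lockpath)
    {
        unlink(lockpath);       /* releases the "lock" */
    }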
hello,

I am having similar struggles with locking and MPI-IO. I am doing a simple strided write, and it fails because of the locking. I'm a bit behind in the discussion, but is there a way to fix (or work around) this problem? Is this something in my code, or the default driver (this is on lonestar at TACC)? I have even downloaded the most up-to-date version of MPICH, which I believe has the new Lustre ADIO driver, but I am running into the same issues.

Any thoughts would be greatly appreciated!

Phil

On Thu, 8 May 2008, Tom.Wang wrote:
> You can use MPI_Info_set to disable romio_ds_write. What is the failure? flock? Data sieving needs flock.
>
> I agree data sieving should be disabled. Since Lustre has a client cache, plus the overhead of flock and read-modify-write, I doubt the performance improvement we could get from data sieving on Lustre. Yes, we should disable this for strided writes in Lustre, and ad_lustre_open seems the right place to do it.
Phil,

If you are having the same problems I've had, I would offer the advice that others have given below. I am working with several layers of which I am not the owner, but I have the source and can make edits. For me, it is reasonable to call my own, explicit MPI_Info_set during initialization for the hints romio_ds_write and romio_ds_read, changing both their values to 'disable'. How these defaults are initialized in the ROMIO code in adio/common/ad_hints.c (for these two, specifically, 'enable') is the best documentation I have found on the matter. I've never seen anything describing all the hints available, or the syntax and semantics of the acceptable values.

I don't fully understand data sieving, but I believe it is an older paradigm, not applicable to our current high-performance, widely distributed parallel file systems. My suggestion was that, at least here with Lustre and its new abstract device routines, the _ds_ hints be set to disable by default, so I don't have to find a place in every new library I deal with to set them explicitly myself.

Marty

Phil Dickens wrote:
> I am having similar struggles with locking and MPI-IO. I am doing a simple strided write, and it fails because of the locking. Is this something in my code, or the default driver (this is on lonestar at TACC)? Any thoughts would be greatly appreciated!
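One possible global workaround for Marty's wish, so the hint does not have to be set inside every library: ROMIO can read site-wide default hints from a plain key/value file named by the ROMIO_HINTS environment variable. This is a ROMIO feature rather than a Lustre one, and its availability depends on the ROMIO version, so verify it against your release:

    # One hint per line: "key value".
    cat > $HOME/romio_hints <<'EOF'
    romio_ds_write disable
    romio_ds_read disable
    EOF
    export ROMIO_HINTS=$HOME/romio_hints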
Hi, Marty and Phil,

Since the Lustre ADIO driver is new in MPICH2-1.0.7, it may still have bugs. One way to check whether a problem is caused by the Lustre ADIO driver is to force ROMIO to use the UFS (Unix file system) driver. This can be achieved by adding the prefix "ufs:" to the file name. Note that data sieving will still be enabled by default when using UFS.

As a PnetCDF developer, I am interested in the problem Marty had. I also run PnetCDF codes on Lustre but so far have not seen a problem related to file locking. I wonder if it is possible for you to provide a test code to reproduce the error.

Wei-keng

On Fri, 9 May 2008, Marty Barnaby wrote:
> If you are having the same problems I've had, I would offer the advice that others have given below. For me, it is reasonable to call my own, explicit MPI_Info_set during initialization for the hints romio_ds_write and romio_ds_read, changing both their values to 'disable'.
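Wei-keng's isolation test as a one-line change; the path is a placeholder. The "ufs:" prefix tells ROMIO to skip file-system detection and use its generic UFS driver for this file:

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "ufs:/mnt/lustre/testfile",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);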
Hi, Phil

If the Lustre client is mounted with -o localflock or -o flock, you will not hit this problem. Otherwise you can either use POSIX write instead of MPI-IO, or use MPI_Info_set to disable data sieving:

ierr = MPI_Info_create(&FILE_INFO_TEMPLATE);
.........
ierr = MPI_Info_set(FILE_INFO_TEMPLATE, "romio_ds_write", "disable");

I do not think you can avoid this in the current release of the Lustre ADIO driver.

Thanks

Phil Dickens wrote:
> I am having similar struggles with locking and MPI-IO. I am doing a simple strided write, and it fails because of the locking. Is there a way to fix (or work around) this problem? Is this something in my code, or the default driver (this is on lonestar at TACC)?
--
Regards,
Tom Wangdi
Sun Lustre Group
System Software Engineer
http://www.sun.com
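To complete the fragment above (FILE_INFO_TEMPLATE is WangDi's placeholder name; the path is an example), the info object is typically handed to the file at open time:

    ierr = MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/out.dat",
                         MPI_MODE_CREATE | MPI_MODE_WRONLY,
                         FILE_INFO_TEMPLATE, &fh);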
On May 09, 2008 09:41 -0400, Phil Dickens wrote:
> I am having similar struggles with locking and MPI-IO. I am doing a simple strided write, and it fails because of the locking. I'm a bit behind in the discussion, but is there a way to fix (or work around) this problem? Is this something in my code, or the default driver (this is on lonestar at TACC)? I have even downloaded the most up-to-date version of MPICH, which I believe has the new Lustre ADIO driver, but I am running into the same issues.
>
> Any thoughts would be greatly appreciated!

One possibility is to mount the clients with "-o localflock", leaving all of the locking internal to Lustre. This in essence provides single-node flock (i.e. coherent on that node, but not across all clients). The other alternative is "-o flock", which is coherent locking across all clients, but has a noticeable performance impact and may affect stability, depending on the version of Lustre being used (newer is better, of course).

I'm not positive about the internals of the MPI-IO code: whether it depends on flock providing a barrier across nodes, or whether it does this only for e.g. NFS not keeping writes coherent, so they don't clobber the same page when writing. Tom is the expert here...
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
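For reference, the two client mount variants Andreas mentions look roughly like this; the MGS NID and file-system name are placeholders for your site:

    mount -t lustre -o localflock mgs@tcp0:/lustre /mnt/lustre   # flock coherent per node only
    mount -t lustre -o flock mgs@tcp0:/lustre /mnt/lustre        # flock coherent cluster-wide, slower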
Hi,

Andreas Dilger wrote:
> One possibility is to mount the clients with "-o localflock", leaving all of the locking internal to Lustre. This in essence provides single-node flock (i.e. coherent on that node, but not across all clients). The other alternative is "-o flock", which is coherent locking across all clients, but has a noticeable performance impact and may affect stability, depending on the version of Lustre being used (newer is better, of course).
>
> I'm not positive about the internals of the MPI-IO code: whether it depends on flock providing a barrier across nodes, or whether it does this only for e.g. NFS not keeping writes coherent, so they don't clobber the same page when writing.
>
> Tom is the expert here...

It uses flock to keep the coherence of the read-modify-write for data sieving. You can use

ierr = MPI_Info_create(&FILE_INFO_TEMPLATE);
.............
ierr = MPI_Info_set(FILE_INFO_TEMPLATE, "romio_ds_write", "disable");

to disable data sieving in your application and avoid flock for strided writes.

Thanks
WangDi

--
Regards,
Tom Wangdi
Sun Lustre Group
System Software Engineer
http://www.sun.com
Many thanks to Marty, Wei-keng, Tom, and Andreas for their very helpful support (I hope I didn't leave anyone out!). I think I can now address the problem.

Phil
Hi,

An optimized Lustre ADIO collective write driver has been developed by ORNL and the Sun Lustre group. We improved how collective MPI-IO interacts with Lustre striping. Recent FLASH I/O benchmarking results on Jaguar show that our new driver gives significant performance improvements; see the attached graph, which compares the UFS independent driver and the UFS optimized collective driver against the Lustre ADIO driver.

At present, the patch for mpich2-1.0.7 is worked out and maintained in the HEAD branch. It should be released soon.

Thanks,
LiuYing

Jim Williams wrote:
> When running an earlier version of Lustre (1.4.6) we had partial MPI-IO support. Since upgrading to 1.4.11, that seems to have disappeared. The version that we're running is a local compile; 1.4.6 was provided by Sun when we deployed the machine. After searching the list archives I see references to the ADIO driver in the contrib tree, and I suspect that the person who built the version that we're running didn't include it. Is that the only missing component, or do I need to do anything else to get MPI-IO to work? Thanks in advance.

--
Best regards,
LiuYing
System Software Engineer, Lustre Group
Sun Microsystems ( China ) Co. Limited

[Figure: FLASH I/O benchmark performance comparison (flashio_performance.png); attachment not preserved in the archive.]