Hi everyone,

I'm not sure whether the Lustre or the MPI forum is the right place for my
question.

The question is about the ROMIO optimization on Lustre. In one SC '08 paper,
http://users.eecs.northwestern.edu/~wkliao/PAPERS/fd_sc08_revised.pdf
it is said that the way ROMIO assigns file domains to I/O aggregators
ensures that no two aggregators access the same OST.

In my understanding, this means the data locality at the Lustre layer has
been taken care of in ROMIO, so that the aggregators will not compete for
the same OST.

My question is: is this optimization used in all current Lustre systems,
e.g., Hopper at NERSC?

Thanks,
Jailin Liu
Ph.D. student
Texas Tech University
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
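[Editor's note: for readers unfamiliar with the file-domain idea asked about above, here is a small illustrative sketch. It is NOT ROMIO source code; the function names and the exact OST-grouping rule are simplifications of the scheme described in the SC '08 paper, shown only to make the "no two aggregators touch the same OST" property concrete.]

```python
# Lustre stripes a file round-robin across OSTs, so stripe s lives on
# OST s % num_osts. If each aggregator is given only the stripes whose
# OST falls in its own disjoint OST group, then (when the number of
# aggregators does not exceed the number of OSTs) no two aggregators
# ever access the same OST.

def ost_of_stripe(stripe_index: int, num_osts: int) -> int:
    """Round-robin striping: which OST holds this stripe."""
    return stripe_index % num_osts

def aggregator_of_stripe(stripe_index: int, num_osts: int,
                         num_aggregators: int) -> int:
    """Map a stripe to an aggregator via its OST, so each aggregator
    serves a disjoint group of OSTs (assumes num_aggregators <= num_osts)."""
    return ost_of_stripe(stripe_index, num_osts) % num_aggregators

def osts_touched(num_stripes: int, num_osts: int, num_aggregators: int):
    """For each aggregator, the set of OSTs it would access."""
    touched = [set() for _ in range(num_aggregators)]
    for s in range(num_stripes):
        agg = aggregator_of_stripe(s, num_osts, num_aggregators)
        touched[agg].add(ost_of_stripe(s, num_osts))
    return touched

if __name__ == "__main__":
    # 8 OSTs, 4 aggregators, 100 stripes: the OST sets are pairwise disjoint.
    for a, osts in enumerate(osts_touched(100, 8, 4)):
        print(f"aggregator {a}: OSTs {sorted(osts)}")
```

With 8 OSTs and 4 aggregators, aggregator 0 serves OSTs {0, 4}, aggregator 1 serves {1, 5}, and so on: contention-free by construction, which is the locality property the paper relies on.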
On Sat, Sep 21, 2013 at 11:21:19PM -0500, Jaln wrote:
> I'm not sure whether the Lustre or the MPI forum is the right place for my
> question.

both, i guess :>

> My question is: is this optimization used in all current Lustre systems,
> e.g., Hopper at NERSC?

Wei-keng never contributed the specific ROMIO optimizations he discussed
in the SC 08 paper, but his work did spur a lot of community discussion
and contributions.

Emoly Lu contributed a bunch of Lustre ADIO driver work, which Pascal
Deveze and Martin Pokorny improved upon. MPICH-1.3 and newer contain
these improvements.

David Knaak from Cray implemented his own improvements. Cray's MPI-IO
is based on ROMIO, but the Cray modifications are proprietary. MPT-3.2
and newer contain Lustre-specific optimizations.

The community has been quiet with respect to Lustre MPI-IO work since
then. I hope that's because everything "just works".

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Interesting, thanks Rob. So I can assume that Hopper (a Cray XE6 with
MPT 3.2) contains the Lustre-specific optimizations? Does it work both
for read and write?

Jailin

On Sun, Sep 22, 2013 at 2:00 PM, Rob Latham <robl-zhzY/AYb9nBYTrM/R70HSA@public.gmane.org> wrote:
> David Knaak from Cray implemented his own improvements. Cray's MPI-IO
> is based on ROMIO, but the Cray modifications are proprietary. MPT-3.2
> and newer contain Lustre-specific optimizations.
>
> ==rob

--
Genius only means hard-working all one's life
On Sun, Sep 22, 2013 at 08:41:21PM -0500, Jaln wrote:
> So I can assume that Hopper (a Cray XE6 with MPT 3.2) contains the
> Lustre-specific optimizations?

Hopper does. Just to be clear, don't ask for 3.2: run the newest
possible MPT. You'll have to experiment a bit with the settings
described in the intro_mpi man page. In my experience, things that are
documented as the default are not actually the default. It's quite
frustrating, but the Hopper staff are quite good at answering
site-specific MPI-IO questions.

> Does it work both for read and write?

I don't know for sure (it's a Cray, and I have not seen the source
code), but most of the focus was on writes, since in the write path
there are Lustre-level locks to deal with. In the read path,
Lustre-level locks don't really come into play.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
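[Editor's note: the "experiment with the settings described in the intro_mpi man page" advice above can be made concrete with a job-script fragment like the one below. It is a sketch, not a recipe: the stripe count, file names, and process count are arbitrary examples, and the exact hint syntax accepted by MPICH_MPIIO_HINTS varies across MPT releases, so verify it against the intro_mpi man page on your own system.]

```shell
#!/bin/bash
# Hypothetical job fragment for a Cray XE6 such as Hopper.

# Stripe the output directory across several OSTs before the run
# (8 is an arbitrary example, not a recommendation):
lfs setstripe -c 8 /scratch/$USER/run_dir

# Print the MPI-IO hints actually in effect at file-open time, so you
# can see what the "default" really is on your system:
export MPICH_MPIIO_HINTS_DISPLAY=1

# Example: request collective buffering on writes for all files
# ("*" pattern); check intro_mpi for the syntax your MPT accepts:
export MPICH_MPIIO_HINTS="*:romio_cb_write=enable"

aprun -n 1024 ./my_mpiio_app   # my_mpiio_app is a placeholder
```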