Hi all,

I am a student of Applied Computer Science at the University of Heidelberg, and I am currently working on my master's thesis about distributed file systems on networks capable of Remote Direct Memory Access (RDMA). I work at the Computer Architecture Group, which develops Extoll, a high-speed interconnect for high-performance computing. I have experience developing Linux kernel modules, as I work on Extoll's driver stack, including a kernel-level API to access the Extoll device.

Our aim is to implement a LND that supports Extoll. I started by building the current lustre-master and the patched kernel from scratch, which worked very well thanks to the good documentation online.

After consulting the Lustre Manual and some guides and papers, including "The Lustre Storage Architecture" and "Understanding Lustre Filesystem Internals", I did not find a comprehensive overview of the API a LND has to implement. Although the function names are easy to find, it is not clear what the expected behaviour of each function is, or what ranges of input data it has to deal with. The Doxygen documentation generated from the LNet source files was not very helpful either.

So my questions are: where do I start to implement a LND? And how can I test a LND without recompiling the whole Lustre RPM?

Thanks,
Tobias Groschup
On Wed, Aug 07, 2013 at 12:48:04PM +0200, Tobias Groschup wrote:
> ......
> After consulting the Lustre Manual, some guides and papers including
> "The Lustre Storage Architecture" and "Understanding Lustre Filesystem
> Internals" I did not find a comprehensive overview for the API a LND
> has to implement. Although the function names are easy to find, it's not
> clear what the expected behaviour of the function is, or with which
> ranges of input data they have to deal.

There's no document on the LND API as far as I know.

> Also, the Doxygen generated from the LNet source files was not very helpful.

That documents the external LNet API, not the LND API.

> So my question is: where do I start to implement a LND?

I'd suggest reading lnet/lnet/lo.c to get a basic understanding of the LND API. That's the implementation of a simple loopback LND, similar to the Linux lo network interface. If you're not already familiar with the LNet API, it'd be helpful to understand its semantics, which are well documented by the LNet Doxygen comments.

It's a difficult task to build a new LND. So far we have two third-party LNDs contributed to Lustre. If you search the list archives for mxlnd or gnilnd, you'll find some previous discussions on the LND API.

> And: how can I test a LND without recompiling the whole Lustre RPM?

You can test it with the LNet selftest tool. You only need to compile and replace the LND's .ko module to test it.

- Isaac
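For readers following the thread, the idea of a loopback driver plugged in through a table of callbacks can be sketched in plain user-space C. This is only an illustration of the general pattern, not the LND API: every name below (`demo_lnd`, `demo_ni`, `demo_msg`, `demo_roundtrip`, and the `lo_*` functions) is hypothetical, and the real types and signatures in lnet/lnet/lo.c differ and should be read from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are NOT the real LNet types. */
struct demo_msg {
    char   payload[64];
    size_t len;
};

struct demo_ni;   /* one network interface instance */

/* Callback table a driver fills in; the field names are illustrative only. */
struct demo_lnd {
    int  (*startup)(struct demo_ni *ni);
    void (*shutdown)(struct demo_ni *ni);
    int  (*send)(struct demo_ni *ni, const struct demo_msg *msg);
    int  (*recv)(struct demo_ni *ni, struct demo_msg *out);
};

struct demo_ni {
    const struct demo_lnd *lnd;
    int             up;           /* set by startup, cleared by shutdown */
    int             has_pending;  /* 1 while a sent message awaits recv  */
    struct demo_msg pending;
};

static int lo_startup(struct demo_ni *ni)
{
    ni->up = 1;
    ni->has_pending = 0;
    return 0;
}

static void lo_shutdown(struct demo_ni *ni)
{
    ni->up = 0;
    ni->has_pending = 0;
}

/* Loopback send: keep a copy so the same interface can receive it. */
static int lo_send(struct demo_ni *ni, const struct demo_msg *msg)
{
    if (!ni->up || ni->has_pending || msg->len > sizeof(msg->payload))
        return -1;
    ni->pending = *msg;
    ni->has_pending = 1;
    return 0;
}

/* Loopback recv: hand back the stored message, if there is one. */
static int lo_recv(struct demo_ni *ni, struct demo_msg *out)
{
    if (!ni->up || !ni->has_pending)
        return -1;
    *out = ni->pending;
    ni->has_pending = 0;
    return 0;
}

static const struct demo_lnd demo_lolnd = {
    .startup  = lo_startup,
    .shutdown = lo_shutdown,
    .send     = lo_send,
    .recv     = lo_recv,
};

/* Returns 0 if text sent through the callback table comes back unchanged. */
int demo_roundtrip(const char *text)
{
    struct demo_ni  ni  = { .lnd = &demo_lolnd };
    struct demo_msg in  = { {0}, 0 };
    struct demo_msg out = { {0}, 0 };
    size_t len;
    int rc;

    assert(text != NULL);
    len = strlen(text);
    if (len >= sizeof(in.payload))
        return -1;                 /* too long for the demo buffer */
    memcpy(in.payload, text, len + 1);
    in.len = len;

    if (ni.lnd->startup(&ni) != 0)
        return -1;
    rc = ni.lnd->send(&ni, &in);
    if (rc == 0)
        rc = ni.lnd->recv(&ni, &out);
    ni.lnd->shutdown(&ni);

    if (rc != 0 || out.len != in.len)
        return -1;
    return memcmp(out.payload, in.payload, len) == 0 ? 0 : -1;
}
```

Calling `demo_roundtrip("ping")` drives one message through startup, send, recv, and shutdown. The upper layer only ever talks to the table, which is why a new driver can be swapped in by supplying a different set of callbacks.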
Thanks for the pointers!

The code of the loopback and Myrinet LNDs helped a great deal, as did more reading on the LNet API. After this reading, I think it is possible to implement a LND, albeit a simple one, within the scope of a master's thesis.

If I have further questions about the development of a LND (and I am sure there will be more questions): is the lustre-devel mailing list the right place to ask? If not, where could I get help?

Thanks again,
Tobias
On Wed, Aug 14, 2013 at 02:20:26PM +0200, Tobias Groschup wrote:
> Thanks for the pointers!
>
> The code of the loopback and Myrinet LND helped a great deal, as did
> more reading on the LNet API.
> After reading, I think that it is possible to implement an, albeit
> simple, LND in the limits of a master thesis.

The current ones have grown complex due to the LND protocol fixes over the years and the constraint that the LND wire protocol must be backward compatible. There are also quirks in the underlying protocol/hardware that each LND had to deal with. A brand new one should not come even close in complexity.

> If I have further questions about the development of a LND (and I am
> sure there will be more questions): is the lustre-devel mailing list
> the right place to ask? If not, where could I get help?

Please also CC: https://lists.01.org/mailman/listinfo/hpdd-discuss

- Isaac