Hi,

This seems like a good place to ask my somewhat arcane questions about Lustre development. These are intended not only as "is this officially planned?", but also "is anyone else coding this?" and "if I were to work on this, am I more likely to see interest from others, or just gentlemen with a white jacket for me to try on?"

First up, underlying network protocols. At present, if I understand it correctly, Lustre uses TCP as the underlying transport. These days, Linux sports other reliable, TCP-like protocols (such as DCCP), but it is unclear whether Lustre would gain anything from using them. How hard would it be to add support for such protocols? Is there any current work on determining whether there would be an obvious benefit in providing that support?

Associated with this is something I noted in the docs: there's a lot of RDMA and other fairly hefty network activity. This is "obvious" enough. However, it must get very complicated if many machines attempt to access relatively nearby regions of disk. Other than the group I'm working with, I know of nobody doing RDMA over reliable multicast - it's all unicast. The upshot is that it would be hard to escape performance degradation that grows with the number of parallel readers. The reads would need to be in parallel, as any staggering in the access would mean only one node would listen to the multicast anyway. As such, both the problem and the solution are special cases, and thus are simply not going to coincide that often. The questions would then be: "how often do these cases coincide in real-world situations?", and, assuming there are enough incidents to make the solution interesting, "would the added weight outweigh any benefits it might have for Lustre?"

Now onto the even dodgier ground of third-party extensions to Linux. There are two that look like they might be of interest in a Lustre-enabled world.
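To make the parallel-read concern concrete, here is a toy back-of-envelope model (my own sketch, not anything from the Lustre code) comparing the bytes a server must transmit when N clients read the same region over unicast versus reliable multicast. The `overlap` parameter is a made-up knob standing in for how many of the reads are simultaneous enough to share one transmission:

```python
# Toy model (hypothetical, not Lustre code): server transmit load when
# n_clients read the same region_size bytes concurrently.

def unicast_bytes(n_clients: int, region_size: int) -> int:
    # Unicast: each client receives its own copy of the data.
    return n_clients * region_size

def multicast_bytes(n_clients: int, region_size: int, overlap: float = 1.0) -> int:
    # 'overlap' is the fraction of clients whose reads coincide closely
    # enough to share one multicast transmission; the stragglers fall
    # back to individual unicast copies.
    sharing = int(n_clients * overlap)
    stragglers = n_clients - sharing
    shared = region_size if sharing > 0 else 0
    return shared + stragglers * region_size

if __name__ == "__main__":
    size = 1 << 20  # 1 MiB region
    for n in (1, 16, 256):
        print(n, unicast_bytes(n, size), multicast_bytes(n, size))
```

With perfect overlap the multicast cost stays flat at one region's worth of bytes while the unicast cost grows linearly with client count; with zero overlap the two are identical, which is exactly the staggered-access case where only one node is listening.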
The first is ABISS - the Active Block I/O Scheduling System - which purports to provide soft real-time and prioritized best-effort block-device access, with a better scheduler than CFQ (which I notice Lustre patches). It might be interesting to see whether ABISS' scheduler works better for Lustre than CFQ.

The second extension is SGI's "Scheduled Transfer Protocol". At the time it was developed, nobody could think of a use for it that couldn't be done better by other means, so they dropped it. However, it occurs to me that a clustered filesystem would be a case in which pre-negotiation would have some value. There may, therefore, be code that could be salvaged from STP and placed into Lustre, or at least ideas that could be recycled from one of their papers on the protocol.

Finally, there's optimization and bug-hunting. (Bugs? No, no bugs here! :) Sandia has an interesting project called DAKOTA, for analyzing and profiling massively parallel projects. Web100 does TCP instrumentation. kTAU provides yet another set of kernel profiling tools. It would seem that providing hooks to one or more of these projects would allow both internal and external developers to compile a greater range of statistics on what happens inside the Lustre framework. (The Linux Trace Toolkit would have been interesting, but it would require serious voodoo to get its corpse re-animated.)
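As a sketch of what such instrumentation hooks might look like (purely illustrative; none of these names exist in Lustre or in the tools mentioned), here is a minimal counter-and-callback registry that an external profiler, Web100- or kTAU-style, could subscribe to:

```python
# Illustrative sketch only: a minimal statistics-hook registry of the kind
# a profiling tool might subscribe to. The event names and class are
# hypothetical and do not come from the Lustre code.
from collections import defaultdict
from typing import Callable

class StatsHooks:
    def __init__(self) -> None:
        self._counters: dict[str, int] = defaultdict(int)
        self._subscribers: dict[str, list[Callable[[str, int], None]]] = defaultdict(list)

    def subscribe(self, event: str, callback: Callable[[str, int], None]) -> None:
        # An external profiler registers interest in one event class.
        self._subscribers[event].append(callback)

    def fire(self, event: str, amount: int = 1) -> None:
        # Called at an instrumentation point, e.g. on each bulk transfer.
        self._counters[event] += amount
        for cb in self._subscribers[event]:
            cb(event, self._counters[event])

    def snapshot(self) -> dict[str, int]:
        # A profiler can also poll all counters in one go.
        return dict(self._counters)

if __name__ == "__main__":
    hooks = StatsHooks()
    hooks.subscribe("rdma.read.bytes", lambda ev, total: print(ev, total))
    hooks.fire("rdma.read.bytes", 4096)
    hooks.fire("rdma.read.bytes", 4096)
    print(hooks.snapshot())
```

The point of the pattern is that the filesystem only ever calls `fire()` at its instrumentation points; whether anyone is counting, and which tool consumes the numbers, is decided entirely by what has been subscribed, so hooks for DAKOTA, Web100, or kTAU could coexist without touching the core code paths.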