Hi All,

I have a small Lustre setup with 1 MDS and 2 OSSes, and all three machines also have the Lustre client mounted.

Once the Lustre client knows the stripe information of a file, will it communicate directly with the OSS?

If yes, is there any optimization possible if the Lustre client learns that the data is on the same machine (when it is also acting as an OST)?

Thanks,
J
> I have a small lustre setup having 1 MDS, 2 OSS and all of the
> three machine also have lustre client mounted.

Note that running the Lustre client on a Lustre server is not recommended because there can be a resource deadlock between the client and server modules (involving the cache IIRC).

However I suspect that this problem occurs only on OSSes, and I suspect that it is negligible on the MDS.

Which makes me suspect that running some types of workloads on the MDS as a client may give some advantages.

> Once the lustre client knows about the stripe information of a
> file, will it directly communicate to OSS?

After fetching the file metadata from (one of) the MDS(es), Lustre clients always communicate directly with the OSS(es) involved. That's the whole point of having distinct metadata and data servers.

A coarse way of understanding Lustre and similar filesystem types is to imagine that Lustre clients can "mount" individual files (or stripes of a file) from an OSS, and the MDS is the server with the automount maps.

> If yes, is there any optimization possible if lustre client
> learns that the data is in the same machine (when acting as
> OST) ?

Not directly. RPCs may be rather faster locally than over a network interface.
On Wed, Feb 15, 2012 at 5:00 PM, Peter Grandi <pg_lus at lus.for.sabi.co.uk> wrote:
>> I have a small lustre setup having 1 MDS, 2 OSS and all of the
>> three machine also have lustre client mounted.
>
> Note that running the Lustre client on a Lustre server is not
> recommended because there can be a resource deadlock between
> the client and server modules (involving the cache IIRC).
>
> However I suspect that this problem occurs only on OSSes, and I
> suspect that it is negligible on the MDS.
>
> Which makes me suspect that running some types of workloads on
> the MDS as a client may give some advantages.
>
>> Once the lustre client knows about the stripe information of a
>> file, will it directly communicate to OSS?
>
> After fetching the file metadata from (one of) the MDS(es)
> Lustre clients always communicate directly with the OSS(es)
> involved. That's the whole point of having distinct metadata
> and data servers.
>

Right, I agree.

> A coarse way of understanding Lustre and similar filesystem
> types is to imagine that Lustre clients can "mount" individual
> files (or stripes of a file) from an OSS, and the MDS is the
> server with the automount maps.
>
>> If yes, is there any optimization possible if lustre client
>> learns that the data is in the same machine (when acting as
>> OST) ?
>
> Not directly. RPCs may be rather faster locally than over a
> network interface.

Hmm, but it will still involve multiple memory copy operations to transfer data from the _local_ client to the _local_ OST, correct? Can that be avoided? Or is the client smart enough to share the memory when it knows the client and OST are on the same machine?

Thanks,
J
Andreas Dilger
2012-Feb-15 19:10 UTC
[Lustre-discuss] Lustre client and OST on same machine
On 2012-02-15, at 3:58 AM, Jack David wrote:
> On Wed, Feb 15, 2012 at 5:00 PM, Peter Grandi <pg_lus at lus.for.sabi.co.uk> wrote:
>>> Once the lustre client knows about the stripe information of a
>>> file, will it directly communicate to OSS?
>>>
>>> If yes, is there any optimization possible if lustre client
>>> learns that the data is in the same machine (when acting as
>>> OST) ?
>>
>> Not directly. RPCs may be rather faster locally than over a
>> network interface.
>
> Hmm, but it will still involve multiple memory copy operations to
> transfer data from _local_ client to _local_ OST, correct? Can that be
> avoided? or The client is smart enough to share the memory when it
> knows the client and OST are the same machine.

There is still a memory copy, but it is internal to the LNET code for the lo@0 interface and is not being passed over e.g. the TCP loopback or ethernet lo interface.

It would be desirable to have a more efficient OSC->OSD interface for local IO, but as yet there hasn't been any work in this direction.

Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/
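
To make the direct client-to-OSS data path discussed in this thread concrete, here is a minimal sketch of how a simple round-robin (RAID-0 style) stripe layout maps a file offset to an OST and to an offset within that OST's object. It is an illustration only, not Lustre code or its API: `Layout`, `locate` and `is_local` are hypothetical names, and the 1 MiB stripe size and OST indices are assumed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical stand-in for the stripe information a client gets from the MDS."""
    stripe_size: int     # bytes per stripe (assumed value, e.g. 1 MiB)
    ost_indices: tuple   # OSTs holding the file's objects, in stripe order


def locate(layout, offset):
    """Map a file byte offset to (ost_index, offset inside that OST's object)."""
    stripe_count = len(layout.ost_indices)
    stripe_number = offset // layout.stripe_size   # which stripe of the file
    which = stripe_number % stripe_count           # round-robin over the OSTs
    row = stripe_number // stripe_count            # full rounds completed so far
    object_offset = row * layout.stripe_size + offset % layout.stripe_size
    return layout.ost_indices[which], object_offset


def is_local(ost_index, local_osts):
    """True if the target OST is served by this same machine."""
    return ost_index in local_osts


MIB = 1 << 20
layout = Layout(stripe_size=MIB, ost_indices=(0, 1))
ost, obj_off = locate(layout, 3 * MIB + 7)
print(ost, obj_off, is_local(ost, local_osts={1}))
```

Once the layout is known, the client can compute the target OST for any offset without going back to the MDS. Per the replies above, knowing that the target is local only changes the transport (LNET loopback rather than the network); the data is still copied.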