> We have installed and are currently testing a central Infiniband lustre
> filesystem shared between two different clusters. The lustre OSS and MDS
> are running on some nodes of a specific I/O server cluster.
>
> The infiniband fabric is a bit exotic as it interconnects several
> clusters with the Lustre cluster. We want to optimize infiniband routes
> between the Lustre clients and the OSS nodes. We saw that routes between
> 2 nodes generated by the subnet manager are different for each
> direction.
>
> So we need to understand how lustre IO read/write requests between Lustre
> clients and OSS are "translated" into Infiniband requests. What are the
> Infiniband low level protocols used by the driver?
The lustre/lnet IB drivers (o2iblnd is preferred) use RC queue pairs between
every pair of nodes. Each QP is configured with 16 buffers for receiving
small (up to 4K) messages, and a credit-based flow control protocol ensures
that we never send a message unless a buffer is posted to receive it. Bulk
data (i.e. anything that can't fit into a "small" message) is sent via RDMA,
using message passing just to set up the RDMA and signal completion.
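To make the credit protocol concrete, here is a minimal sketch of the idea
(this is not the o2iblnd source; the structure and function names such as
post_send() are hypothetical): a node only sends when it knows the peer still
has a receive buffer posted, and credits are handed back piggybacked on
outgoing messages.

/*
 * Simplified sketch of credit-based flow control over a fixed pool of
 * peer receive buffers (16 per QP in o2iblnd's case).  Illustrative only;
 * "struct peer" and post_send() are hypothetical.
 */
#define PEER_RX_BUFFERS 16

extern int post_send(void *msg, int credits_granted);  /* hypothetical IB send */

struct peer {
    int tx_credits;      /* sends we may still issue (peer has buffers posted) */
    int return_credits;  /* credits we owe the peer on our next outgoing message */
};

/* Called when we want to send a small message to the peer. */
int try_send(struct peer *p, void *msg)
{
    int granted;

    if (p->tx_credits == 0)
        return -1;                 /* no receive buffer at the peer: queue msg */

    p->tx_credits--;               /* consume one credit */
    granted = p->return_credits;   /* piggyback credits we owe the peer */
    p->return_credits = 0;
    return post_send(msg, granted);
}

/* Called when a message arrives and its receive buffer has been re-posted. */
void on_receive(struct peer *p, int credits_granted)
{
    p->tx_credits += credits_granted;  /* peer returned credits to us */
    p->return_credits++;               /* we owe one back for the reposted buffer */
}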

> What kind of IB requests are issued when a Lustre client makes a "READ" or
> WRITE operation? Do you have some documentation available?

So a lustre WRITE looks like this:

Client -> Server: lustre WRITE RPC request message
Client <- Server: RDMA setup message
Client -> Server: RDMA + completion message
Client <- Server: lustre WRITE RPC reply message
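
For illustration, a generic libibverbs sketch of the verbs underneath such an
exchange (this is not o2iblnd code; the helper names are mine, and the remote
address/rkey are assumed to arrive in the server's RDMA setup message): small
messages go out as SENDs into the peer's pre-posted buffers, while the bulk
payload is pushed with a single RDMA WRITE.

#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post a small (<= 4K) message as an ordinary SEND; it lands in one of the
 * receive buffers the peer has pre-posted (hence the credit protocol). */
int send_small(struct ibv_qp *qp, struct ibv_mr *mr, void *msg, size_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)msg,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode     = IBV_WR_SEND;
    wr.send_flags = IBV_SEND_SIGNALED;
    wr.sg_list    = &sge;
    wr.num_sge    = 1;

    return ibv_post_send(qp, &wr, &bad_wr);
}

/* Push the bulk payload with an RDMA WRITE to the region the server
 * advertised (remote_addr/rkey taken from its RDMA setup message). */
int rdma_write_bulk(struct ibv_qp *qp, struct ibv_mr *mr,
                    void *buf, size_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* bulk data bypasses peer buffers */
    wr.send_flags          = IBV_SEND_SIGNALED;  /* completion tells us when it's done */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}
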
Someone else might know if/where the lustre RPC is documented - I'm afraid
I don't have any documentation for the IB LNDs to offer you.
>
> thanks for your help
>
> Philippe Gregoire.
>