Dardo D Kleiner - CONTRACTOR
2009-Nov-10 13:02 UTC
[Lustre-discuss] poor lustre wan performance
(Cross posting here in the hopes of finding a wider audience)

SLES11 x86_64 / Lustre 1.8.1.1

options ko2iblnd map_on_demand=31 peer_credits=128 credits=256 concurrent_sends=256 \
        ntx=512 fmr_pool_size=512 fmr_flush_trigger=384 fmr_cache=1

Local write performance is OK, 300-400 MB/sec - however, with any substantial
latency, performance tanks (10-30 MB/sec).  The only thing I can see that's
relevant is that although the RPC sizes are good, the number of write RPCs in
flight never goes above 1, e.g. with 30 ms latency:

snapshot_time:         1257795116.522124 (secs.usecs)
read RPCs in flight:   0
write RPCs in flight:  1
pending write pages:   256
pending read pages:    0

                        read                      write
pages per rpc   rpcs   %   cum %   |   rpcs   %   cum %
1:                 0   0       0   |      0   0       0
2:                 0   0       0   |      0   0       0
4:                 0   0       0   |      0   0       0
8:                 0   0       0   |      0   0       0
16:                0   0       0   |      0   0       0
32:                0   0       0   |      0   0       0
64:                0   0       0   |      0   0       0
128:               0   0       0   |      0   0       0
256:               0   0       0   |    100 100     100

                        read                      write
rpcs in flight  rpcs   %   cum %   |   rpcs   %   cum %
0:                 0   0       0   |    100 100     100

                        read                      write
offset          rpcs   %   cum %   |   rpcs   %   cum %
0:                 0   0       0   |    100 100     100

At this point it clearly doesn't matter if I mess with max_rpcs_in_flight,
which used to be a way to mitigate the high BDP.

Are there new parameters and/or tunings for ko2iblnd we're supposed to be
using?  Did something change with 1.8.1.1 in this regard?  I'm trying to
determine whether it was our move to SLES11 or something else.  Our
operational deployment is not yet at this latest version, but we are wary of
upgrading since I've indicated I'm having problems.

Any suggestions greatly appreciated...

- Dardo
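[Editor's note: the single-write-RPC-in-flight behavior above is enough to explain the observed numbers. A rough sanity check, assuming 1 MiB write RPCs (the 256 pages per RPC shown in the stats, at 4 KiB pages) and the 30 ms of injected latency, caps throughput at roughly rpc_size / RTT:]

```shell
# Back-of-the-envelope throughput cap with a single write RPC in flight.
# Assumes 256 pages/RPC * 4 KiB pages = 1 MiB per RPC (matches the stats above)
# and a 30 ms round trip; both are taken from the post, not measured here.
awk 'BEGIN {
    rpc_mib = 1.0      # MiB per write RPC
    rtt_s   = 0.030    # round-trip time in seconds
    printf "cap: %.1f MiB/s\n", rpc_mib / rtt_s
}'
```

[That lands right in the reported 10-30 MB/sec range; if max_rpcs_in_flight were actually taking effect, the pipe could carry that many RPCs concurrently and the cap would scale accordingly.]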
On Tue, Nov 10, 2009 at 08:02:03AM -0500, Dardo D Kleiner - CONTRACTOR wrote:
> ......
> At this point it clearly doesn't matter if I mess with max_rpcs_in_flight
> which used to be a way to mitigate the high BDP.
>
> Are there new parameters and/or tunings for ko2iblnd we're supposed to be
> using?  Did something change with 1.8.1.1 in this regard - I'm trying to
> determine if it was our move to SLES11 or something else?  Our operational
> deployment is not yet at this latest version but are wary to upgrade since
> I've indicated I'm having problems.

As far as I know, there's been no change to ko2iblnd since the map_on_demand
feature, which you're already using.  Though it might seem likely that the
Lustre services are somehow not creating enough RPCs to keep the pipe full,
it'd be quick and easy to run an LNet selftest to narrow it down (if you
haven't done so already).  If a bulk write test from the same client nodes to
the same server nodes produces similar figures, then we'd certainly look at
LNet/ko2iblnd.

Thanks,
Isaac
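[Editor's note: a minimal LNet selftest bulk-write run along the lines Isaac suggests could look like the sketch below. The NIDs are hypothetical placeholders for the real client and server interfaces, and the lnet_selftest module must be loaded on all participating nodes; this cannot be run outside a live LNet setup.]

```shell
# Sketch of an lnet_selftest bulk-write run (NIDs below are hypothetical).
modprobe lnet_selftest                # load on the node driving the test
export LST_SESSION=$$                 # session identifier used by lst
lst new_session brw_wan
lst add_group clients 10.0.0.1@o2ib   # replace with the real client NID(s)
lst add_group servers 10.0.0.2@o2ib   # replace with the real server NID(s)
lst add_batch bulk_write
lst add_test --batch bulk_write --from clients --to servers \
    brw write size=1M                 # 1 MiB bulk writes, matching the RPC size
lst run bulk_write
lst stat servers                      # watch aggregate bandwidth
lst end_session
```

[If the selftest bandwidth degrades with latency the same way the filesystem writes do, the bottleneck is below the OSC RPC engine, i.e. in LNet/ko2iblnd.]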