thr3ads.net - Lustre discuss - [Lustre-discuss] ost error messages [Jul 2010]

Michael Di Domenico

2010-Jul-20 20:30 UTC

[Lustre-discuss] ost error messages

I'm seeing two error messages in my logs that I'm not entirely sure how to
interpret.

- clients
os redhat v5.4 x86_64
kernel 2.6.18-164.11.1_lustre
ofed 1.5.0 (qlogic version)
lustre 1.8.1.1

- servers
os redhat v5.3 x86_64
kernel 2.6.18-128.7.1_lustre
ofed 1.4.2 (qlogic version)
lustre 1.8.1.1

first error

lustre: 8305:0 (client.c:1383:ptlrpc_expire_one_request()) @@@ Request
x1339802879282312 sent from fs1-ost0002 to nid 1.1.1.98@o2ib 7s ago
has timed out (limit 7s)
req@ffff8101c7fb6800 x1339802879282312/t0
o106->@NET_0x5000064843162_UUID:15/16 lens 296/424 e 0 to 1 dl
1279653812 ref 27185 fl Rpc:/2/0 rc 0/0

Now it's true that the OSS and the client both have InfiniBand cards, but
the client does not have a NID set up for IB, and the Lustre client is set
to mount over TCP only, not o2ib, so I'm not sure why any RPCs are
destined for IB. The OSS does have IB NIDs.

On the same OSS, I'm also seeing:

LustreError: 8429:0:(filter.c:2520:filter_grant_sanity_check())
filter_statfs: tot_granted 155983872 != fo_tot_granted 1959129600
LustreError: 8429:0:(filter.c:2523:filter_grant_sanity_check())
filter_statfs: tot_pending 0 != fo_tot_pending 3145728

I'm not sure if they're related or not. Only one OSS server is doing
this, and I have 10 in production.
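
To put a number on how far the two grant counters have drifted apart, here is a minimal Python sketch. The regular expression is an assumption modeled only on the two filter_grant_sanity_check() lines quoted above, and the function name grant_mismatches is hypothetical, not part of any Lustre tool.

```python
import re

# Assumed message shape, taken only from the log lines quoted above:
#   "tot_granted 155983872 != fo_tot_granted 1959129600"
GRANT_RE = re.compile(r"(tot_\w+) (\d+) != (fo_tot_\w+) (\d+)")


def grant_mismatches(log_text):
    """Return (left name, left value, right name, right value, delta) tuples."""
    flat = " ".join(log_text.split())  # undo the line wrapping in the log
    results = []
    for m in GRANT_RE.finditer(flat):
        left, right = int(m.group(2)), int(m.group(4))
        results.append((m.group(1), left, m.group(3), right, right - left))
    return results


sample = """LustreError: 8429:0:(filter.c:2520:filter_grant_sanity_check())
filter_statfs: tot_granted 155983872 != fo_tot_granted 1959129600
LustreError: 8429:0:(filter.c:2523:filter_grant_sanity_check())
filter_statfs: tot_pending 0 != fo_tot_pending 3145728"""

for row in grant_mismatches(sample):
    print(row)
```

Run against a full syslog, this shows whether the gap between the two counters is constant or growing over time, which may help tell a one-off accounting glitch from an ongoing one.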

second error (different OSS)

LustreError: 8219:0:(ost_handler.c:1038:ost_brw_write()) client csum
a5908b2, server csum c68fd7de
LustreError: 168-f: fs1-ost000b: BAD WRITE CHECKSUM: changed in
transit before arrival at OST from 12345-1.1.1.25@o2ib inum
33358052/386095879 object 1779/0 extent [192495232-1942499327]

These two lines repeat a few times with different csum values,
but from the same host.
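
To confirm that the checksum errors really do come from a single client, the messages can be tallied per OST and client NID. The following is a minimal Python sketch; the pattern is an assumption based only on the BAD WRITE CHECKSUM message quoted above, the function name count_bad_checksums is hypothetical, and the " at " alternative is there only because some list archives rewrite "@" that way.

```python
import re
from collections import Counter

# Assumed message shape, taken only from the log line quoted above.
# Accepts both "1.1.1.25@o2ib" and the archive-mangled "1.1.1.25 at o2ib".
CSUM_RE = re.compile(
    r"(?P<ost>\S+): BAD WRITE CHECKSUM: .*? from (?P<nid>\S+(?:@| at )\S+)"
)


def count_bad_checksums(log_text):
    """Return a Counter keyed by (OST name, client NID as printed)."""
    flat = " ".join(log_text.split())  # undo the line wrapping in the log
    counts = Counter()
    for m in CSUM_RE.finditer(flat):
        nid = m.group("nid").replace(" at ", "@")
        counts[(m.group("ost"), nid)] += 1
    return counts


sample = """LustreError: 168-f: fs1-ost000b: BAD WRITE CHECKSUM: changed in
transit before arrival at OST from 12345-1.1.1.25@o2ib inum
33358052/386095879 object 1779/0 extent [192495232-1942499327]"""

for (ost, nid), n in count_bad_checksums(sample).items():
    print(ost, nid, n)
```

If every entry in the result carries the same NID, that points at one client (or its link) rather than at the OSS.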

I'm also seeing processes deadlock (no CPU time accumulating, and not
killable with -9) when reading files from the filesystem. I haven't
narrowed it down, but it seems to happen more with striped files than
with non-striped ones.

Any thoughts?

Thanks
- Michael
