Good morning folks,

We're getting some weird LustreError entries on a few OSTs in our cluster, but no real disruption of service. Any ideas what might cause this?

n0003: LustreError: 137-5: UUID 'lrc-OST0000_UUID' is not available for connect (no target)
n0003: LustreError: Skipped 2 previous similar messages
n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) @@@ processing error (-19) req@ffff8102db286000 x1230507/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1261726242 ref 1 fl Interpret:/0/0 rc -19/0
n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 3 previous similar messages

There are no further messages concerning this OST, and the FS is still in production, accessing the OST with ease. Is the problem with these clients or with the OSSs?

----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
Andreas Dilger
2010-Jan-08 19:38 UTC
[Lustre-discuss] odd ost disconnects during production
On 2010-01-08, at 12:19, John White wrote:
> We're getting some weird LustreError entries on a few OSTs in our
> cluster but no real disruption of service. Any ideas what might
> cause such things?
>
> n0003: LustreError: 137-5: UUID 'lrc-OST0000_UUID' is not available for connect (no target)
> n0003: LustreError: Skipped 2 previous similar messages
> n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) @@@ processing error (-19) req@ffff8102db286000 x1230507/t0 o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1261726242 ref 1 fl Interpret:/0/0 rc -19/0
> n0003: LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) Skipped 3 previous similar messages
>
> There are no further messages concerning this OST and the FS is
> still in production accessing the OST with ease. Are these clients
> having a problem or OSSs?

Do you have failover configured? It seems possible that the client is trying the backup OSS, which indeed doesn't have that OST configured, then tries the primary OSS and is successful.

Unfortunately, the "o8-><?>@<?>" is supposed to say where the "o8" (OST_CONNECT) RPC is being sent, but I suspect the debug message is slightly incorrect (i.e. a minor code bug) because it has no connection from which to get this information.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
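For anyone who wants to pick these lines out of a syslog, here is a minimal Python sketch that pulls the opcode and return code from the request dump quoted above. It is an illustration only: the regular expression is inferred from the single sample line in this thread, the names REQ_RE, OPCODES and summarize are made up for this example, and the opcode table holds only the one opcode named above (o8 = OST_CONNECT). The -19 maps to ENODEV on Linux, which fits the "(no target)" message from an OSS that does not have the OST mounted.

```python
import errno
import re

# Hypothetical pattern, inferred from the one sample line in this thread:
#   o<opcode>-><target>@<peer>:<n>/<n> ... rc <status>/<reply status>
REQ_RE = re.compile(
    r"\bo(?P<opc>\d+)->(?P<target>[^@\s]*)@(?P<peer>\S+?):\S+"
    r".*\brc (?P<rc>-?\d+)/(?P<reply_rc>-?\d+)"
)

# Only the opcode named in the thread; anything else is reported as "o<N>".
OPCODES = {8: "OST_CONNECT"}


def summarize(line):
    """Return opcode, target, peer and errno name for a request dump, or None."""
    m = REQ_RE.search(line)
    if m is None:
        return None
    opc = int(m.group("opc"))
    rc = int(m.group("rc"))
    return {
        "opcode": OPCODES.get(opc, "o%d" % opc),
        "target": m.group("target"),
        "peer": m.group("peer"),
        "rc": rc,
        # Kernel-style return codes are negated errno values.
        "errno": errno.errorcode.get(-rc, "unknown") if rc < 0 else "OK",
    }


SAMPLE = (
    "LustreError: 11954:0:(ldlm_lib.c:1863:target_send_reply_msg()) "
    "@@@ processing error (-19) req@ffff8102db286000 x1230507/t0 "
    "o8-><?>@<?>:0/0 lens 304/0 e 0 to 0 dl 1261726242 ref 1 "
    "fl Interpret:/0/0 rc -19/0"
)

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A connect request (OST_CONNECT) that fails with ENODEV on one OSS, while the filesystem stays healthy, is what you would expect to see if clients are probing the failover partner before reaching the node that actually serves the OST.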
Ah, yes, we do have failover configured. Thanks for the explanation.

On Jan 8, 2010, at 11:38 AM, Andreas Dilger wrote:
> Do you have failover configured? It seems possible that the client is trying the backup OSS, which indeed doesn't have that OST configured, then tries the primary OSS and is successful.

----------------
John White