Andrus, Brian Contractor
2010-Dec-03 01:56 UTC
[Lustre-discuss] target_send_reply_msg errors
I am seeing a TON of messages like:

LustreError: 19122:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-107) req at ffff8105cbdd2c00 x1348234052426436/t0 o400-><?>@<?>:0/0 lens 192/0 e 0 to 0 dl 1291340852 ref 1 fl Interpret:H/0/0 rc -107/0
LustreError: 19122:0:(ldlm_lib.c:1892:target_send_reply_msg()) Skipped 864 previous similar messages
LustreError: 9363:0:(mgs_handler.c:554:mgs_handle()) lustre_mgs: operation 400 on unconnected MGS
LustreError: 9363:0:(mgs_handler.c:554:mgs_handle()) Skipped 352 previous similar messages

Any ideas what that may be and how to remedy it?

Brian Andrus
Hello!

Essentially, your client(s) got disconnected from the MGS for some reason (somewhere earlier in the MGS logs you should see something about that). The clients did not know they were disconnected, and they discover this sad fact the next time they try to talk to the MGS, which happens when they send their periodic PINGs (operation 400 == PING).

Bye,
Oleg
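To make the reading of the log line above concrete, here is a minimal sketch that pulls the opcode and return code out of a `target_send_reply_msg()` line. The helper name `decode` and the two lookup tables are hypothetical and cover only the values seen in this thread: opcode 400 is PING as stated in the reply above, and 107 is the Linux errno ENOTCONN ("Transport endpoint is not connected"). The regular expression assumes the `o<opcode>->` and `rc <code>/` fields appear as they do in the posted message.

```python
import re

# Assumption: only the codes that appear in this thread are mapped.
OPCODES = {400: "PING"}      # per the reply above: operation 400 == PING
ERRNOS = {107: "ENOTCONN"}   # Linux errno 107: transport endpoint is not connected

# Assumption: the line contains "o<opcode>->" and later "rc <code>/",
# as in the message posted in this thread.
LINE_RE = re.compile(r"\bo(?P<opc>\d+)->.*\brc (?P<rc>-?\d+)/")


def decode(line):
    """Return (operation, error) names for one log line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    opc = int(m.group("opc"))
    rc = int(m.group("rc"))
    # Lustre reports errors as negative errno values, so flip the sign.
    op_name = OPCODES.get(opc, "opcode %d" % opc)
    err_name = ERRNOS.get(-rc, "errno %d" % -rc)
    return op_name, err_name
```

Applied to the first line of the original post, this gives PING and ENOTCONN, which matches the explanation: a ping arrived from a client the MGS no longer considers connected.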
On Thu, 2010-12-02 at 17:56 -0800, Andrus, Brian Contractor wrote:
> Any ideas what that may be and how to remedy it?

Check that 'lctl get_param timeout' gives you the same value on the servers and the clients. We've seen this quite often -- when using a separate MGS, the timeout doesn't get set automatically from the parameters like it does for the MDT and OSTs. We added it to our filesystem start scripts and things are much happier.

--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
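The timeout check suggested above can be sketched as a small comparison over output gathered from each node. This is only an illustration: the node names are hypothetical, the helper names `parse_timeout` and `mismatched` are made up for this example, and the code assumes each node's `lctl get_param timeout` output contains a line of the form `timeout=<seconds>`. How the output is collected (ssh, pdsh, by hand) is left out.

```python
from collections import Counter


def parse_timeout(output):
    """Extract the integer from text assumed to contain a 'timeout=<seconds>' line."""
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "timeout":
            return int(value.strip())
    raise ValueError("no 'timeout=' line found")


def mismatched(outputs):
    """Given {node: command output}, return (most common timeout, {node: odd value})."""
    values = {node: parse_timeout(text) for node, text in outputs.items()}
    expected = Counter(values.values()).most_common(1)[0][0]
    odd = {node: v for node, v in values.items() if v != expected}
    return expected, odd


if __name__ == "__main__":
    # Hypothetical node names and values, for illustration only.
    collected = {
        "mgs": "timeout=100\n",
        "mds": "timeout=300\n",
        "oss1": "timeout=300\n",
        "client1": "timeout=300\n",
    }
    expected, odd = mismatched(collected)
    print("most common timeout:", expected)
    print("nodes that differ:", odd)
```

In this made-up data set the separate MGS is the node that stands out, which is the situation described above.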