Michael Bloom
2013-Sep-15 14:51 UTC
Lustre 2.4 MDT: LustreError: Communicating with 0@lo: operation mds_connect failed with -11
I'm a Lustre newbie who just joined this list. I'd appreciate any help on the following Lustre 2.4 issue I'm running into: every time I mount the MDT, the mount appears to succeed, but /var/log/messages contains the message:

    LustreError: 11-0: lustre-MDT0000-lwp-MDT0000: Communicating with 0@lo, operation mds_connect failed with -11

The MDT uses 4 local drives in a RAID10 configuration. Each OSS has its own RAID10 of 36 drives. The OSSes mount correctly without any errors.

I've seen this error mentioned in countless Google searches. One obscure reply suggested it was a problem fixed in 2.5. All other references were with respect to pre-2.4 releases, where the message indicated there was probably an error somewhere in the connection's configuration.

Is this a real error? I see the code that probably generates it in client.c. In abbreviated form, the code is:

    LCONSOLE_ERROR_MSG(0x11, "%s Communicating with %s")

in ptlrpc_check_status(). There's another path in mdt_obd_connect(), where -EAGAIN (set to -11 in lustre_errno.h) is returned if the stack isn't ready to handle requests, as indicated by the return code from obd_health_check(). (See the toy sketch below my signature for the behavior I think this produces.)

My environment is this: the MDT, OSS0, and OSS1 are on 3 separate nodes running CentOS 6.4, connected by Mellanox InfiniBand HBAs. Running the MDT and a single OSS together on one node in a VM, using TCP, did not exhibit this problem.

Thanks in advance for any help you can provide.

Michael
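
For illustration, here is a minimal, self-contained toy model of the behavior I think I'm seeing. This is my own sketch, not Lustre code: toy_health_check() and toy_mds_connect() are hypothetical stand-ins for obd_health_check() and the connect path in mdt_obd_connect(). The idea is that the handler returns -EAGAIN (-11 on Linux) while the stack is still coming up, the client logs the failure, and a later attempt succeeds:

    #include <errno.h>
    #include <stdio.h>

    /* Pretend the backing stack is "unhealthy" for the first 2 attempts. */
    static int health_checks_left = 2;

    /* Hypothetical stand-in for obd_health_check(): nonzero = not ready. */
    static int toy_health_check(void)
    {
            return health_checks_left-- > 0 ? -1 : 0;
    }

    /* Hypothetical stand-in for the mdt_obd_connect() path: refuse the
     * connection with -EAGAIN until the health check passes. */
    static int toy_mds_connect(void)
    {
            if (toy_health_check())
                    return -EAGAIN;         /* -11: "try again later" */
            return 0;
    }

    int main(void)
    {
            for (int attempt = 1; attempt <= 5; attempt++) {
                    int rc = toy_mds_connect();
                    if (rc == -EAGAIN) {
                            /* analogous to the LCONSOLE_ERROR_MSG() line */
                            fprintf(stderr,
                                    "attempt %d: mds_connect failed with %d\n",
                                    attempt, rc);
                            continue;       /* retry on the next attempt */
                    }
                    printf("attempt %d: connected\n", attempt);
                    break;
            }
            return 0;
    }

If that model is right, the message would be a transient complaint from the first connect attempts after mount rather than a persistent configuration problem; whether that's actually what 2.4 does on my nodes is exactly what I'm asking.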