Götz Waschk
2009-Feb-09 15:52 UTC
[Lustre-discuss] problem mounting two different lustre instances
Hello everyone,

I have a problem mounting two different lustre instances on one client. Both lustre instances are configured with o2ib networking for the local clients and tcp for remote clients. So I have two MGS instances: 141.34.228.39@tcp0 is the remote lustre, 192.168.224.2@o2ib0 is the local one.

My client has this in modprobe.conf:

    options lnet networks=o2ib,tcp

I'm trying to mount the remote network with

    mount -t lustre 141.34.228.39@tcp0:/atlas /scratch/lustre-1.6/atlas

and the command just hangs, the error is this:

    LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 req@00000100dfc2ac00 x7/t0 o38->atlas-MDT0000_UUID@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234194365 ref 2 fl Rpc:/0/0 rc 0/0

I can mount the local lustre just fine:

    mount -t lustre 192.168.224.2@o2ib0:/lhcb /lustre/lhcb/

On the other client I have reversed the network list in modprobe.conf:

    options lnet networks=tcp,o2ib

Now I can mount both lustre instances, but both seem to use the tcp network, even the one that is local and should use o2ib.

On the local MGS:

    lctl list_nids
    192.168.224.2@o2ib
    141.34.218.7@tcp

On my client:

    lctl which_nid 192.168.224.2@o2ib 141.34.218.7@tcp
    141.34.218.7@tcp

What can I do?

Regards, Götz Waschk
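For readers less familiar with the notation: an LNet NID has the form `address@network`, where the network is a type such as `tcp` or `o2ib` followed by an optional number, and a missing number means 0. That is why the MGS can be addressed as `o2ib0` while `lctl list_nids` reports it as `o2ib`. A minimal Python sketch of that convention follows; `Nid`, `parse_nid` and `same_network` are hypothetical helper names, not part of any Lustre tool.

```python
import re
from typing import NamedTuple


class Nid(NamedTuple):
    address: str   # e.g. "192.168.224.2"
    net_type: str  # e.g. "o2ib" or "tcp"
    net_num: int   # network number; 0 when omitted


# The network type must end in a letter so that a trailing number
# ("tcp0", "o2ib1") is split off as the network number.
NID_RE = re.compile(r"^(?P<addr>[^@\s]+)@(?P<type>[a-z0-9]*[a-z])(?P<num>\d*)$")


def parse_nid(text: str) -> Nid:
    """Split 'address@network' into its parts, defaulting the number to 0."""
    m = NID_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a NID: {text!r}")
    return Nid(m.group("addr"), m.group("type"), int(m.group("num") or 0))


def same_network(a: Nid, b: Nid) -> bool:
    """True when both NIDs name the same LNet network (type and number)."""
    return (a.net_type, a.net_num) == (b.net_type, b.net_num)


if __name__ == "__main__":
    print(parse_nid("192.168.224.2@o2ib0"))
    print(parse_nid("141.34.228.39@tcp0"))
```

Under this reading, `192.168.224.2@o2ib0` and `192.168.224.2@o2ib` parse to the same value.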
Isaac Huang
2009-Feb-10 00:44 UTC
[Lustre-discuss] problem mounting two different lustre instances
On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote:
> Hello everyone,
> .....
> My client has this in modprobe.conf:
> options lnet networks=o2ib,tcp
> I'm trying to mount the remote network with
> mount -t lustre 141.34.228.39@tcp0:/atlas /scratch/lustre-1.6/atlas
> and the command just hangs, the error is this:
> LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
> status -113 req@00000100dfc2ac00 x7/t0

The outgoing message failed with -113 (EHOSTUNREACH). What does "lctl list_nids" say on the client?

Also, please:

    echo +neterror > /proc/sys/lnet/printk

So that more network errors would go onto console.

Isaac
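The negative status values in these log lines are Linux kernel errno codes with the sign flipped, which is how -113 reads as EHOSTUNREACH. As a rough illustration (not part of Lustre), a few common codes can be decoded with a small lookup table. The table hard-codes Linux values instead of using the local platform's `errno` module, because the numbers differ between operating systems; `decode_status` is a hypothetical helper name.

```python
# A handful of Linux errno values, hard-coded so the result does not
# depend on the platform that runs this script.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def decode_status(status: int) -> tuple:
    """Map a kernel status such as -113 to (symbolic name, description)."""
    return LINUX_ERRNO.get(abs(status), ("UNKNOWN", "not in this table"))


if __name__ == "__main__":
    name, text = decode_status(-113)
    print(f"status -113 -> {name}: {text}")
```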
Götz Waschk
2009-Feb-10 08:38 UTC
[Lustre-discuss] problem mounting two different lustre instances
On Tue, Feb 10, 2009 at 1:44 AM, Isaac Huang <He.Huang at sun.com> wrote:
> On Mon, Feb 09, 2009 at 04:52:20PM +0100, Götz Waschk wrote:
>> My client has this in modprobe.conf:
>> options lnet networks=o2ib,tcp
>> I'm trying to mount the remote network with
>> mount -t lustre 141.34.228.39@tcp0:/atlas /scratch/lustre-1.6/atlas
>> and the command just hangs, the error is this:
>> LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
>> status -113 req@00000100dfc2ac00 x7/t0

Hi Isaac,

> The outgoing message failed with -113 (EHOSTUNREACH). What does "lctl
> list_nids" say on the client?

on that client, the output is:

    192.168.224.23@o2ib
    141.34.216.38@tcp

> Also, please:
> echo +neterror > /proc/sys/lnet/printk
> So that more network errors would go onto console.

OK, after the next mount attempt I have this in the log now:

Lustre: OBD class driver, http://www.lustre.org/
Lustre Version: 1.6.6
Build Version: 1.6.6-19700101010000-PRISTINE-.usr.src.redhat.BUILD.lustre-1.6.6.kernel-2.6.9-78.0.13.ELsmp
Lustre: Added LNI 192.168.224.23@o2ib [8/64]
Lustre: Added LNI 141.34.216.38@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; http://www.lustre.org/
Lustre: 2887:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback()) 192.168.22.32@o2ib: ROUTE ERROR -22
Lustre: 2887:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed()) Deleting messages for 192.168.22.32@o2ib: connection failed
LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 req@0000010037eeac00 x7/t0 o38->atlas-MDT0000_UUID@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234255029 ref 2 fl Rpc:/0/0 rc 0/0
Lustre: 9263:0:(client.c:1199:ptlrpc_expire_one_request()) @@@ network error (sent at 1234255024, 0s ago) req@0000010037eeac00 x7/t0 o38->atlas-MDT0000_UUID@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234255029 ref 1 fl Rpc:/0/0 rc 0/0
Lustre: Request x7 sent from atlas-MDT0000-mdc-00000107fc2ee400 to NID 192.168.22.32@o2ib 0s ago has timed out (limit 5s).
Lustre: 9264:0:(import.c:410:import_select_connection()) atlas-MDT0000-mdc-00000107fc2ee400: tried all connections, increasing latency to 5s
Lustre: 2887:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback()) 192.168.22.32@o2ib: ROUTE ERROR -22
Lustre: 2887:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed()) Deleting messages for 192.168.22.32@o2ib: connection failed
LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4, status -113 req@000001080325a400 x10/t0 o38->atlas-MDT0000_UUID@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234255054 ref 2 fl Rpc:/0/0 rc 0/0
Lustre: 9263:0:(client.c:1199:ptlrpc_expire_one_request()) @@@ network error (sent at 1234255049, 0s ago) req@000001080325a400 x10/t0 o38->atlas-MDT0000_UUID@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5 dl 1234255054 ref 1 fl Rpc:/0/0 rc 0/0
Lustre: Request x10 sent from atlas-MDT0000-mdc-00000107fc2ee400 to NID 192.168.22.32@o2ib 0s ago has timed out (limit 5s).

Regards, Götz

--
AL I:40: Do what thou wilt shall be the whole of the Law.
Johann Lombardi
2009-Feb-10 12:45 UTC
[Lustre-discuss] problem mounting two different lustre instances
On Tue, Feb 10, 2009 at 09:38:13AM +0100, Götz Waschk wrote:
> Lustre: Added LNI 192.168.224.23@o2ib [8/64]
> Lustre: Added LNI 141.34.216.38@tcp [8/256]
> Lustre: Accept secure, port 988
> Lustre: Lustre Client File System; http://www.lustre.org/
> Lustre: 2887:0:(o2iblnd_cb.c:2704:kiblnd_cm_callback())
> 192.168.22.32@o2ib: ROUTE ERROR -22
> Lustre: 2887:0:(o2iblnd_cb.c:2118:kiblnd_peer_connect_failed())
> Deleting messages for 192.168.22.32@o2ib: connection failed
> LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4,
> status -113 req@0000010037eeac00 x7/t0
> o38->atlas-MDT0000_UUID@192.168.22.32@o2ib:12/10 lens 240/400 e 0 to 5
> dl 1234255029 ref 2 fl Rpc:/0/0 rc 0/0

38 = MDS_CONNECT. The client tries to reach the MDT via 192.168.22.32@o2ib, whereas I think it should use tcp to access the lustre filesystem of the remote cluster. Is my understanding of your configuration correct?

Johann
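The reading above can be reproduced mechanically. Here is a minimal sketch, assuming the request line keeps the `o<opcode>-><target UUID>@<address>@<network>:<portals>` layout seen in this thread. The regular expression and the `parse_request` helper are illustrative only, and the opcode table holds just the one value named above.

```python
import re

# opcode -> name; only the value mentioned in the thread is listed here.
OPCODES = {38: "MDS_CONNECT"}

# Matches e.g. "o38->atlas-MDT0000_UUID@192.168.22.32@o2ib:12/10"
REQ_RE = re.compile(
    r"\bo(?P<op>\d+)->(?P<uuid>[^@\s]+)@(?P<addr>[^@\s]+)@(?P<net>[a-z0-9]+):"
)


def parse_request(line: str):
    """Extract opcode, target UUID and destination NID from a request line."""
    m = REQ_RE.search(line)
    if m is None:
        return None
    op = int(m.group("op"))
    return {
        "opcode": op,
        "name": OPCODES.get(op, "unknown"),
        "target": m.group("uuid"),
        "nid": m.group("addr") + "@" + m.group("net"),
    }


if __name__ == "__main__":
    sample = (
        "LustreError: 2887:0:(events.c:66:request_out_callback()) @@@ type 4, "
        "status -113 req@0000010037eeac00 x7/t0 "
        "o38->atlas-MDT0000_UUID@192.168.22.32@o2ib:12/10 lens 240/400"
    )
    print(parse_request(sample))
```

For the line from this thread, the extracted NID is on `o2ib`, not `tcp`, which matches the observation that the client is attempting the connection over the wrong network.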
Götz Waschk
2009-Feb-10 14:13 UTC
[Lustre-discuss] problem mounting two different lustre instances
On Tue, Feb 10, 2009 at 1:45 PM, Johann Lombardi <johann at sun.com> wrote:
> 38 = MDS_CONNECT. The client tries to reach the MDT via 192.168.22.32@o2ib,
> whereas I think it should use tcp to access the lustre filesystem of the remote
> cluster, is my understanding of your configuration correct?

That's right, it should use 141.34.228.39@tcp0 instead.

Regards, Götz

--
AL I:40: Do what thou wilt shall be the whole of the Law.
Götz Waschk
2009-Feb-19 12:54 UTC
[Lustre-discuss] problem mounting two different lustre instances
Hi everyone,

should I open a bug about this at bugzilla?

Regards, Götz Waschk