Hello everybody.

I am now trying to make an OSC mount a Lustre filesystem from an MDS located in another TCP network, but it refuses with the following error:

mount.lustre: mount 10.3.0.102@tcp:/SANDBOX at /mnt/lustre failed: Cannot send after transport endpoint shutdown

If I then check LNET routing with the "lctl show_route" command, it shows me the following:

net tcp hops 1 gw 10.4.0.105@tcp1 down

The "down" status appears only after the first mount attempt after a reboot; before that it stands "up". What am I doing wrong? Thanks in advance!

I have attached a drawing which explains the topology. The machines in my Lustre environment have the following network configurations.

====

MDS.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:04:8A
          inet addr:10.3.0.102  Bcast:10.3.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:48a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4510 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4439 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:666698 (651.0 KiB)  TX bytes:695697 (679.3 KiB)

lctl list_nids:

10.3.0.102@tcp

lctl route_show:

net tcp1 hops 1 gw 10.3.0.105@tcp up

====

OSS1.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:79:51
          inet addr:10.3.0.103  Bcast:10.3.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:7951/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2482 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2398 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:388187 (379.0 KiB)  TX bytes:365254 (356.6 KiB)

lctl list_nids:

10.3.0.103@tcp

lctl route_show:

net tcp1 hops 1 gw 10.3.0.105@tcp up

====

OSS2.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:22:76
          inet addr:10.3.0.104  Bcast:10.3.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:2276/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2522 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2407 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:394006 (384.7 KiB)  TX bytes:364467 (355.9 KiB)

lctl list_nids:

10.3.0.104@tcp

lctl route_show:

net tcp1 hops 1 gw 10.3.0.105@tcp up

====

Router.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:07:B2
          inet addr:10.3.0.105  Bcast:10.3.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:7b2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:291 errors:0 dropped:0 overruns:0 frame:0
          TX packets:249 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:51645 (50.4 KiB)  TX bytes:50121 (48.9 KiB)

eth1      Link encap:Ethernet  HWaddr 00:50:56:B9:7E:CA
          inet addr:10.4.0.105  Bcast:10.4.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:7eca/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:41 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2474 (2.4 KiB)  TX bytes:906 (906.0 b)

lctl list_nids:

10.3.0.105@tcp
10.4.0.105@tcp1

lctl show_route:

<nothing here>

====

OSC.

ifconfig:

eth0      Link encap:Ethernet  HWaddr 00:50:56:B9:6C:1A
          inet addr:10.4.0.101  Bcast:10.4.0.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:feb9:6c1a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:204 errors:0 dropped:0 overruns:0 frame:0
          TX packets:187 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:43784 (42.7 KiB)  TX bytes:39666 (38.7 KiB)

lctl list_nids:

10.4.0.101@tcp1

lctl show_route:

net tcp hops 1 gw 10.4.0.105@tcp1 up

====

--
Всеволод Никоноров, ОИТТиС, НИКИЭТ <v.nikonorov@nikiet.ru>
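For context, routes like those in the route_show/show_route output above are normally set through LNET module options. A sketch of what /etc/modprobe.d/lustre.conf would presumably contain on each class of node to produce that output -- reconstructed here from the list_nids and route listings, not taken from the poster's actual files:

# MDS and OSS nodes: on net tcp, reach tcp1 via the router's 10.3.0.105 side
options lnet networks="tcp(eth0)" routes="tcp1 10.3.0.105@tcp"

# Router: a NID on both nets, with LNET forwarding turned on
options lnet networks="tcp(eth0),tcp1(eth1)" forwarding="enabled"

# OSC (client): on net tcp1, reach tcp via the router's 10.4.0.105 side
options lnet networks="tcp1(eth0)" routes="tcp 10.4.0.105@tcp1"

Note that these options only take effect when the lnet module is loaded, so changing them requires unloading Lustre/LNET (or rebooting).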
Hello Vsevolod,

To set the route up, you may use:

lctl set_route 10.4.0.105@tcp1 up

Even if it's marked "==== obsolete (DANGEROUS) ====", it works fine on Lustre 2.1.4 ;)

----- Original Message -----
From: Vsevolod Nikonorov <v.nikonorov@nikiet.ru>
To: lustre-discuss@lists.lustre.org
Sent: Wednesday, 14 August 2013 15:38:40
Subject: [Lustre-discuss] Understanding LNET routing

> Hello everybody.
>
> I am now trying to make an OSC mount a Lustre filesystem from an MDS located in another TCP network, but it refuses with the following error [...]
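A quick way to confirm the route came back up after flipping it (the show_route output format here matches what the poster reported; it may differ slightly between Lustre versions, and the route can of course fall back to "down" if the underlying connectivity problem remains):

lctl set_route 10.4.0.105@tcp1 up
lctl show_route
net tcp hops 1 gw 10.4.0.105@tcp1 up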
Thank you for the advice. My general problem was the default iptables firewall settings in CentOS 6.4 -- when I reset them to an all-permit state, the route on the OSC stopped falling to "down". But routing still does not work properly: a mount request on the OSC just hangs. I do see some traffic in the INPUT and OUTPUT chains of my router; here's a fragment of it:

Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth1 SRC=10.4.0.105 DST=10.4.0.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=23517 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth1 OUT= MAC=00:50:56:b9:7e:ca:00:50:56:b9:6c:1a:08:00 SRC=10.4.0.101 DST=10.4.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=50793 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth1 OUT= MAC=00:50:56:b9:7e:ca:00:50:56:b9:6c:1a:08:00 SRC=10.4.0.101 DST=10.4.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=50794 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth1 SRC=10.4.0.105 DST=10.4.0.101 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=23518 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth0 SRC=10.3.0.105 DST=10.3.0.102 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=599 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth0 OUT= MAC=00:50:56:b9:07:b2:00:50:56:b9:04:8a:08:00 SRC=10.3.0.102 DST=10.3.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=51416 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN=eth0 OUT= MAC=00:50:56:b9:07:b2:00:50:56:b9:04:8a:08:00 SRC=10.3.0.102 DST=10.3.0.105 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=51417 DF PROTO=TCP SPT=988 DPT=1021 WINDOW=114 RES=0x00 ACK URGP=0
Aug 15 06:59:59 test-lustre-router1 kernel: IN= OUT=eth0 SRC=10.3.0.105 DST=10.3.0.102 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=600 DF PROTO=TCP SPT=1021 DPT=988 WINDOW=115 RES=0x00 ACK URGP=0

I believe this is some Lustre traffic, though I do not know the protocol and cannot understand what is wrong. Does Lustre routing have anything to do with TCP/IP routing? Should I set net.ipv4.ip_forward to 1 in sysctl.conf? Should I set up some IP masquerading for Lustre routing to work properly?

On Wed, 14 Aug 2013 16:16:19 +0200 (CEST) Hervé Toureille <toureille@cines.fr> wrote:

> Hello Vsevolod,
> To set the route up, you may use:
>
> lctl set_route 10.4.0.105@tcp1 up
>
> Even if it's marked "==== obsolete (DANGEROUS) ====", it works fine on Lustre 2.1.4 ;)

--
Всеволод Никоноров, ОИТТиС, НИКИЭТ <v.nikonorov@nikiet.ru>
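As an aside on the firewall point: instead of an all-permit policy, it is usually enough to open the LNET acceptor port, which defaults to 988/tcp -- that is the DPT=988 visible in the log fragment above. A minimal sketch for iptables on CentOS 6, assuming the default acceptor port; adapt to your local policy:

# keep already-established LNET connections flowing
iptables -I INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# accept new connections to the LNET acceptor port on every Lustre node
iptables -I INPUT -p tcp --dport 988 -m state --state NEW -j ACCEPT
# persist the rules across reboots
service iptables save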
On Thu, Aug 15, 2013 at 04:09:45PM +0400, Vsevolod Nikonorov wrote:
> ......
> Does Lustre routing have anything to do with TCP/IP routing? Should I set net.ipv4.ip_forward to 1 in sysctl.conf? Should I set up some IP masquerading for Lustre routing to work properly?

No.

- Isaac
Is it possible to route Lustre traffic between two TCP networks? The manual states that

"the routers would enable LNet forwarding since their NIDs are specified in the 'routes' parameters as being routers."

How can I query the router to find out whether it considers itself a router?

On Thu, 15 Aug 2013 15:36:26 -0600 Isaac Huang <he.huang@intel.com> wrote:

> On Thu, Aug 15, 2013 at 04:09:45PM +0400, Vsevolod Nikonorov wrote:
> > ......
> > Does Lustre routing have anything to do with TCP/IP routing? Should I set net.ipv4.ip_forward to 1 in sysctl.conf? Should I set up some IP masquerading for Lustre routing to work properly?
>
> No.
>
> - Isaac

--
Всеволод Никоноров, ОИТТиС, НИКИЭТ <v.nikonorov@nikiet.ru>
On Mon, 2013-08-19 at 16:50 +0400, Vsevolod Nikonorov wrote:
> Is it possible to route Lustre traffic between two TCP networks?

Yes, and you have multiple ways to achieve that goal. If you are using TCP, then it routes just like any other IP traffic -- you may not even need Lustre routers.

If you are trying to change LNET types -- say, from RDMA over IB to TCP -- you would run an LNET router to handle the protocol translation.

There are other reasons to run LNET routers, such as maintaining separation between networks or fabrics, but in many cases those are more administrative or performance-based than absolute technical requirements.

> The manual states that
>
> "the routers would enable LNet forwarding since their NIDs are
> specified in the 'routes' parameters as being routers."
>
> How can I query the router to find out whether it considers itself a router?

Look in /proc/sys/lnet/routes; if you see "Routing enabled" then the router knows it is a router.

Hope this helps,
--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
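Concretely, the check on the router node looks like this (sample output; on this particular router, the route table that normally follows the state line would be empty, since the router itself has no routes configured):

cat /proc/sys/lnet/routes
Routing enabled

If it reports "Routing disabled" instead, forwarding is switched on through the lnet module option forwarding="enabled" (see the modprobe.conf sketch earlier in the thread), followed by a reload of the lnet module.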
Thanks, it seems my router is a router indeed. Nevertheless, I am unable to mount the Lustre filesystem: "mount.lustre 10.3.0.102@tcp:/SANDBOX /mnt/lustre" just hangs, even though the Lustre client communicates with the Lustre router in the meantime (which can be seen using rules like "iptables -I INPUT -p tcp ! --dport 22 -j LOG"), and the Lustre router communicates with my MDS. /var/log/messages on the MDS does not contain a single line about Lustre activity, though the MDT volume is formatted and mounted properly. I think I need to increase the debugging level, at least on the MDS -- how can that be done?

On Mon, 19 Aug 2013 14:37:11 -0400 David Dillow <dillowda@ornl.gov> wrote:

> Look in /proc/sys/lnet/routes; if you see "Routing enabled" then the
> router knows it is a router.

--
Всеволод Никоноров, ОИТТиС, НИКИЭТ <v.nikonorov@nikiet.ru>
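For the debug-level question, the usual tool is lctl's debug controls; a sketch, where the +net mask is one reasonable choice for an LNET problem rather than the only one:

# add LNET networking messages to the debug mask on the MDS
lctl set_param debug=+net
# reproduce the problem, then dump the kernel debug buffer to a file
lctl dk /tmp/lustre-debug.txt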
Please check that the Lustre client can ping the MGS over LNet.

[root@compute1 ~]# ping lustre-18-mgs
PING lustre-18-mgs.nitrox.net (192.168.10.120) 56(84) bytes of data.
64 bytes from lustre-18-mgs.nitrox.net (192.168.10.120): icmp_seq=1 ttl=64 time=0.377 ms

--- lustre-18-mgs.nitrox.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.377/0.377/0.377/0.000 ms

[root@compute1 ~]# lctl ping 192.168.10.120@tcp1
12345-0@lo
12345-192.168.10.120@tcp1
[root@compute1 ~]#

Once you can ping, you should be able to mount. If mounting fails, please send the mount command error message and the pertinent log entries (client and MGS).

--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division

> -----Original Message-----
> From: Vsevolod Nikonorov
> Sent: Tuesday, August 20, 2013 12:10 AM
> To: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] Understanding LNET routing
>
> Thanks, it seems my router is a router indeed. Nevertheless, I am unable to
> mount the Lustre filesystem: "mount.lustre 10.3.0.102@tcp:/SANDBOX
> /mnt/lustre" just hangs [...]
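Applied to the topology in this thread, that means pinging from the OSC first to the router's near-side NID and then to the MDS NID across the route (NIDs taken from the list_nids output earlier in the thread):

# from the OSC: is the router reachable on the local net?
lctl ping 10.4.0.105@tcp1
# from the OSC: does LNET reach the MDS through the route?
lctl ping 10.3.0.102@tcp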
I have discovered that the MDS log contained errors about communication with NIDs that did not exist. Most likely that was a consequence of several mixed-up configurations, so I simply recreated the whole test setup, and now I've got LNET routing working. Thanks everybody for helping!

On Tue, 20 Aug 2013 06:46:54 +0000 "Lee, Brett" <brett.lee@intel.com> wrote:

> Please check that the Lustre client can ping the MGS over LNet.
>
> [...]
>
> Once you can ping, you should be able to mount. If mounting fails, please send the mount command error message and the pertinent log entries (client and MGS).

--
Всеволод Никоноров, ОИТТиС, НИКИЭТ <v.nikonorov@nikiet.ru>