Hello,

on our cluster all compute nodes are equipped with two interfaces. eth0 is 1 GBit and myri10ge is 10 GBit. We have three machines for MDS, OSS1 and OSS2. Their lnet configuration looks like:

    options lnet networks="tcp0(myri10ge),tcp1(eth0)"

Client nodes are configured with:

    options lnet networks="tcp0(myri10ge)"

When node001 mounts the Lustre FS, it uses the 10 GBit Ethernet as configured. That is verified by network monitoring. But the peer_list command executed on the MDS lists a peer to node001 over the 1 GBit Ethernet:

    lctl > network tcp0
    lctl > peer_list
    12345-192.168.42.1@tcp  [0]0.0.0.0->0.0.0.0:0 #0             <- that is node001
    12345-192.168.10.15@tcp [1]node009-10g->node015-10g:988 #3   <- that is oss1
    12345-192.168.10.16@tcp [1]node009-10g->node016-10g:988 #3   <- that is oss2

where 192.168.10.0/24 is the 10 GBit network and 192.168.42.0/24 the 1 GBit.

What am I doing wrong? Is there a configuration issue? And what does "[0]0.0.0.0->0.0.0.0:0 #0" in line 1 mean?

Thanks for any hints.

Bastian

--
Bastian Tweddell
Juelich Supercomputing Centre
Institute for Advanced Simulation
Forschungszentrum Juelich GmbH
52425 Juelich, Germany

Phone: +49-2461-61-6586
Fax:   +49-2461-61-6656
WWW:   http://www.fz-juelich.de/jsc/

JSC is the coordinator of the John von Neumann Institute for Computing
and member of the Gauss Centre for Supercomputing

-------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzende des Aufsichtsrats: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Dr. Ulrich Krafft (stellv. Vorsitzender), Prof. Dr. Harald Bolt,
Dr. Sebastian M. Schmidt
-------------------------------------------------------------------
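[Editor's note: not part of the original mail, but a quick way to cross-check which NIDs LNET actually set up on each node. A sketch using standard lctl commands; it must be run on nodes with the Lustre modules loaded, and the client NID 192.168.10.1@tcp below is a placeholder for node001's 10 GBit address.]

    # On the client and on the MDS: list the NIDs LNET configured locally.
    lctl list_nids

    # On the MDS: ask which local NID LNET would use to reach the client
    # (placeholder NID; substitute node001's real 10G address).
    lctl which_nid 192.168.10.1@tcp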
Hi Bastian,

When I test Lustre over myri10ge, I do not use "myri10ge" as the network interface name. I use the actual ethX that myri10ge is providing:

    options lnet networks="tcp0(eth2),tcp1(eth0)"

Scott

On Jun 20, 2008, at 10:17 AM, Bastian Tweddell wrote:
> Hello,
>
> on our cluster all compute nodes are equipped with two interfaces.
> eth0 is 1 GBit and myri10ge is 10 GBit. [...]

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
On Jun 25, 2008 20:04 -0400, Scott Atchley wrote:
> When I test Lustre over myri10ge, I do not use "myri10ge" as the
> network interface name. I use the actual ethX that myri10ge is
> providing:
>
> options lnet networks="tcp0(eth2),tcp1(eth0)"

If you are using tcpX(ethX), then you are only using TCP for the Lustre network transport instead of the RDMA MX transport, and it isn't as fast or efficient as it could be.

> On Jun 20, 2008, at 10:17 AM, Bastian Tweddell wrote:
> > [...]

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
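[Editor's note: for reference, switching from the socklnd (tcpX) transport to the MX LND that Andreas mentions would change the module options to use the "mx" network type. A sketch only; the interface name "myri0" is an assumption that depends on how the MX driver exposes the NIC.]

    # /etc/modprobe.conf sketch for the MX LND (interface name assumed)
    options lnet networks="mx0(myri0)"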
On 25.Jun.08 23:28 -0600, Andreas Dilger wrote:
> On Jun 25, 2008 20:04 -0400, Scott Atchley wrote:
> > When I test Lustre over myri10ge, I do not use "myri10ge" as the
> > network interface name. I use the actual ethX that myri10ge is
> > providing:
> >
> > options lnet networks="tcp0(eth2),tcp1(eth0)"

Ok, right, that is the same thing I am doing. I configured udev to name the myri10ge interface 'myri10ge'.

> If you are using tcpX(ethX), then you are only using TCP for the
> Lustre network transport instead of the RDMA MX transport, and it
> isn't as fast or efficient as it could be.

I do not think that myri10ge supports RDMA MX, does it?

But my question still is: why does lctl peer_list show a connection between the MDS and the Lustre client that uses the management network, although the client is configured to use the myri10ge network only? I recently upgraded to 1.6.5, which gives the same result.

    # On MDS
    lctl > network tcp0
    lctl > peer_list
    12345-192.168.42.1@tcp [0]0.0.0.0->0.0.0.0:0 #0

    # On Client 1
    lctl > network tcp0
    lctl > peer_list
    12345-192.168.10.9@tcp [1]node001-10g->node009-10g:988 #3

Client1 has a correct entry regarding the MDS, while the MDS resolves the source of Client1 incorrectly. This might be a misconfiguration, but I cannot figure out where.

Bastian
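[Editor's note: for completeness, renaming a NIC to 'myri10ge' with udev, as Bastian describes, typically looks like the rule below. A sketch; the MAC address is a placeholder and the rules file path varies by distribution.]

    # e.g. /etc/udev/rules.d/70-persistent-net.rules (path varies by distro)
    # Match the Myricom NIC by MAC address (placeholder) and rename it.
    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:60:dd:00:00:00", NAME="myri10ge"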
On Jun 26, 2008, at 1:28 AM, Andreas Dilger wrote:
> If you are using tcpX(ethX), then you are only using TCP for the
> Lustre network transport instead of the RDMA MX transport, and it
> isn't as fast or efficient as it could be.

Andreas,

It depends on whether the NIC on the other end is a Myricom NIC or not. If it is a Myricom 10G NIC, then he can run MX. If the switch fabric is Ethernet, however, it depends on the switches as to whether they can handle MX over Ethernet (MXoE requires flow control enabled and jumbo frames). If the other NIC is _not_ a Myricom NIC, then using our standard Ethernet driver with SOCKLND on top is the easiest route.

Scott
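[Editor's note: checking the two host-side MXoE prerequisites Scott mentions (jumbo frames and pause-frame flow control) usually looks like the commands below. A sketch; eth2 is assumed to be the Myricom port, and the switch ports need matching settings.]

    # Enable jumbo frames on the assumed Myricom port.
    ip link set dev eth2 mtu 9000

    # Enable Ethernet pause-frame flow control in both directions.
    ethtool -A eth2 rx on tx on

    # Verify the current flow-control settings.
    ethtool -a eth2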
On Jun 26, 2008, at 9:03 AM, Bastian Tweddell wrote:
> Ok, right, that is the same thing I am doing. I configured udev to
> name the myri10ge interface 'myri10ge'.

Ok. :-)

> I do not think that myri10ge supports RDMA MX, does it?

No. You need an MX license, which allows you to download the MX driver. I assume that you are using non-Myrinet switches since you are using the Ethernet driver. Before buying MX, it would be best to contact Myricom at sales at myri dot com to determine whether MX would work in your fabric.

> But my question still is: why does lctl peer_list show a connection
> between the MDS and the Lustre client that uses the management network,
> although the client is configured to use the myri10ge network only?

I'll let Andreas and the Sun folks handle it from here. :-)

Scott
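[Editor's note on the unanswered "[0]0.0.0.0->0.0.0.0:0 #0" question: each peer_list line carries a peer NID, a bracketed connection slot with a local->remote address:port pair, and a trailing count. The all-zero pair suggests no connection is currently established on that slot; treat that reading of socklnd's output as an assumption. A small shell sketch pulls the sample line from the thread apart:]

```shell
# Sample peer_list line from the thread (NID de-obfuscated to use '@').
line='12345-192.168.42.1@tcp [0]0.0.0.0->0.0.0.0:0 #0'

# Field 1 is "<pid>-<NID>"; the NID is everything after the first '-'.
nid=${line#*-}; nid=${nid%% *}

# Field 2 is "[slot]local->remote:port"; an all-zero pair with port 0
# indicates no established connection on that slot.
conn=$(echo "$line" | awk '{print $2}')

# Field 3, "#N", is the trailing count; "#0" matches the empty slot.
count=$(echo "$line" | awk '{print $3}')

echo "$nid"    # 192.168.42.1@tcp
echo "$conn"   # [0]0.0.0.0->0.0.0.0:0
echo "$count"  # #0
```

[This matches what Bastian observes: the MDS knows the peer by its 1 GBit NID but shows no live connection on it, while the established data connections in the other lines show real address pairs on port 988.]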