Brock Palen
2010-Oct-21 13:37 UTC
[Lustre-discuss] controlling which eth interface lustre uses
We recently added a new OSS. It has one 1Gb interface and one 10Gb interface:

    The 10Gb interface is eth4  10.164.0.166
    The 1Gb interface is eth0   10.164.0.10

In modprobe.conf I have:

    options lnet networks=tcp0(eth4)

    lctl list_nids
    10.164.0.166@tcp

From a host I run:

    lctl which_nid oss4
    10.164.0.166@tcp

Yet I still see traffic over eth0, the 1Gb management network, much higher than I would expect (up to 100MB/s). The management interface is oss4-gb, so if I run from a client:

    lctl which_nid oss4-gb
    10.164.0.10@tcp

If I have networks=tcp0(eth4), and list_nids shows only the 10Gb interface, why do I have so much traffic over the 1Gb interface? There is some traffic on the 10Gb interface, but I would like to tell Lustre "don't use the 1Gb interface".

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
Joe Landman
2010-Oct-21 13:48 UTC
[Lustre-discuss] controlling which eth interface lustre uses
On 10/21/2010 09:37 AM, Brock Palen wrote:
> The 10Gb interface is eth4 10.164.0.166 The 1Gb interface is eth0
> 10.164.0.10

They look like they are on the same subnet if you are using /24 ...

> Why If I have networks=tcp0(eth4) and that list_nids shows only the
> 10Gb interface, do I have so much traffic over the 1Gb interface?
> There is some traffic on the 10Gb interface, but I would like to tell
> lustre "don't use the 1Gb interface".

If they are on the same subnet, it's possible that the 1GbE sees the ARP response first, and then it's pretty much guaranteed that the traffic will go out that port.

If your subnets are different, this shouldn't be the issue.

> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
Brock Palen
2010-Oct-21 13:51 UTC
[Lustre-discuss] controlling which eth interface lustre uses
On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:
> They look like they are on the same subnet if you are using /24 ...

You are correct. Both interfaces are on the same subnet:

    [root@oss4-gb ~]# route
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    10.164.0.0      *               255.255.248.0   U     0      0        0 eth0
    10.164.0.0      *               255.255.248.0   U     0      0        0 eth4
    169.254.0.0     *               255.255.0.0     U     0      0        0 eth4
    default         10.164.0.1      0.0.0.0         UG    0      0        0 eth0

Is there no way to mask the Lustre service away from the 1Gb interface?
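The same-subnet observation can be checked mechanically by ANDing each address with the Genmask from the routing table and comparing the results. What follows is only a minimal sketch in POSIX shell; the helper names ip_to_int and same_subnet are invented for illustration and are not part of Lustre or any system tool.

```shell
# ip_to_int: print a dotted-quad IPv4 address as a single integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1                      # split "a.b.c.d" into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_subnet ADDR1 ADDR2 NETMASK
# Succeeds (exit 0) when both addresses share the same network part.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}
```

For the two addresses in this thread, `same_subnet 10.164.0.166 10.164.0.10 255.255.248.0` succeeds, which matches the two identical 10.164.0.0 routes shown for eth0 and eth4.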
Bob Ball
2010-Oct-21 13:59 UTC
[Lustre-discuss] controlling which eth interface lustre uses
Why do you need both active? If one is a backup to the other, then bond them as a primary/backup pair, meaning only one will be active at a time, i.e., your designated primary (unless it goes down).

bob

On 10/21/2010 9:51 AM, Brock Palen wrote:
> Both interfaces are on the same subnet:
>
> There is no way to mask the lustre service away from the 1Gb interface?
Charles Taylor
2010-Oct-21 14:04 UTC
[Lustre-discuss] controlling which eth interface lustre uses
On Oct 21, 2010, at 9:51 AM, Brock Palen wrote:
> There is no way to mask the lustre service away from the 1Gb
> interface?

We struggle with this as well but have not found a way to enforce it. You would think that Lustre would honor the NID for incoming *and* outgoing traffic, but apparently the standard Linux routing table determines the outbound path and LNET is out of the picture. Thus, you end up having to assign separate subnets, shut down your eth0 (in this case) interface, or use static routes to fine-tune the routing decisions (where possible).

We wish that the outgoing decision could be made on the basis of the *NID*, but that might be too intrusive with regard to the Linux kernel's network stack, so I can understand, somewhat, why it is not that way. Still, it is somewhat counter-intuitive to go through all the trouble of having the LNET layer and assigning NIDs only to have them disregarded for outbound traffic.

Perhaps there is a way around this that we don't know about.

Regards,

Charlie Taylor
UF HPC Center
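Since the kernel routing table, not LNET, picks the outbound interface, it can help to ask the kernel directly which device it would use for a given peer. A small sketch, assuming the iproute2 `ip` tool is installed; the helper name route_dev and the client address in the usage line are hypothetical:

```shell
# route_dev: read `ip route get` output on stdin and print the word
# that follows the "dev" keyword, i.e. the outgoing interface name.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}
```

Usage would look like `ip route get 10.164.0.50 | route_dev`. If that prints eth0 on an OSS whose only NID is on eth4, replies to that client may leave through the 1Gb port, which is the behaviour described above.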
Brock Palen
2010-Oct-21 14:29 UTC
[Lustre-discuss] controlling which eth interface lustre uses
> Why do you need both active? If one is a backup to the other, then bond
> them as a primary/backup pair, meaning only one will be active at a
> time, ie, your designated primary (unless it goes down).

We could do this. The 10Gb drivers have been such a pain for us that we wanted to have a "back door" management network to get to the box should we have issues with the 10Gb driver.

Oddly, I ran:

    ifconfig eth0 down

and I could no longer ping the box over the eth4 interface. I had to power cycle it from management. Very odd.
Wojciech Turek
2010-Oct-21 14:34 UTC
[Lustre-discuss] controlling which eth interface lustre uses
Maybe I am missing the point here, but can you explain why you would need two NICs in one host on the same subnet? If you need an additional access route to your host, why not configure eth0 on a different subnet?

On 21 October 2010 15:29, Brock Palen wrote:
> We could do this, the 10Gb drivers have been such a pain for us we wanted
> to have a "back door" management network to get to the box should we have
> issues with the 10Gb driver.
Brian J. Murrell
2010-Oct-21 14:35 UTC
[Lustre-discuss] controlling which eth interface lustre uses
On Thu, 2010-10-21 at 10:29 -0400, Brock Palen wrote:
> We could do this, the 10Gb drivers have been such a pain for us we wanted to have a "back door" management network to get to the box should we have issues with the 10Gb driver.

If you really do want two separate networks, one for Lustre and one for management, then why not configure them as separate networks with different subnets? Anything else is just going to confuse the routing engine.

I think "at best" two interfaces on the same subnet is going to cause indeterminate behaviour.

b.
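As an illustration of the separate-subnet suggestion, the management interface could be given an address outside 10.164.0.0/21 in its RHEL-style ifcfg file. This is only a sketch: the 192.168.100.0/24 subnet and the host address are made up, and the site would need a real management subnet that its admin hosts can reach.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0
# Management-only interface on its own subnet.
# 192.168.100.10/24 is a hypothetical address chosen for illustration.
DEVICE=eth0
IPADDR=192.168.100.10
NETMASK=255.255.255.0
BOOTPROTO=static
ONBOOT=yes
```

With eth0 moved off 10.164.0.0/21, only eth4 would carry a route for the Lustre network, so the kernel would no longer have two equal candidates for that destination.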
Brock Palen
2010-Oct-21 14:39 UTC
[Lustre-discuss] controlling which eth interface lustre uses
On Oct 21, 2010, at 10:35 AM, Brian J. Murrell wrote:
> If you really do want two separate networks, one for Lustre and one for
> management, then why not configure them as separate networks with
> different subnets? Anything else is just going to confuse the routing
> engine.
>
> I think "at best" two interfaces on the same subnet is going to cause
> indeterminate behaviour.

We settled on disabling the eth0 interface and hope the 10Gb driver will not give us any more trouble. We don't currently have a dedicated management network; setting one up for just a single host was passed over.
Bob Ball
2010-Oct-21 14:41 UTC
[Lustre-discuss] controlling which eth interface lustre uses
OK, quick startup on bonding, as we use it for our OSS here. We have 2 NICs we bond (SL5.5, an RHEL variant), eth1 at 1Gb and eth2 at 10Gb using Myricom hardware. 10.10.1.2 is the network gateway, a convenient ARP target that should always be up.

    [root@umdist04 network-scripts]# cat ifcfg-bond0
    DEVICE=bond0
    IPADDR=10.10.2.24
    NETMASK=255.255.252.0
    BOOTPROTO=static
    ONBOOT=yes
    VLAN=no
    MTU=1500

    [root@umdist04 network-scripts]# cat ifcfg-eth1
    DEVICE=eth1
    ONBOOT=no
    BOOTPROTO=none
    MTU=1500
    MASTER=bond0
    SLAVE=yes

    [root@umdist04 network-scripts]# cat ifcfg-eth2
    DEVICE=eth2
    BOOTPROTO=none
    ONBOOT=no
    MTU=1500
    MASTER=bond0
    SLAVE=yes

    [root@umdist04 etc]# cat modprobe.conf
    ...
    alias eth1 bnx2
    alias eth2 myri10ge
    ...
    alias bond0 bonding
    options bond0 mode=1 arp_interval=250 arp_ip_target=10.10.1.2 primary=eth2
    options lnet networks=tcp0(bond0)
    ...

You can check /proc/net/bonding/bond0 afterwards for information.

bob

On 10/21/2010 9:59 AM, Bob Ball wrote:
> Why do you need both active? If one is a backup to the other, then bond
> them as a primary/backup pair, meaning only one will be active at a
> time, ie, your designated primary (unless it goes down).
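To confirm which NIC the bond is actually using, the status file mentioned above can be read for its "Currently Active Slave" line, which the bonding driver reports in active-backup mode. A minimal sketch; the helper name active_slave is hypothetical, and the optional path argument exists only so the function can be pointed at a saved copy of the file:

```shell
# active_slave [STATUS_FILE]
# Print the interface named on the "Currently Active Slave:" line.
# Defaults to the live status file for bond0.
active_slave() {
    sed -n 's/^Currently Active Slave: //p' "${1:-/proc/net/bonding/bond0}"
}
```

With the configuration above, `active_slave` would be expected to print eth2 while the 10Gb link is healthy, and eth1 after a failover.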
Joe Landman
2010-Oct-21 14:41 UTC
[Lustre-discuss] controlling which eth interface lustre uses
On 10/21/2010 10:29 AM, Brock Palen wrote:
> Oddly I ran:
>
> ifconfig eth0 down
>
> and I could no longer ping the box over the eth4 interface, I had to
> power cycle it from management. Very odd.

Hmmm ... which 1GbE and 10GbE NICs? Which kernel? We maintain kernel RPMs and tarballs for our customers, and if one of ours will work for you, you are welcome to it.

When we set up clusters and/or storage clusters, we typically (completely) isolate the management and storage fabric nets from each other. We don't recommend putting interfaces on the same subnet unless there is a clear intention to channel bond.

You may be able to tell the box to ignore ARPs on the eth0 net, and then hand-edit the ARP table (arp -s ...) to force a connection. However, this is somewhat convoluted and a management pain.

For out-of-band work, a KVM over IP could be helpful. Does the box support KVM over IP from IPMI? If not, you could get a drop-in unit that does this (we use these for older, less capable nodes when needed).

-- 
Joseph Landman, Ph.D
Scalable Informatics Inc.
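One way the "ignore ARPs" part is commonly approached is through the kernel's ARP sysctls rather than by hand. The sketch below only prints sysctl.conf-style lines so they can be reviewed first; it changes nothing by itself. arp_ignore=1 tells an interface to answer ARP only for addresses configured on that same interface, and arp_announce=2 tells the kernel to pick the best local address for the target when sending ARP requests. The helper name arp_sysctl_lines is hypothetical, and whether these two settings alone are sufficient on this particular OSS is an untested assumption.

```shell
# arp_sysctl_lines DEV...
# Print (do not apply) sysctl.conf lines that restrict ARP behaviour
# for each named entry under net.ipv4.conf (e.g. all, eth0, eth4).
arp_sysctl_lines() {
    for dev in "$@"; do
        echo "net.ipv4.conf.$dev.arp_ignore = 1"
        echo "net.ipv4.conf.$dev.arp_announce = 2"
    done
}
```

For example, `arp_sysctl_lines all eth0 eth4` prints six lines that could be appended to /etc/sysctl.conf and loaded with `sysctl -p`. Printing first is a deliberate choice here, given how easily a network change on a remote box can cut off access.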
Lundgren, Andrew
2010-Oct-21 15:07 UTC
[Lustre-discuss] controlling which eth interface lustre uses
Just as a FYI, you can set most of the bonding options in the ifcfg-bond0 file. IE: BONDING_OPTS="arp_ip_target=10.248.58.254 arp_interval=500 mode=active-backup primary=eth0" Then your modprobe.conf only needs: alias bond0 bonding -----Original Message----- From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Bob Ball Sent: Thursday, October 21, 2010 8:41 AM To: lustre-discuss at lists.lustre.org Subject: Re: [Lustre-discuss] controlling which eth interface lustre uses OK, quick startup on bonding, as we use it for our OSS here. We have 2 NICs we bond (SL5.5, an RHEL variant), eth1 at 1Gb and eth2 at 10Gb using Myricom hardware. 10.10.1.2 is the network gateway, a convenient arp target that should always be up. [root at umdist04 network-scripts]# cat ifcfg-bond0 DEVICE=bond0 IPADDR=10.10.2.24 NETMASK=255.255.252.0 BOOTPROTO=static ONBOOT=yes VLAN=no MTU=1500 [root at umdist04 network-scripts]# cat ifcfg-eth1 DEVICE=eth1 ONBOOT=no BOOTPROTO=none MTU=1500 MASTER=bond0 SLAVE=yes [root at umdist04 network-scripts]# cat ifcfg-eth2 DEVICE=eth2 BOOTPROTO=none ONBOOT=no MTU=1500 MASTER=bond0 SLAVE=yes [root at umdist04 etc]# cat modprobe.conf ... alias eth1 bnx2 alias eth2 myri10ge ... alias bond0 bonding options bond0 mode=1 arp_interval=250 arp_ip_target=10.10.1.2 primary=eth2 options lnet networks=tcp0(bond0) ... You can check /proc/net/bonding/bond0 afterwards for information. bob On 10/21/2010 9:59 AM, Bob Ball wrote:> Why do you need both active? If one is a backup to the other, then bond > them as a primary/backup pair, meaning only one will be active at at a > time, ie, your designated primary (unless it goes down). 
> > bob > > On 10/21/2010 9:51 AM, Brock Palen wrote: >> On Oct 21, 2010, at 9:48 AM, Joe Landman wrote: >> >>> On 10/21/2010 09:37 AM, Brock Palen wrote: >>>> We recently added a new oss, it has 1 1Gb interface and 1 10Gb >>>> interface, >>>> >>>> The 10Gb interface is eth4 10.164.0.166 The 1Gb interface is eth0 >>>> 10.164.0.10 >>> They look like they are on the same subnet if you are using /24 ... >> You are correct >> >> Both interfaces are on the same subnet: >> >> [root at oss4-gb ~]# route >> Kernel IP routing table >> Destination Gateway Genmask Flags Metric Ref Use Iface >> 10.164.0.0 * 255.255.248.0 U 0 0 0 eth0 >> 10.164.0.0 * 255.255.248.0 U 0 0 0 eth4 >> 169.254.0.0 * 255.255.0.0 U 0 0 0 eth4 >> default 10.164.0.1 0.0.0.0 UG 0 0 0 eth0 >> >> There is no way to mask the lustre service away from the 1Gb interface? >> >>>> In modprobe.conf I have: >>>> >>>> options lnet networks=tcp0(eth4) >>>> >>>> lctl list_nids 10.164.0.166 at tcp >>>> >>>>> From a host I run: >>>> lctl which_nid oss4 10.164.0.166 at tcp >>>> >>>> But yet I still see traffic over eth0 the 1Gb management network, >>>> might higher than I would expect (upto 100MB/s) The management >>>> interface is oss4-gb So If I do from a client: >>>> >>>> lctl which_nid oss4-gb 10.164.0.10 at tcp >>>> >>>> Why If I have netwroks=tcp0(eth4) and that list_nids showa only the >>>> 10Gb interface, do I have so much traffic over the 1Gb interface? >>>> There is some traffic on the 10Gb interface, but I would like to tell >>>> lustre ''don''t use the 1Gb interface''. >>> If they are on the same subnet, its possible that the 1GbE sees the arp >>> response first. And then its pretty much guaranteed to have the traffic >>> go out that port. >>> >>> If your subnets are different, this shouldn''t be the issue. >>> >>>> Thanks! 
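As a quick check on the bonded setup described above, the status file can be parsed to confirm which slave is actually carrying traffic. The sketch below is not from the thread: the function name active_slave is hypothetical, and it assumes the bonding driver reports a line beginning "Currently Active Slave:" in /proc/net/bonding/bond0, as it does in active-backup mode.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the currently active
# slave of a bond, given the bonding status text on stdin.
# Assumes active-backup mode, where the driver reports a line of the form
#   Currently Active Slave: eth2
active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }'
}

# On a real host with a configured bond0:
#   active_slave < /proc/net/bonding/bond0
```

If the printed interface is not the designated primary (eth2 in Bob's configuration), the bond has probably failed over to the 1Gb link.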
Christopher J.Walker
2010-Oct-21 16:38 UTC
[Lustre-discuss] controlling which eth interface lustre uses
Charles Taylor wrote:
> On Oct 21, 2010, at 9:51 AM, Brock Palen wrote:
>
>> On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:
>>
>>> On 10/21/2010 09:37 AM, Brock Palen wrote:
>>>> We recently added a new oss, it has 1 1Gb interface and 1 10Gb
>>>> interface,
>>>>
>>>> The 10Gb interface is eth4 10.164.0.166 The 1Gb interface is eth0
>>>> 10.164.0.10
>>> They look like they are on the same subnet if you are using /24 ...
>> You are correct
>>
>> Both interfaces are on the same subnet:
>>
>> [root@oss4-gb ~]# route
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 10.164.0.0      *               255.255.248.0   U     0      0        0 eth0
>> 10.164.0.0      *               255.255.248.0   U     0      0        0 eth4
>> 169.254.0.0     *               255.255.0.0     U     0      0        0 eth4
>> default         10.164.0.1      0.0.0.0         UG    0      0        0 eth0
>>
>> There is no way to mask the lustre service away from the 1Gb
>> interface?
>
> We struggle with this as well but have not found a way to enforce
> it. You would think that lustre would honor the NID for incoming
> *and* outgoing traffic but apparently the standard linux routing table
> determines the outbound path and lnet is out of the picture. Thus,
> you end up having to assign separate subnets, shut down your eth0 (in
> this case) interface, or use static routes to fine tune the routing
> decisions (where possible).
>
> We wish that the outgoing decision could be made on the basis of the
> *NID* but that might be too intrusive with regard to the linux
> kernel's network stack so I can understand, somewhat, why it is not
> that way. Still, it is somewhat counter-intuitive to go through all
> the trouble of having the LNET layer and assigning NIDs only to have
> them disregarded for outbound traffic.
>
> Perhaps there is a way around this that we don't know about.

Source based routing. You need both to make sure that each interface ignores arp requests to the other IP, and that traffic from the 10Gig IP is routed out of that card.
This is the way I solved the problem:

#!/bin/sh
# Script to use policy based routing to ensure lustre traffic goes in and out from eth2.

# First make sure that eth0 and eth2 only respond to arp requests for their own ip
echo 1 >/proc/sys/net/ipv4/conf/all/arp_ignore

# Now add a source based route - if the route is from the ip address of eth2, then send traffic via it
ip route add 10.1.0.0/16 dev eth2 tab 2
ip rule add from $(ifconfig eth2 | awk 'BEGIN {FS="[ :]+"};/inet addr/{print $4}') tab 2 priority 600

Having said this, I don't think it's what I'd set up now. I'd use IPMI to get a serial console on the machine as my back door and/or use LACP bonding (can't remember which mode). If you do this, and IPMI shares the same physical port as eth0, then it is probably best to use eth1 as the failover link[1].

Chris

[1] We had a brief try with IPMI with eth0 and eth1 bonded - DHCP packets got out, but the replies didn't get back. Presumably the switch is sending the reply to eth1 rather than eth0 (swapping the physical cables around was suggested, but we didn't try this).
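The awk expression in the script above depends on the older ifconfig output format ("inet addr:10.1.0.5"). Newer net-tools releases print "inet 10.1.0.5" instead, so the match can silently return nothing. A minimal sketch of the same lookup using iproute2 output follows; the helper name first_ipv4 is hypothetical and not part of the original script, while the interface name eth2 and table number 2 are carried over from it.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): print the first IPv4
# address, without its prefix length, from `ip -4 -o addr show dev IFACE`
# output read on stdin. In that one-line format the third field is "inet"
# and the fourth is ADDRESS/PREFIX.
first_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# On a real host (as root), the rule from the script could then be written as:
#   ip rule add from "$(ip -4 -o addr show dev eth2 | first_ipv4)" table 2 priority 600
# and the result inspected afterwards with:
#   ip rule show
#   ip route show table 2
```

Keeping the parsing in a small function makes it possible to check it against sample text before touching the routing tables on a production OSS.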