thr3ads.net - Lustre discuss - [Lustre-discuss] controlling which eth interface lustre uses [Oct 2010]

Brock Palen

2010-Oct-21 13:37 UTC

[Lustre-discuss] controlling which eth interface lustre uses

We recently added a new OSS. It has one 1Gb interface and one 10Gb interface:

The 10Gb interface is eth4 10.164.0.166
The 1Gb interface is eth0 10.164.0.10

In modprobe.conf I have:

options lnet networks=tcp0(eth4)

lctl list_nids
10.164.0.166@tcp

From a host I run:

lctl which_nid oss4
10.164.0.166@tcp

Yet I still see traffic over eth0, the 1Gb management network, much higher
than I would expect (up to 100MB/s). The management interface is oss4-gb, so if
I run from a client:

lctl which_nid oss4-gb
10.164.0.10@tcp

Why, if I have networks=tcp0(eth4) and list_nids shows only the 10Gb
interface, do I have so much traffic over the 1Gb interface? There is some
traffic on the 10Gb interface, but I would like to tell Lustre
'don't use the 1Gb interface'.

Thanks!

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985

Joe Landman

2010-Oct-21 13:48 UTC

On 10/21/2010 09:37 AM, Brock Palen wrote:
> We recently added a new oss, it has 1 1Gb interface and 1 10Gb
> interface,
>
> The 10Gb interface is eth4 10.164.0.166
> The 1Gb interface is eth0 10.164.0.10

They look like they are on the same subnet if you are using /24 ...

> Why, if I have networks=tcp0(eth4) and list_nids shows only the
> 10Gb interface, do I have so much traffic over the 1Gb interface?
> There is some traffic on the 10Gb interface, but I would like to tell
> lustre 'don't use the 1Gb interface'.

If they are on the same subnet, it's possible that the 1GbE sees the ARP
response first, and then it's pretty much guaranteed to have the traffic
go out that port.

If your subnets are different, this shouldn't be the issue.
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

Brock Palen

2010-Oct-21 13:51 UTC

On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:
> On 10/21/2010 09:37 AM, Brock Palen wrote:
>> We recently added a new oss, it has 1 1Gb interface and 1 10Gb
>> interface,
>> 
>> The 10Gb interface is eth4 10.164.0.166
>> The 1Gb interface is eth0 10.164.0.10
> 
> They look like they are on the same subnet if you are using /24 ...
You are correct 

Both interfaces are on the same subnet:

[root@oss4-gb ~]# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.164.0.0      *               255.255.248.0   U     0      0        0 eth0
10.164.0.0      *               255.255.248.0   U     0      0        0 eth4
169.254.0.0     *               255.255.0.0     U     0      0        0 eth4
default         10.164.0.1      0.0.0.0         UG    0      0        0 eth0

Is there no way to mask the Lustre service away from the 1Gb interface?
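
As a quick check on the subnet question: the Genmask in the routing table is
255.255.248.0 (a /21), not a /24, but the two addresses land in the same
network either way. Here is a minimal Python sketch using the standard-library
ipaddress module. The helper name same_subnet is made up for illustration; the
addresses are the ones from this thread.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if both addresses fall in the same network for the given mask."""
    # strict=False lets us pass a host address and get its enclosing network.
    net_a = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{netmask}", strict=False)
    return net_a == net_b


# Addresses from this thread: eth4 (10Gb) and eth0 (1Gb).
ETH4 = "10.164.0.166"
ETH0 = "10.164.0.10"

print(same_subnet(ETH4, ETH0, "255.255.255.0"))  # the /24 guess
print(same_subnet(ETH4, ETH0, "255.255.248.0"))  # the /21 shown by `route`
```

Moving eth0 to an address outside 10.164.0.0/21 would make the second check
fail, which is the "separate subnets" situation.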

Bob Ball

2010-Oct-21 13:59 UTC

Why do you need both active?  If one is a backup to the other, then bond
them as a primary/backup pair, meaning only one will be active at a
time, i.e., your designated primary (unless it goes down).

bob


Charles Taylor

2010-Oct-21 14:04 UTC

On Oct 21, 2010, at 9:51 AM, Brock Palen wrote:
> There is no way to mask the lustre service away from the 1Gb  
> interface?
We struggle with this as well but have not found a way to enforce
it. You would think that Lustre would honor the NID for incoming
*and* outgoing traffic, but apparently the standard Linux routing table
determines the outbound path and LNET is out of the picture. Thus,
you end up having to assign separate subnets, shut down your eth0 (in
this case) interface, or use static routes to fine-tune the routing
decisions (where possible).
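
To make the point concrete, here is a rough Python model of how the outbound
interface gets picked from the routing table Brock posted. It is only a
sketch: it assumes longest-prefix match, with ties going to the lower metric
and then to the first-listed route, which is a simplification of the real
Linux lookup. The ROUTES list, the pick_route helper, and the client address
are made up for illustration. Note that nothing in the lookup knows about the
LNET NID.

```python
import ipaddress

# (destination network, metric, interface), in the order `route` printed them.
ROUTES = [
    ("10.164.0.0/21", 0, "eth0"),
    ("10.164.0.0/21", 0, "eth4"),
    ("169.254.0.0/16", 0, "eth4"),
    ("0.0.0.0/0", 0, "eth0"),  # default via 10.164.0.1
]


def pick_route(dest: str, routes=ROUTES) -> str:
    """Pick an outbound interface: longest prefix, then lowest metric, then table order."""
    addr = ipaddress.ip_address(dest)
    best = None
    for order, (net, metric, iface) in enumerate(routes):
        network = ipaddress.ip_network(net)
        if addr not in network:
            continue
        # Smaller key wins: longer prefix first, then metric, then position.
        key = (-network.prefixlen, metric, order)
        if best is None or key < best[0]:
            best = (key, iface)
    if best is None:
        raise LookupError(f"no route to {dest}")
    return best[1]


# A client on the shared /21 (hypothetical address). In this model the eth0
# entry wins, even though LNET was told networks=tcp0(eth4).
print(pick_route("10.164.3.20"))

# With the eth0 subnet route gone (e.g. eth0 moved to another subnet),
# the same destination resolves to eth4.
print(pick_route("10.164.3.20", ROUTES[1:]))
```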

We wish that the outgoing decision could be made on the basis of the
*NID*, but that might be too intrusive with regard to the Linux
kernel's network stack, so I can understand, somewhat, why it is not
that way. Still, it is somewhat counter-intuitive to go through all
the trouble of having the LNET layer and assigning NIDs only to have
them disregarded for outbound traffic.

Perhaps there is a way around this that we don't know about.

Regards,

Charlie Taylor
UF HPC Center

Brock Palen

2010-Oct-21 14:29 UTC

> Why do you need both active?  If one is a backup to the other, then bond
> them as a primary/backup pair, meaning only one will be active at a
> time, i.e., your designated primary (unless it goes down).
We could do this. The 10Gb drivers have been such a pain for us that we wanted
a 'back door' management network to get to the box should we have issues with
the 10Gb driver.

Oddly, I ran:

ifconfig eth0 down

and I could no longer ping the box over the eth4 interface. I had to power
cycle it from management. Very odd.

Wojciech Turek

2010-Oct-21 14:34 UTC

Maybe I am missing the point here, but can you explain why you would need
two NICs in one host on the same subnet?
If you need an additional access route to your host, why not configure eth0
on a different subnet?


Brian J. Murrell

2010-Oct-21 14:35 UTC

On Thu, 2010-10-21 at 10:29 -0400, Brock Palen wrote:
> We could do this, the 10Gb drivers have been such a pain for us we wanted
> to have a 'back door' management network to get to the box should we have
> issues with the 10Gb driver.
If you really do want two separate networks, one for Lustre and one for
management, then why not configure them as separate networks with
different subnets?  Anything else is just going to confuse the routing
engine.

I think "at best" two interfaces on the same subnet is going to cause
indeterminate behaviour.

b.


Brock Palen

2010-Oct-21 14:39 UTC

On Oct 21, 2010, at 10:35 AM, Brian J. Murrell wrote:
> If you really do want two separate networks, one for Lustre and one for
> management, then why not configure them as separate networks with
> different subnets?  Anything else is just going to confuse the routing
> engine.
>
> I think "at best" two interfaces on the same subnet is going to cause
> indeterminate behaviour.
We settled on disabling the eth0 interface and hope the 10Gb driver will not
give us any more trouble.
We don't currently have a dedicated management network; setting one up for
just a single host was passed over.



Bob Ball

2010-Oct-21 14:41 UTC

OK, quick startup on bonding, as we use it for our OSS here.

We have 2 NICs we bond (SL5.5, an RHEL variant), eth1 at 1Gb and eth2 at 
10Gb using Myricom hardware.  10.10.1.2 is the network gateway, a 
convenient arp target that should always be up.

[root@umdist04 network-scripts]# cat ifcfg-bond0
DEVICE=bond0
IPADDR=10.10.2.24
NETMASK=255.255.252.0
BOOTPROTO=static
ONBOOT=yes
VLAN=no
MTU=1500

[root@umdist04 network-scripts]# cat ifcfg-eth1
DEVICE=eth1
ONBOOT=no
BOOTPROTO=none
MTU=1500
MASTER=bond0
SLAVE=yes

[root@umdist04 network-scripts]# cat ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=no
MTU=1500
MASTER=bond0
SLAVE=yes

[root@umdist04 etc]# cat modprobe.conf
...
alias eth1 bnx2
alias eth2 myri10ge
...
alias bond0 bonding
options bond0 mode=1 arp_interval=250 arp_ip_target=10.10.1.2 primary=eth2
options lnet networks=tcp0(bond0)
...

You can check /proc/net/bonding/bond0 afterwards for information.
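
If you want to check that from a script, something like the following Python
sketch would do. It assumes the usual "Key: value" layout of
/proc/net/bonding/bond0 with a "Currently Active Slave" line, as reported in
active-backup mode. The SAMPLE text is illustrative, not captured from
umdist04, and the helper names are made up.

```python
from pathlib import Path

# Illustrative excerpt only; real output has more fields and varies by kernel.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth2
Currently Active Slave: eth2
MII Status: up
"""


def active_slave(text: str):
    """Return the interface named on the 'Currently Active Slave' line, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Currently Active Slave":
            tokens = value.split()
            # Take the first token in case extra detail follows the name.
            return tokens[0] if tokens else None
    return None


def read_bond_status(bond: str = "bond0") -> str:
    """Read the live status file; only works on a host with bonding loaded."""
    return Path(f"/proc/net/bonding/{bond}").read_text()


print(active_slave(SAMPLE))
```

On the OSS itself you would call active_slave(read_bond_status()) and compare
the result with the primary you configured (eth2 here).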

bob



Joe Landman

2010-Oct-21 14:41 UTC

On 10/21/2010 10:29 AM, Brock Palen wrote:
>> Why do you need both active?  If one is a backup to the other, then
>> bond them as a primary/backup pair, meaning only one will be active
>> at a time, i.e., your designated primary (unless it goes down).
>
> We could do this, the 10Gb drivers have been such a pain for us we
> wanted to have a 'back door' management network to get to the box
> should we have issues with the 10Gb driver.
>
> Oddly I ran:
>
> ifconfig eth0 down
>
> and I could no longer ping the box over the eth4 interface, I had to
> power cycle it from management.  Very odd.
>
Hmmm ... what 1GbE and 10GbE NICs?  Which kernel?  We maintain kernel 
RPMs and tarballs for our customers, and if one of ours will work for 
you, you are welcome to it.

When we set up clusters and/or storage clusters, we typically isolate
the management and storage fabric networks from each other completely.
We don't recommend putting interfaces on the same subnet unless
there is a clear intention to channel bond.

You may be able to tell the box to ignore arps on the eth0 net, and then 
hand edit the arp table (arp -s ...) to force a connection.  However, 
this is somewhat convoluted and a management pain.

For out of band work, a kvm over IP could be helpful.  Does the box 
support kvm over ip from IPMI?  If not, you could get a drop in unit 
that does this (we use these for older less capable nodes when needed).




Lundgren, Andrew

2010-Oct-21 15:07 UTC

Just as an FYI, you can set most of the bonding options in the ifcfg-bond0 file.

For example:

BONDING_OPTS="arp_ip_target=10.248.58.254 arp_interval=500 mode=active-backup primary=eth0"

Then your modprobe.conf only needs:

alias bond0 bonding

-----Original Message-----
From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces
at lists.lustre.org] On Behalf Of Bob Ball
Sent: Thursday, October 21, 2010 8:41 AM
To: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] controlling which eth interface lustre uses

OK, quick startup on bonding, as we use it for our OSS here.

We have 2 NICs we bond (SL5.5, an RHEL variant), eth1 at 1Gb and eth2 at 
10Gb using Myricom hardware.  10.10.1.2 is the network gateway, a 
convenient arp target that should always be up.

[root at umdist04 network-scripts]# cat ifcfg-bond0
DEVICE=bond0
IPADDR=10.10.2.24
NETMASK=255.255.252.0
BOOTPROTO=static
ONBOOT=yes
VLAN=no
MTU=1500

[root at umdist04 network-scripts]# cat ifcfg-eth1
DEVICE=eth1
ONBOOT=no
BOOTPROTO=none
MTU=1500
MASTER=bond0
SLAVE=yes

[root at umdist04 network-scripts]# cat ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=no
MTU=1500
MASTER=bond0
SLAVE=yes

[root at umdist04 etc]# cat modprobe.conf
...
alias eth1 bnx2
alias eth2 myri10ge
...
alias bond0 bonding
options bond0 mode=1 arp_interval=250 arp_ip_target=10.10.1.2 primary=eth2
options lnet networks=tcp0(bond0)
...

You can check /proc/net/bonding/bond0 afterwards for information.

bob
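
A minimal sketch of the check mentioned above, assuming the status file contains a "Currently Active Slave:" line as the Linux bonding driver normally writes it. The function name active_slave is made up for illustration.

```shell
#!/bin/sh
# Print the currently active slave of a bond.
# $1: path to the bonding status file (default: /proc/net/bonding/bond0)
active_slave() {
    status_file="${1:-/proc/net/bonding/bond0}"
    # The status file is expected to hold a line such as
    #   Currently Active Slave: eth2
    awk -F': ' '/^Currently Active Slave:/ {print $2}' "$status_file"
}
```

With the configuration above, calling active_slave with no argument would be expected to report eth2 while the 10Gb link is up, and eth1 after a failover.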


On 10/21/2010 9:59 AM, Bob Ball wrote:
> Why do you need both active?  If one is a backup to the other, then bond
> them as a primary/backup pair, meaning only one will be active at a
> time, i.e., your designated primary (unless it goes down).
>
> bob
>
> On 10/21/2010 9:51 AM, Brock Palen wrote:
>> On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:
>>
>>> On 10/21/2010 09:37 AM, Brock Palen wrote:
>>>> We recently added a new oss, it has 1 1Gb interface and 1 10Gb
>>>> interface,
>>>>
>>>> The 10Gb interface is eth4 10.164.0.166
>>>> The 1Gb  interface is eth0 10.164.0.10
>>> They look like they are on the same subnet if you are using /24 ...
>> You are correct
>>
>> Both interfaces are on the same subnet:
>>
>> [root at oss4-gb ~]# route
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 10.164.0.0      *               255.255.248.0   U     0      0        0 eth0
>> 10.164.0.0      *               255.255.248.0   U     0      0        0 eth4
>> 169.254.0.0     *               255.255.0.0     U     0      0        0 eth4
>> default         10.164.0.1      0.0.0.0         UG    0      0        0 eth0
>>
>> There is no way to mask the lustre service away from the 1Gb interface?
>>
>>>> In modprobe.conf I have:
>>>>
>>>> options lnet networks=tcp0(eth4)
>>>>
>>>> lctl list_nids
>>>> 10.164.0.166 at tcp
>>>>
>>>> From a host I run:
>>>>
>>>> lctl which_nid oss4
>>>> 10.164.0.166 at tcp
>>>>
>>>> But yet I still see traffic over eth0, the 1Gb management network,
>>>> much higher than I would expect (up to 100MB/s). The management
>>>> interface is oss4-gb.  So if I do from a client:
>>>>
>>>> lctl which_nid oss4-gb
>>>> 10.164.0.10 at tcp
>>>>
>>>> Why, if I have networks=tcp0(eth4) and list_nids shows only the
>>>> 10Gb interface, do I have so much traffic over the 1Gb interface?
>>>> There is some traffic on the 10Gb interface, but I would like to tell
>>>> lustre 'don't use the 1Gb interface'.
>>> If they are on the same subnet, it's possible that the 1GbE sees the arp
>>> response first.  And then it's pretty much guaranteed to have the traffic
>>> go out that port.
>>>
>>> If your subnets are different, this shouldn't be the issue.

Christopher J.Walker

2010-Oct-21 16:38 UTC

[Lustre-discuss] controlling which eth interface lustre uses

Charles Taylor wrote:
> On Oct 21, 2010, at 9:51 AM, Brock Palen wrote:
>
>> On Oct 21, 2010, at 9:48 AM, Joe Landman wrote:
>>
>>> On 10/21/2010 09:37 AM, Brock Palen wrote:
>>>> We recently added a new oss, it has 1 1Gb interface and 1 10Gb
>>>> interface,
>>>>
>>>> The 10Gb interface is eth4 10.164.0.166
>>>> The 1Gb  interface is eth0 10.164.0.10
>>> They look like they are on the same subnet if you are using /24 ...
>> You are correct
>>
>> Both interfaces are on the same subnet:
>>
>> [root at oss4-gb ~]# route
>> Kernel IP routing table
>> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
>> 10.164.0.0      *               255.255.248.0   U     0      0        0 eth0
>> 10.164.0.0      *               255.255.248.0   U     0      0        0 eth4
>> 169.254.0.0     *               255.255.0.0     U     0      0        0 eth4
>> default         10.164.0.1      0.0.0.0         UG    0      0        0 eth0
>>
>> There is no way to mask the lustre service away from the 1Gb
>> interface?
>
> We struggle with this as well but have not found a way to enforce
> it.  You would think that lustre would honor the NID for incoming
> *and* outgoing traffic, but apparently the standard linux routing table
> determines the outbound path and lnet is out of the picture.  Thus,
> you end up having to assign separate subnets, shut down your eth0 (in
> this case) interface, or use static routes to fine tune the routing
> decisions (where possible).
>
> We wish that the outgoing decision could be made on the basis of the
> *NID* but that might be too intrusive with regard to the linux
> kernel's network stack, so I can understand, somewhat, why it is not
> that way.  Still, it is somewhat counter-intuitive to go through all
> the trouble of having the LNET layer and assigning NIDs only to have
> them disregarded for outbound traffic.
>
> Perhaps there is a way around this that we don't know about.

Source based routing. You need both to make sure that each interface
ignores arp requests to the other IP, and that traffic from the 10Gig IP
is routed out of that card.

This is the way I solved the problem:


#!/bin/sh
# Script to use policy based routing to ensure lustre traffic goes in
# and out from eth2.
# First make sure that eth0 and eth2 only respond to arp requests for
# their own ip
echo 1 >/proc/sys/net/ipv4/conf/all/arp_ignore

# Now add a source based route - if the route is from the ip address of
# eth2, then send traffic via it
ip route add 10.1.0.0/16 dev eth2 tab 2
ip rule add from $(ifconfig eth2 | awk 'BEGIN {FS="[ :]+"};/inet addr/{print $4}') tab 2 priority 600
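
The address lookup in the script above depends on the "inet addr:" text that older ifconfig versions print; newer net-tools releases format that line differently. A hedged alternative sketch that parses the one-line output of "ip -o -4 addr show" instead is below. The helper name first_ipv4 is hypothetical, and the sample input line is illustrative rather than taken from a real host.

```shell
#!/bin/sh
# Extract the first IPv4 address from "ip -o -4 addr show" output on stdin.
# Illustrative input line:
#   4: eth2    inet 10.1.2.3/16 brd 10.1.255.255 scope global eth2
first_ipv4() {
    # Field 3 is the address family, field 4 is address/prefix-length.
    awk '$3 == "inet" {split($4, a, "/"); print a[1]; exit}'
}

# Hypothetical usage, with the interface and table number from the script above:
#   addr=$(ip -o -4 addr show dev eth2 | first_ipv4)
#   ip rule add from "$addr" table 2 priority 600
```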


Having said this, I don't think it's what I'd set up now.
I'd use IPMI to get a serial console on the machine as my back door
and/or use LACP bonding (can't remember which mode). If you do this, and
IPMI shares the same physical port as eth0, then it is probably best to
use eth1 as the failover link[1].


Chris
[1] We had a brief try with IPMI with eth0 and eth1 bonded - DHCP
packets got out, but the replies didn't get back. Presumably the switch
is sending the reply to eth1 rather than eth0 (swapping the physical
cables around was suggested, but we didn't try this).
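
For reference, a small sketch that confirms the two addresses from the original question share one network under the netmask shown in the quoted routing table (255.255.248.0). That is why the kernel has two equally specific routes to choose from. The function name network_of is made up for illustration.

```shell
#!/bin/sh
# Print the network address for an IPv4 address and a dotted netmask.
network_of() {
    ip=$1
    mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into $1..$4 (address) and $5..$8 (mask).
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Hypothetical usage with the addresses from the original question:
#   network_of 10.164.0.166 255.255.248.0   # 10Gb interface, eth4
#   network_of 10.164.0.10  255.255.248.0   # 1Gb interface, eth0
```

Both calls print 10.164.0.0, so neither route is more specific than the other.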
