thr3ads.net - Linux Ethernet Bridging - [Bridge] linux bridge does not forward arp reply back packets in a vmware vm [Dec 2017]

Stephen Hemminger

2017-Dec-16 01:47 UTC

[Bridge] linux bridge does not forward arp reply back packets in a vmware vm

On Fri, 15 Dec 2017 18:29:58 +0200
Adrian P <adrian27oradea at gmail.com> wrote:
> On Fri, Dec 15, 2017 at 5:55 PM, Stephen Hemminger
> <stephen at networkplumber.org> wrote:
> > On Fri, 15 Dec 2017 15:37:39 +0200
> > Adrian P <adrian27oradea at gmail.com> wrote:
> >  
> >> Hello,
> >>
> >> I have a strange issue with a linux bridge created by
> >> openstack-neutron (pike release). This linux bridge is hosted in a
> >> vmware VM running latest CentOS 7, with a single network interface in
> >> promiscuous mode.
> >>
> >> From openstack neutron perspective, the networking setup is simple: a
> >> single flat external provider network, with a single cirros VM
> >> instance connected to it.
> >>
> >> Therefore, in the linux bridge running in the vmware host, I have
> >> 3 interfaces:
> >>
> >> # brctl show
> >> bridge name     bridge id               STP enabled     interfaces
> >> brq025a9a94-58          8000.005056a6b378       no              ens160
> >>                                                         tap2eb4cad6-cd    <----- neutron DHCP agent tap interface
> >>                                                         tap6d31a191-9f    <----- cirros VM instance tap interface
> >>
> >> The ens160 is the "physical" CentOS 7 host interface, that is in
> >> promiscuous mode.
> >>
> >> The tap2eb4cad6-cd tap interface is the neutron DHCP agent interface,
> >> and the tap6d31a191-9f tap interface is used by the cirros VM
> >> instance.
> >>
> >> The problem is the following:
> >>
> >> With tcpdump, I am able to see the arp request (ARP, Request who-has
> >> 10.20.21.1 tell 10.20.21.233) going out from the cirros VM instance on
> >> tap interface tap6d31a191-9f, as well as on the bridge itself
> >> (brq025a9a94-58). However, the reply to the arp request (Reply
> >> 10.20.21.1 is-at 00:17:08:c4:52:80) does not reach the cirros VM
> >> instance. With tcpdump, I am able to see the arp reply packets on
> >> the bridge (brq025a9a94-58), however they do not show up on the
> >> cirros VM instance tap interface tap6d31a191-9f.
> >>
> >> To me it seems that for whatever reason, the bridge does not forward
> >> the arp reply packets to the cirros VM tap interface, and I do
> >> not understand why. The strange thing is that after a while, for
> >> apparently no reason, a single arp reply packet gets through the
> >> bridge and the tap interface, and the arp table gets updated with
> >> the correct IP address in the cirros VM instance. However, if I clear
> >> the arp table in the cirros VM instance, it again takes 10 to 15
> >> minutes of continuously sending arp requests until a single arp reply
> >> packet gets through.
> >>
> >> I was banging my head against the table for a few days with this
> >> issue, and finally, for no apparent reason, I manually set the bridge
> >> max aging time to 0, to turn it into a hub, and from that moment
> >> everything started to work without any issue. Still, I do not
> >> understand why this is happening, and obviously I cannot manually set
> >> the aging time to 0 all the time on all the bridges
> >> openstack neutron creates automatically.
> >>
> >> Any thoughts?
> >>
> >> Many thanks in advance.
> >>
> >> Best regards,
> >> Adrian  
> >
> > Does each tap instance and the ens160 have a different and valid Ethernet
> > address?  Also make sure these are in the bridge forwarding table.
> 
> Yes, they have valid Ethernet addresses, and they do show up in the
> forwarding table twice, see below:
> 
> # ip addr
> <...>
> 2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master brq025a9a94-58 state UP qlen 1000
>     link/ether 00:50:56:a6:b3:78 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::250:56ff:fea6:b378/64 scope link
>        valid_lft forever preferred_lft forever
> 4: tap2eb4cad6-cd@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master brq025a9a94-58 state UP qlen 1000
>     link/ether 8a:b2:15:4c:96:55 brd ff:ff:ff:ff:ff:ff link-netnsid 0
> 5: brq025a9a94-58: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
>     link/ether 00:50:56:a6:b3:78 brd ff:ff:ff:ff:ff:ff
>     inet 10.20.21.249/24 brd 10.20.21.255 scope global brq025a9a94-58
>        valid_lft forever preferred_lft forever
>     inet6 fe80::803d:d0ff:fe2e:3ae4/64 scope link
>        valid_lft forever preferred_lft forever
> 6: tap6d31a191-9f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master brq025a9a94-58 state UNKNOWN qlen 1000
>     link/ether fe:16:3e:9a:04:95 brd ff:ff:ff:ff:ff:ff
>     inet6 fe80::fc16:3eff:fe9a:495/64 scope link
>        valid_lft forever preferred_lft forever
> 
> # brctl showmacs brq025a9a94-58
> port no mac addr                is local?       ageing timer
>   1     00:50:56:a6:b3:78       yes                0.00
>   1     00:50:56:a6:b3:78       yes                0.00
>   2     8a:b2:15:4c:96:55       yes                0.00
>   2     8a:b2:15:4c:96:55       yes                0.00
>   3     fe:16:3e:9a:04:95       yes                0.00
>   3     fe:16:3e:9a:04:95       yes                0.00

Since there are multiple entries per port maybe you are also using VLANs?
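
One possible way to follow up on the VLAN question is to look at the forwarding database with the iproute2 `bridge` tool, whose `bridge fdb show` output carries a `vlan <id>` field on per-VLAN entries. The sketch below is only an illustration: the function name `fdb_vlan_summary` is hypothetical, and it assumes the output is piped in on stdin. It counts entries per VLAN, labelling entries without a `vlan` field as `untagged`.

```shell
#!/bin/sh
# Hypothetical helper (not part of iproute2): count FDB entries per VLAN.
# Reads `bridge fdb show` style lines on stdin and prints "<vlan> <count>".
fdb_vlan_summary() {
    awk '{
        vlan = "untagged"                      # default when no vlan field is present
        for (i = 1; i < NF; i++)
            if ($i == "vlan") vlan = $(i + 1)  # the token after "vlan" is the VLAN id
        count[vlan]++
    }
    END { for (v in count) print v, count[v] }' | sort
}

# Possible usage on a live host (requires iproute2 and an existing bridge):
#   bridge fdb show br brq025a9a94-58 | fdb_vlan_summary
#   bridge vlan show
```

If every local MAC appears once under a VLAN id and once as `untagged`, that would be consistent with the doubled rows in the `brctl showmacs` listing, although it does not by itself explain the missing ARP replies.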

Adrian P

2017-Dec-16 07:12 UTC

[Bridge] linux bridge does not forward arp reply back packets in a vmware vm

On Sat, Dec 16, 2017 at 3:47 AM, Stephen Hemminger
<stephen at networkplumber.org> wrote:
> Since there are multiple entries per port maybe you are also using VLANs?

So I have checked one more time: the host is a vmware centos7 VM, with
a single interface (ens160, which can be seen above) connected to a
vmware virtual switch port group configured in promiscuous mode, which
removes the VLAN tagging (similar to an access mode port on a physical
switch).

I have another environment that behaves the same, where the host is a
physical server, and the interface is a bonded interface, with two
slaves, and I also have two entries for each local interface in the
forwarding table:

# ip addr
<...>
6: eno5: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond2 portid 01000000000000000000004c4930394833 state UP qlen 1000
    link/ether fc:15:b4:13:e6:a3 brd ff:ff:ff:ff:ff:ff
7: eno6: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond2 portid 02000000000000000000004c4930394833 state UP qlen 1000
    link/ether fc:15:b4:13:e6:a3 brd ff:ff:ff:ff:ff:ff
<...>
10: bond2: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master brq75a55ef7-4a state UP qlen 1000
    link/ether fc:15:b4:13:e6:a3 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fe15:b4ff:fe13:e6a3/64 scope link
       valid_lft forever preferred_lft forever
<...>
14: brq75a55ef7-4a: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether fc:15:b4:13:e6:a3 brd ff:ff:ff:ff:ff:ff
    inet 10.20.21.55/24 brd 10.20.21.255 scope global brq75a55ef7-4a
       valid_lft forever preferred_lft forever
15: tap44bc34bb-e2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master brq75a55ef7-4a state UNKNOWN qlen 1000
    link/ether fe:16:3e:cc:dc:ec brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fecc:dcec/64 scope link
       valid_lft forever preferred_lft forever

# brctl show
bridge name     bridge id               STP enabled     interfaces
brq75a55ef7-4a          8000.fc15b413e6a3       no              bond2
                                                        tap44bc34bb-e2

# brctl showmacs brq75a55ef7-4a | grep yes
  2     fc:15:b4:13:e6:a3       yes                0.00
  2     fc:15:b4:13:e6:a3       yes                0.00
  1     fe:16:3e:cc:dc:ec       yes                0.00
  1     fe:16:3e:cc:dc:ec       yes                0.00
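
The doubled rows in the listing above can also be spotted mechanically. The following is a minimal sketch, assuming the `brctl showmacs` output is piped in on stdin; the function name `dup_macs` is made up for illustration. It prints each (port, MAC) pair that appears more than once, followed by the number of times it was seen.

```shell
#!/bin/sh
# Hypothetical helper: list (port, MAC) pairs that occur more than once
# in `brctl showmacs` output read from stdin.
dup_macs() {
    awk 'split($2, octets, ":") == 6 {   # only rows whose 2nd column looks like a MAC
             seen[$1 " " $2]++
         }
         END {
             for (k in seen)
                 if (seen[k] > 1) print k, seen[k]
         }' | sort
}

# Possible usage on a live host (requires bridge-utils):
#   brctl showmacs brq75a55ef7-4a | dup_macs
```

The header row is skipped because its second column is not a colon-separated address, so the helper works with or without the `grep yes` filter.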
