thr3ads.net - Linux Ethernet Bridging - [Bridge] Correct PVID behavior with bridge's VLAN filtering on/off? [Dec 2018]

Ido Schimmel

2018-Dec-12 09:02 UTC

[Bridge] Correct PVID behavior with bridge's VLAN filtering on/off?

On Tue, Dec 11, 2018 at 11:48:21AM -0800, Florian Fainelli wrote:
> Hi Nikolay, Roopa, Jiri, Ido,
> 
> When a bridge has vlan_filtering=0 and notifies a switch driver through
> HOST_OBJ_MDB about MC addresses that the CPU/management port is
> interested in getting MC traffic for, I am seeing that the mdb->vid is
> set to 0 because br_allowed_ingress() checks for BROPT_VLAN_ENABLED
> which is now disabled and so we never populated *vid to anything but 0
> because the caller: br_handle_frame_finish() zeroed it out.
s/br_handle_frame_finish()/br_dev_xmit()/ ? Since you're talking about
HOST_OBJ_MDB
> This creates a problem with the b53 DSA switch driver because in order
> to match the bridge's default_pvid, we did program the switch's "default
> tag" to be 1, which gets used for all untagged frames that ingress the
> switch (which AFAICT is correct behavior for PVID).
Not sure I'm following. If bridge is not VLAN-aware, then where do you
see 'default_pvid' being used?
> Despite having turned off VLAN filtering in the switch such that it does
> allow ingress of packets with a VID that is not present in the VLAN
> table (violation), Multicast addresses do behave differently and we
> really must be strictly matching the programmed PVID in order for MC
> frames to ingress the switch even with VLAN filtering turned off.
> 
> So with all that being written, should the bridge still be sending MDB
> notifications and use the bridge's default_pvid even with
> vlan_filtering=0? And if we did that, what use case could we be possibly
> breaking?
> 
> Let me know if this is not clear so I can provide more details.
I think you need to provide more details about the device you're working
with. I can explain what we're doing in mlxsw for reference.

When you use a VLAN-unaware bridge w/o VLAN devices, we make sure all
untagged packets get tagged with some arbitrary VLAN (now 1, soon 4095).
You never see this VLAN on the wire, since we remove it before sending
the packets. It is only used because all packets in the ASIC must be
tagged.

After we have a VLAN we classify the packet to a FID (bridge) and it
does {FID,DMAC} lookup in the FDB (MDB).

IIUC, your problem is that you also need to tag all the packets (you
used '1', can be something else), but then you program the MDB entry
according to the VLAN passed in the notification ('0') and not use
('1'). We completely ignore the VID in this case and use the FID which
we lookup based on the ifindex of the bridge.
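
As a rough illustration of the flow described above, here is a toy model in C. It is only a sketch: every name in it (toy_switch, toy_mdb_add(), TOY_INTERNAL_VID and so on) is hypothetical and none of it is mlxsw code. It shows untagged packets receiving an internal VLAN, the VLAN mapping to the bridge's FID, and the MDB being keyed on {FID, DMAC} while the VID passed in the notification is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in mlxsw. */
#define TOY_INTERNAL_VID 1	/* arbitrary tag given to untagged packets */
#define TOY_MDB_SIZE 16

struct toy_mdb_entry {
	bool used;
	uint16_t fid;
	uint8_t dmac[6];
};

struct toy_switch {
	int bridge_ifindex;	/* the single VLAN-unaware bridge modelled here */
	uint16_t fid;		/* FID bound to that bridge */
	struct toy_mdb_entry mdb[TOY_MDB_SIZE];
};

/* Ingress: every packet must carry a tag inside the ASIC. */
static uint16_t toy_ingress_vid(uint16_t wire_vid)
{
	return wire_vid ? wire_vid : TOY_INTERNAL_VID;
}

/* VLAN-unaware bridge: every internal VID maps to the bridge's one FID. */
static uint16_t toy_vid_to_fid(const struct toy_switch *sw, uint16_t vid)
{
	(void)vid;
	return sw->fid;
}

/* Program an MDB entry. The VID from the notification is ignored; the
 * FID is found from the bridge's ifindex instead. */
static int toy_mdb_add(struct toy_switch *sw, int bridge_ifindex,
		       uint16_t notif_vid, const uint8_t dmac[6])
{
	int i;

	assert(sw != NULL);
	(void)notif_vid;
	if (bridge_ifindex != sw->bridge_ifindex)
		return -1;
	for (i = 0; i < TOY_MDB_SIZE; i++) {
		if (!sw->mdb[i].used) {
			sw->mdb[i].used = true;
			sw->mdb[i].fid = sw->fid;
			memcpy(sw->mdb[i].dmac, dmac, 6);
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Forwarding lookup keyed on {FID, DMAC}. */
static bool toy_mdb_lookup(const struct toy_switch *sw, uint16_t wire_vid,
			   const uint8_t dmac[6])
{
	uint16_t fid = toy_vid_to_fid(sw, toy_ingress_vid(wire_vid));
	int i;

	for (i = 0; i < TOY_MDB_SIZE; i++) {
		if (sw->mdb[i].used && sw->mdb[i].fid == fid &&
		    memcmp(sw->mdb[i].dmac, dmac, 6) == 0)
			return true;
	}
	return false;
}
```

In this model an entry added with a notification VID of 0 is still found for an untagged packet, because neither side of the lookup depends on the VID.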

Florian Fainelli

2018-Dec-12 19:52 UTC

[Bridge] Correct PVID behavior with bridge's VLAN filtering on/off?

On 12/12/18 1:02 AM, Ido Schimmel wrote:
> On Tue, Dec 11, 2018 at 11:48:21AM -0800, Florian Fainelli wrote:
>> Hi Nikolay, Roopa, Jiri, Ido,
>>
>> When a bridge has vlan_filtering=0 and notifies a switch driver through
>> HOST_OBJ_MDB about MC addresses that the CPU/management port is
>> interested in getting MC traffic for, I am seeing that the mdb->vid is
>> set to 0 because br_allowed_ingress() checks for BROPT_VLAN_ENABLED
>> which is now disabled and so we never populated *vid to anything but 0
>> because the caller: br_handle_frame_finish() zeroed it out.
> 
> s/br_handle_frame_finish()/br_dev_xmit()/ ? Since you're talking about
> HOST_OBJ_MDB
This affects the bridge ingress path as well, since I use HOST_OBJ_MDB
to indicate whether the CPU port wants to receive multicast,
transmitting multicast from the CPU port is almost never a problem. Is
that a correct use of HOST_OBJ_MDB?
> 
>> This creates a problem with the b53 DSA switch driver because in order
>> to match the bridge's default_pvid, we did program the switch's "default
>> tag" to be 1, which gets used for all untagged frames that ingress the
>> switch (which AFAICT is correct behavior for PVID).
> 
> Not sure I'm following. If bridge is not VLAN-aware, then where do you
> see 'default_pvid' being used?
A key detail I missed is that this is done with 4.9 (for now, in the
process of forward porting fixes to net-next right now) which does not
have this commit from Andrew:

2ea7a679ca2abd251c1ec03f20508619707e1749 ("net: dsa: Don't add vlans
when vlan filtering is disabled")

so the switch is actually VLAN aware; it just does not enforce a strict
VID violation policy. I like that behavior, but Jiri corrected me that
this is not quite how it is defined.
> 
>> Despite having turned off VLAN filtering in the switch such that it does
>> allow ingress of packets with a VID that is not present in the VLAN
>> table (violation), Multicast addresses do behave differently and we
>> really must be strictly matching the programmed PVID in order for MC
>> frames to ingress the switch even with VLAN filtering turned off.
>>
>> So with all that being written, should the bridge still be sending MDB
>> notifications and use the bridge's default_pvid even with
>> vlan_filtering=0? And if we did that, what use case could we be possibly
>> breaking?
>>
>> Let me know if this is not clear so I can provide more details.
> 
> I think you need to provide more details about the device you're working
> with. I can explain what we're doing in mlxsw for reference.
> 
> When you use a VLAN-unaware bridge w/o VLAN devices, we make sure all
> untagged packets get tagged with some arbitrary VLAN (now 1, soon 4095).
> You never see this VLAN on the wire, since we remove it before sending
> the packets. It is only used because all packets in the ASIC must be
> tagged.
> 
> After we have a VLAN we classify the packet to a FID (bridge) and it
> does {FID,DMAC} lookup in the FDB (MDB).
> 
> IIUC, your problem is that you also need to tag all the packets (you
> used '1', can be something else), but then you program the MDB entry
> according to the VLAN passed in the notification ('0') and not use
> ('1'). We completely ignore the VID in this case and use the FID which
> we lookup based on the ifindex of the bridge.
> 
We do not have a concept of a FID with Broadcom switches, so we can
only use a reserved VLAN ID to emulate that behavior and do individual
MAC address learning, which hashes into {VID, MAC}.

The switch has a "default 802.1Q tag" which gets used for untagged
packets. Internally the switch normalizes all incoming frames (when
802.1Q is enabled) to have a double VLAN tag, and untagged frames get
mapped to that "default 802.1Q/PVID tag" in the processing pipeline, and
then when they ingress the destination switch port, they can get
untagged again using the ingress port's "default 802.1Q/PVID tag".

The CPU port keeps its "default 802.1Q tag" set to 0, because that
is also the configuration for non-bridge ports, and I need that to
continue getting non-bridge ports to function normally and not be
subjected to vlan_filtering = 1 being applied on the bridge (I will send
a documentation patch that hopefully clarifies what the correct port
behavior is and request feedback on that).

So here are essentially 3 things that could be fixed/tackled more or
less independently:

- because vlan_filtering = 0, we have the HOST_OBJ_MDB request coming
with mdb->vid = 0, which is expected with the current bridge code

- because the switch is made 802.1Q/VLAN aware, the ports that are
bridge members are configured with PVID = default_pvid (old kernel
behavior, prior to Andrew's change), such that untagged frames show up
untagged correctly at the network device level

- the CPU/management port's default untagged VID is 0, which matches
mdb->vid, but the bridged port, which is the ingress port for MC traffic,
is in VID 1, and that is where MC ingress filter checking is done. So
there is a VID mismatch, and despite filtering being turned off, MC
traffic does not evade that restriction (could be a misconfiguration on
my side, could not find something that would allow it to just pass
through).
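
The mismatch in the last point can be sketched with a small table keyed on {VID, MAC}. This is a toy model under stated assumptions, not b53 driver code: demo_arl, demo_arl_add() and demo_arl_hit() are hypothetical names, and the exact-match lookup stands in for whatever hashing the hardware really does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy table; these names do not exist in the b53 driver. */
#define DEMO_ARL_SIZE 8

struct demo_arl_entry {
	bool valid;
	uint16_t vid;
	uint8_t mac[6];
};

struct demo_arl {
	struct demo_arl_entry e[DEMO_ARL_SIZE];
};

/* Program a multicast entry with the VID taken from the notification. */
static int demo_arl_add(struct demo_arl *t, uint16_t vid,
			const uint8_t mac[6])
{
	int i;

	assert(t != NULL);
	for (i = 0; i < DEMO_ARL_SIZE; i++) {
		if (!t->e[i].valid) {
			t->e[i].valid = true;
			t->e[i].vid = vid;
			memcpy(t->e[i].mac, mac, 6);
			return 0;
		}
	}
	return -1;	/* table full */
}

/* An untagged frame is looked up with the ingress port's PVID, so the
 * entry only matches if it was programmed with that same VID. */
static bool demo_arl_hit(const struct demo_arl *t, uint16_t port_pvid,
			 const uint8_t mac[6])
{
	int i;

	for (i = 0; i < DEMO_ARL_SIZE; i++) {
		if (t->e[i].valid && t->e[i].vid == port_pvid &&
		    memcmp(t->e[i].mac, mac, 6) == 0)
			return true;
	}
	return false;
}
```

With an entry added for VID 0 (what the bridge passes when vlan_filtering = 0), a lookup from a port whose PVID is 1 misses; adding the same address with VID 1 makes it hit.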

With this one liner change both vlan_filtering states now work correctly
with respect to MC:

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index b6de4f457161..fe446e971456 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -482,6 +482,7 @@ bool br_allowed_ingress(const struct net_bridge *br,
         */
        if (!br->vlan_enabled) {
                BR_INPUT_SKB_CB(skb)->vlan_filtered = false;
+               *vid = br->default_pvid;
                return true;
        }

Or I suppose that I could just backport Andrew's patch and that would
remove all VLAN awareness in the bridge, which would likely solve the
problem as well, and/or find a way to make sure that MC flows do bypass
VLAN filtering after all when vlan_filtering = 0.

Thanks for your patience.
-- 
Florian

Florian Fainelli

2018-Dec-15 18:10 UTC

[Bridge] Correct PVID behavior with bridge's VLAN filtering on/off?

On 12/12/18 1:02 AM, Ido Schimmel wrote:
> On Tue, Dec 11, 2018 at 11:48:21AM -0800, Florian Fainelli wrote:
>> Hi Nikolay, Roopa, Jiri, Ido,
>>
>> When a bridge has vlan_filtering=0 and notifies a switch driver through
>> HOST_OBJ_MDB about MC addresses that the CPU/management port is
>> interested in getting MC traffic for, I am seeing that the mdb->vid is
>> set to 0 because br_allowed_ingress() checks for BROPT_VLAN_ENABLED
>> which is now disabled and so we never populated *vid to anything but 0
>> because the caller: br_handle_frame_finish() zeroed it out.
> 
> s/br_handle_frame_finish()/br_dev_xmit()/ ? Since you're talking about
> HOST_OBJ_MDB
> 
>> This creates a problem with the b53 DSA switch driver because in order
>> to match the bridge's default_pvid, we did program the switch's "default
>> tag" to be 1, which gets used for all untagged frames that ingress the
>> switch (which AFAICT is correct behavior for PVID).
> 
> Not sure I'm following. If bridge is not VLAN-aware, then where do you
> see 'default_pvid' being used?
> 
>> Despite having turned off VLAN filtering in the switch such that it does
>> allow ingress of packets with a VID that is not present in the VLAN
>> table (violation), Multicast addresses do behave differently and we
>> really must be strictly matching the programmed PVID in order for MC
>> frames to ingress the switch even with VLAN filtering turned off.
>>
>> So with all that being written, should the bridge still be sending MDB
>> notifications and use the bridge's default_pvid even with
>> vlan_filtering=0? And if we did that, what use case could we be possibly
>> breaking?
>>
>> Let me know if this is not clear so I can provide more details.
> 
> I think you need to provide more details about the device you're working
> with. I can explain what we're doing in mlxsw for reference.
> 
> When you use a VLAN-unaware bridge w/o VLAN devices, we make sure all
> untagged packets get tagged with some arbitrary VLAN (now 1, soon 4095).
> You never see this VLAN on the wire, since we remove it before sending
> the packets. It is only used because all packets in the ASIC must be
> tagged.
> 
> After we have a VLAN we classify the packet to a FID (bridge) and it
> does {FID,DMAC} lookup in the FDB (MDB).
> 
> IIUC, your problem is that you also need to tag all the packets (you
> used '1', can be something else), but then you program the MDB entry
> according to the VLAN passed in the notification ('0') and not use
> ('1'). We completely ignore the VID in this case and use the FID which
> we lookup based on the ifindex of the bridge.
> 
Another thing that seems inconsistent or rather possibly problematic to
deal with is the following:

- create the bridge with VLAN filtering set, this leads to programming
VLAN entries into the switch through switchdev notifications, that is
expected and working

- turn off VLAN filtering on that bridge, this trickles down through
attributes notification to the switch driver which now disables ingress
VID checking (egress directed in Broadcom B53 is not easily
enforceable), we still have VLAN entries programmed into the switch, in
particular, the port's default VLAN/PVID, which is contributing to my
problem mentioned above

- bridge is now requesting MDB programming to be done with VID=0 since
the bridge is now VLAN-unaware

One might expect that when turning off VLAN filtering, the bridge layer
should also remove any programmed VLAN entries?

In spectrum_switchdev.c an error is issued to indicate that changing
VLAN filtering is not possible once the bridge has been created with
VLAN filtering on initially.

This is not necessarily something I want to restrict within B53, because
we ought to support dynamically turning on/off VLAN filtering on the
switch device and driver. I am not aware of a use case for that except
my own tests so far, but clearly we can support it, so why not.

Instead of the patch I copied in my previous response where I would
change br_allowed_ingress(), I am considering modifying the port's
default PVID to match whatever the default VLAN is when not using a
bridge (0 in my case) when the currently configured PVID is not that
default PVID, conversely when re-enabling VLAN filtering, putting the
port back on the bridge's default_pvid.
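
A minimal sketch of that PVID swap, assuming a per-port PVID field and a standalone (non-bridge) default VID of 0 as in my setup. The names demo_port, DEMO_STANDALONE_PVID and demo_port_set_vlan_filtering() are made up for illustration and are not part of the b53 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
#define DEMO_STANDALONE_PVID 0	/* default VID of a port outside any bridge */

struct demo_port {
	uint16_t pvid;	/* the port's "default 802.1Q tag" */
};

/* Called when the bridge's vlan_filtering attribute changes.
 * bridge_default_pvid is the bridge's default_pvid (typically 1). */
static void demo_port_set_vlan_filtering(struct demo_port *p,
					 bool vlan_filtering,
					 uint16_t bridge_default_pvid)
{
	assert(p != NULL);

	if (!vlan_filtering) {
		/* Filtering off: fall back to the standalone default so the
		 * port's VID matches the VID 0 used in MDB notifications. */
		if (p->pvid != DEMO_STANDALONE_PVID)
			p->pvid = DEMO_STANDALONE_PVID;
	} else {
		/* Filtering on again: rejoin the bridge's default_pvid. */
		p->pvid = bridge_default_pvid;
	}
}
```

This sketch does not remember a PVID the user may have set explicitly while filtering was on; a real implementation would probably need to.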

Ido, does that make sense to you or would you advocate not to bother at
all with that use case and do what spectrum_switchdev.c does?

I would also appreciate if you could take a look at this patch since it
would answer a lot of my questions, thank you very much!

http://patchwork.ozlabs.org/patch/1012404/
-- 
Florian
