thr3ads.net - Linux Ethernet Bridging - [Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups [Jul 2019]


Ido Schimmel

2019-Jul-29 17:51 UTC

[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups

On Mon, Jul 29, 2019 at 04:35:09PM +0200, Allan W. Nielsen wrote:
> The 07/29/2019 17:21, Nikolay Aleksandrov wrote:
> > On 29/07/2019 16:52, Allan W. Nielsen wrote:
> > > The 07/29/2019 15:50, Nikolay Aleksandrov wrote:
> > >> On 29/07/2019 15:22, Nikolay Aleksandrov wrote:
> > >>> Hi Allan,
> > >>> On 29/07/2019 15:14, Allan W. Nielsen wrote:
> > >>>> First of all, as mentioned further down in this thread, I realized that our
> > >>>> implementation of the multicast floodmasks does not align with the existing SW
> > >>>> implementation. We will change this, such that all multicast packets go to the
> > >>>> SW bridge.
> > >>>>
> > >>>> This changes things a bit, not that much.
> > >>>>
> > >>>> I actually think you summarized the issue we have (after changing to multicast
> > >>>> flood-masks) right here:
> > >>>>
> > >>>> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> > >>>>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> > >>>>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> > >>>>>>> Thus only the flooding may need to be controlled.
> > >>>>
> > >>>> This seems to be exactly what we need.
> > >>>>
> > >>>> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
> > >>>> on a network where we want to limit the flooding of frames with dmac
> > >>>> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
> > >>>>
> > >>>> One way of doing this could potentially be to support the following command:
> > >>>>
> > >>>> bridge fdb add    01:21:6C:00:00:01 port eth0
> > >>>> bridge fdb append 01:21:6C:00:00:01 port eth1
> > >> And the fdbs become linked lists?
> > > Yes, it will most likely become a linked list
> > >
> > >> So we'll increase the complexity for something that is already supported by
> > >> ACLs (e.g. tc) and also bridge per-port multicast flood flag ?
> > > I do not think it can be supported with the facilities we have today in tc.
> > >
> > > We can do half of it (copy more frames to the CPU) with tc, but we can not limit
> > > the floodmask of a frame with tc (say we want it to flood to 2 out of 4 slave
> > > ports).
> > Why not ? You attach an egress filter for the ports and allow that dmac on only
> > 2 of the ports.
> Because we want a solution which we eventually can offload in HW. And the HW
> facilities we have are doing ingress processing (we have no egress ACLs in this
> design), and if we try to offload an egress rule, with an ingress HW facility,
> then we will run into other issues.
Can you please clarify what you're trying to achieve? I just read the
thread again and my impression is that you're trying to locally receive
packets with a certain link layer multicast address. Nik suggested
SIOCADDMULTI and I suggested a tc filter to get the packet to the CPU.

If you now want to limit the ports to which this packet is flooded, then
you can use tc filters in *software*:

# tc qdisc add dev eth2 clsact
# tc filter add dev eth2 egress pref 1 flower skip_hw \
	dst_mac 01:21:6C:00:00:01 action drop
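
To restrict flooding to eth0 and eth1 in the four-port example from this thread, the same qdisc and filter pair would be needed on each remaining port. A minimal dry-run sketch, assuming eth0-eth3 are the bridge members; the variable and function names are illustrative only, and the script prints the commands instead of applying them:

```shell
#!/bin/sh
# Dry run: print the tc commands that would stop DMAC from egressing every
# bridge port that is not in the allowed set. Nothing is applied.
DMAC=01:21:6C:00:00:01
BRIDGE_PORTS="eth0 eth1 eth2 eth3"   # assumed bridge members (from the thread's example)
ALLOWED_PORTS="eth0 eth1"            # ports that should still receive the frame

# Print one clsact qdisc and one software-only egress drop filter
# for each port that is not in ALLOWED_PORTS.
flood_limit_cmds() {
    for port in $BRIDGE_PORTS; do
        case " $ALLOWED_PORTS " in
            *" $port "*) continue ;;   # allowed port: leave untouched
        esac
        echo "tc qdisc add dev $port clsact"
        echo "tc filter add dev $port egress pref 1 flower skip_hw dst_mac $DMAC action drop"
    done
}

flood_limit_cmds
```

Piping the output to `sh` as root would apply the commands; printing them first makes it easy to check which ports end up blocked.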

If you want to forward the packet in hardware and locally receive it,
you can chain several mirred actions and then a trap action.

Both options avoid HW egress ACLs which your design does not support.

Allan W. Nielsen

2019-Jul-30 06:27 UTC


[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups

The 07/29/2019 20:51, Ido Schimmel wrote:
> Can you please clarify what you're trying to achieve? I just read the
> thread again and my impression is that you're trying to locally receive
> packets with a certain link layer multicast address.
Yes. The thread is also a bit confusing because we halfway through realized
that we misunderstood how the multicast packets should be handled (sorry about
that). To begin with we had a driver where multicast packets were only copied to
the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
how other drivers are doing it, so we changed the driver to include the CPU in
the default multicast flood-mask.

This changes the objective a bit. To begin with we needed to get more packets to
the CPU (which could have been done using tc ingress rules and a trap action).

Now after we changed the driver, we realized that we need something to limit the
flooding of certain L2 multicast packets. This is the new problem we are trying
to solve!

Example: Say we have a bridge with 4 slave interfaces, then we want to install a
forwarding rule saying that packets to a given L2-multicast MAC address, should
only be flooded to 2 of the 4 ports.

(instead of adding rules to get certain packets to the CPU, we are now adding
other rules to prevent other packets from going to the CPU and other ports where
they are not needed/wanted).

This is exactly the same thing as IGMP snooping does dynamically, but only for
IP multicast.

The "bridge mdb" command allows users to manually/statically add/del a port to a
multicast group, but it still operates on IP multicast addresses (not L2
multicast addresses).
> Nik suggested SIOCADDMULTI.
It is not clear to me how this should be used to limit the flooding, maybe we
can make some hacks, but as far as I understand the intent of this is to maintain
the list of addresses an interface should receive. I'm not sure this should
influence how forwarding decisions are being made.
> and I suggested a tc filter to get the packet to the CPU.
The TC solution is a good solution to the original problem where we wanted to copy
more frames to the CPU. But we were convinced that this is not the right
approach, and that the CPU by default should receive all multicast packets, and
we should instead try to find a way to limit the flooding of certain frames as
an optimization.
> If you now want to limit the ports to which this packet is flooded, then
> you can use tc filters in *software*:
>
> # tc qdisc add dev eth2 clsact
> # tc filter add dev eth2 egress pref 1 flower skip_hw \
> 	dst_mac 01:21:6C:00:00:01 action drop
Yes. This can work in the SW bridge.
> If you want to forward the packet in hardware and locally receive it,
> you can chain several mirred actions and then a trap action.
I'm not sure I fully understand how this should be done, but it does sound like it
becomes quite complicated. Also, as far as I understand it will mean that we
will be using TCAM/ACL resources to do something that could have been done with
a simple MAC entry.
> Both options avoid HW egress ACLs which your design does not support.
True, but what is wrong with expanding the functionality of the normal
forwarding/MAC operations to allow multiple destinations?

It is not an uncommon feature (I just browsed the manuals of some common L2
switches and they all have this feature).

It seems to fit nicely into the existing user-interface:

bridge fdb add    01:21:6C:00:00:01 port eth0
bridge fdb append 01:21:6C:00:00:01 port eth1

It seems that it can be added to the existing implementation without adding
significant complexity.

It will be easy to offload in HW.

I do not believe that it will be a performance issue, if this is a concern then
we may have to do a bit of benchmarking, or we can make it a configuration
option.

Long story short, we (Horatiu and I) learned a lot from the discussion here, and
I think we should try to do a new patch with what we learned. Then it is easier
to see what it actually means for the existing code, complexity, existing drivers,
performance, default behavior, backwards compatibility, and other valid concerns.

If the patch is no good, and cannot be fixed, then we will go back and look
further into alternative solutions.

-- 
/Allan

Ido Schimmel

2019-Jul-30 07:06 UTC


[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups

On Tue, Jul 30, 2019 at 08:27:22AM +0200, Allan W. Nielsen wrote:
> The 07/29/2019 20:51, Ido Schimmel wrote:
> > Can you please clarify what you're trying to achieve? I just read the
> > thread again and my impression is that you're trying to locally receive
> > packets with a certain link layer multicast address.
> Yes. The thread is also a bit confusing because we halfway through realized
> that we misunderstood how the multicast packets should be handled (sorry about
> that). To begin with we had a driver where multicast packets were only copied to
> the CPU if someone needed it. Andrew and Nikolay made us aware that this is not
> how other drivers are doing it, so we changed the driver to include the CPU in
> the default multicast flood-mask.
OK, so what prevents you from removing all other ports from the
flood-mask and letting the CPU handle the flooding? Then you can install
software tc filters to limit the flooding.
> This changes the objective a bit. To begin with we needed to get more packets to
> the CPU (which could have been done using tc ingress rules and a trap action).
>
> Now after we changed the driver, we realized that we need something to limit the
> flooding of certain L2 multicast packets. This is the new problem we are trying
> to solve!
>
> Example: Say we have a bridge with 4 slave interfaces, then we want to install a
> forwarding rule saying that packets to a given L2-multicast MAC address, should
> only be flooded to 2 of the 4 ports.
>
> (instead of adding rules to get certain packets to the CPU, we are now adding
> other rules to prevent other packets from going to the CPU and other ports where
> they are not needed/wanted).
>
> This is exactly the same thing as IGMP snooping does dynamically, but only for
> IP multicast.
>
> The "bridge mdb" command allows users to manually/statically add/del a port to a
> multicast group, but it still operates on IP multicast addresses (not L2
> multicast addresses).
>
> > Nik suggested SIOCADDMULTI.
> It is not clear to me how this should be used to limit the flooding, maybe we
> can make some hacks, but as far as I understand the intent of this is to maintain
> the list of addresses an interface should receive. I'm not sure this should
> influence how forwarding decisions are being made.
>
> > and I suggested a tc filter to get the packet to the CPU.
> The TC solution is a good solution to the original problem where we wanted to copy
> more frames to the CPU. But we were convinced that this is not the right
> approach, and that the CPU by default should receive all multicast packets, and
> we should instead try to find a way to limit the flooding of certain frames as
> an optimization.
This can still work. In Linux, ingress tc filters are executed before the
bridge's Rx handler. The same happens in every sane HW. Ingress ACL is
performed before L2 forwarding. Assuming you have eth0-eth3 bridged and
you want to prevent packets with DMAC 01:21:6C:00:00:01 from egressing
eth2:

# tc filter add dev eth0 ingress pref 1 flower skip_sw \
	dst_mac 01:21:6C:00:00:01 action trap
# tc filter add dev eth2 egress pref 1 flower skip_hw \
	dst_mac 01:21:6C:00:00:01 action drop

The first filter is only present in HW ('skip_sw') and should result in
your HW passing you the sole copy of the packet.

The second filter is only present in SW ('skip_hw', not using HW egress
ACL that you don't have) and drops the packet after it was flooded by
the SW bridge.

As I mentioned earlier, you can install the filter once in your HW and
share it between different ports using a shared block. This means you
only consume one TCAM entry.

Note that this allows you to keep flooding all other multicast packets
in HW.
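
The shared-block setup mentioned above might look roughly like the following dry-run sketch. The block index 10 and the function name are arbitrary choices for illustration, and the sketch assumes that no port has a clsact qdisc yet. Whether a given driver offloads a trap filter on a shared block is hardware-specific, so treat this as a starting point and not a verified configuration:

```shell
#!/bin/sh
# Dry run: print commands that bind several ports to one shared ingress
# block and install the HW-only trap filter once on that block.
DMAC=01:21:6C:00:00:01
BLOCK=10                              # arbitrary shared block index (assumption)
INGRESS_PORTS="eth0 eth1 eth2 eth3"   # assumed bridge members

shared_trap_cmds() {
    # Each port gets a clsact qdisc whose ingress side points at the block.
    for port in $INGRESS_PORTS; do
        echo "tc qdisc add dev $port ingress_block $BLOCK clsact"
    done
    # The filter is added to the block, not to an individual device.
    echo "tc filter add block $BLOCK pref 1 flower skip_sw dst_mac $DMAC action trap"
}

shared_trap_cmds
```

Only the ingress side is shared here; a per-port egress filter such as the eth2 drop rule can still be attached to the same clsact qdisc with `dev eth2 egress`.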
> > If you now want to limit the ports to which this packet is flooded, then
> > you can use tc filters in *software*:
> >
> > # tc qdisc add dev eth2 clsact
> > # tc filter add dev eth2 egress pref 1 flower skip_hw \
> > 	dst_mac 01:21:6C:00:00:01 action drop
> Yes. This can work in the SW bridge.
>
> > If you want to forward the packet in hardware and locally receive it,
> > you can chain several mirred actions and then a trap action.
> I'm not sure I fully understand how this should be done, but it does sound like it
> becomes quite complicated. Also, as far as I understand it will mean that we
> will be using TCAM/ACL resources to do something that could have been done with
> a simple MAC entry.
>
> > Both options avoid HW egress ACLs which your design does not support.
> True, but what is wrong with expanding the functionality of the normal
> forwarding/MAC operations to allow multiple destinations?
>
> It is not an uncommon feature (I just browsed the manuals of some common L2
> switches and they all have this feature).
>
> It seems to fit nicely into the existing user-interface:
>
> bridge fdb add    01:21:6C:00:00:01 port eth0
> bridge fdb append 01:21:6C:00:00:01 port eth1
Wouldn't it be better to instead extend the MDB entries so that they are
either keyed by IP or MAC? I believe FDB should remain as unicast-only.
As a bonus, existing drivers could benefit from it, as MDB entries are
already notified by MAC.
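
As an illustration of what a MAC-keyed MDB entry could look like from user space, the sketch below prints `bridge mdb` commands that pass a link-layer address as the group. It assumes a kernel and iproute2 recent enough to accept an L2 multicast address for `grp`; that support came after this thread, so the commands would not have worked at the time it was written. The bridge name br0 and the port list come from the thread's example, and the function name is hypothetical:

```shell
#!/bin/sh
# Dry run: print bridge mdb commands for a static L2 multicast group.
# Assumption: kernel and iproute2 accept a MAC address as the mdb group.
BRIDGE=br0
GROUP_MAC=01:21:6C:00:00:01
MEMBER_PORTS="eth0 eth1"      # ports that should receive the group

l2_mdb_cmds() {
    for port in $MEMBER_PORTS; do
        # "permanent" marks the entry as static, so it is not aged out.
        echo "bridge mdb add dev $BRIDGE port $port grp $GROUP_MAC permanent"
    done
}

l2_mdb_cmds
```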
>
> It seems that it can be added to the existing implementation without adding
> significant complexity.
>
> It will be easy to offload in HW.
>
> I do not believe that it will be a performance issue, if this is a concern then
> we may have to do a bit of benchmarking, or we can make it a configuration
> option.
>
> Long story short, we (Horatiu and I) learned a lot from the discussion here, and
> I think we should try to do a new patch with what we learned. Then it is easier
> to see what it actually means for the existing code, complexity, existing drivers,
> performance, default behavior, backwards compatibility, and other valid concerns.
>
> If the patch is no good, and cannot be fixed, then we will go back and look
> further into alternative solutions.
Overall, I tend to agree with Nik. I think your use case is too specific
to justify the amount of changes you want to make in the bridge driver.
We also provided other alternatives. That being said, you're more than
welcome to send the patches and we can continue the discussion then.

Andrew Lunn

2019-Jul-30 14:34 UTC


[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups

Hi Allan

Just throwing out another idea....

The whole offloading story has been that you use the hardware to accelerate
what the Linux stack can already do.

In this case, you want to accelerate Device Level Ring, DLR.  But I've
not yet seen a software implementation of DLR. Should we really be
considering first adding DLR to the SW bridge? Make it an alternative
to the STP code? Once we have a generic implementation we can then
look at how it can be accelerated using switchdev.

     Andrew
