Andrew Lunn
2019-Jul-27 03:02 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
> As you properly guessed, this model is quite different from what we
> are used to.

Yes, it takes a while to get the idea that the hardware is just an
accelerator for what the Linux stack can already do. And if the switch
cannot do some feature, pass the frame to Linux so it can handle it.

You need to keep in mind that there could be other ports in the bridge
than switch ports, and those ports might be interested in the multicast
traffic. Hence the CPU needs to see the traffic.

IGMP snooping can be used to optimise this, but you still need to be
careful: IPv6 Neighbour Discovery has often been broken on mv88e6xxx
because we have been too aggressive with filtering multicast.

	Andrew
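P.S. For reference, snooping is a per-bridge setting. Something along
these lines should turn it on (br0 is just a placeholder bridge name),
and 'bridge mdb show' then lists the snooped groups:

# ip link set dev br0 type bridge mcast_snooping 1
# bridge mdb show dev br0

Whether flooding is actually pruned in hardware then depends on the
driver offloading the resulting MDB entries.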
Allan W. Nielsen
2019-Jul-28 19:15 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
The 07/27/2019 05:02, Andrew Lunn wrote:
> > As you properly guessed, this model is quite different from what we
> > are used to.
>
> Yes, it takes a while to get the idea that the hardware is just an
> accelerator for what the Linux stack can already do. And if the switch
> cannot do some feature, pass the frame to Linux so it can handle it.

This is understood, and not that different from what we are used to.
The surprise was that all multicast traffic is made to go to the CPU.

> You need to keep in mind that there could be other ports in the bridge
> than switch ports, and those ports might be interested in the
> multicast traffic. Hence the CPU needs to see the traffic.

This is a good argument, but I was under the impression that not all
HW/drivers support foreign interfaces (see ocelot_netdevice_dev_check
and mlxsw_sp_port_dev_check).

> But IGMP snooping can be used to optimise this.

Yes, IGMP snooping can limit the flood of multicast IP traffic, but not
of non-IP L2 multicast traffic. We could really use something similar
for non-IP multicast MAC addresses.

Trying to get back to the original problem:

We have a network which implements the ODVA/DLR ring protocol. This
protocol sends out a beacon frame as often as every 3 us (as far as I
recall; the default, I believe, is 400 us) to the MAC address
01:21:6C:00:00:01.

Take a quick look at slide 10 in [1].

If we assume that the SwitchDev driver is implemented such that all
multicast traffic goes to the CPU, then we really need a way to install
a HW offload path in the silicon, such that these packets do not go to
the CPU (they are known not to be useful, and a frame every 3 us is a
significant load on small DMA connections and CPU resources).

If we assume that the SwitchDev driver is implemented such that only
"needed" multicast packets go to the CPU, then we need a way to get
these packets in case we want to implement the DLR protocol.

I'm sure that both models can work, and I do not think that this is the
main issue here.

Our initial attempt was to allow installing static L2 MAC entries and
appending multiple ports to such an entry in the MAC table (see the
sketch at the end of this mail). This was rejected, for several good
reasons it seems. But I'm not sure it was clear what we wanted to
achieve, and why we find it to be important. Hopefully this is clearer
with a real-world use case.

Any hints or ideas on a better way to solve this problem will be much
appreciated.

/Allan

[1] https://www.odva.org/Portals/0/Library/Conference/2017-ODVA-Conference_Woods_High%20Availability_Guidelines%20for%20Use%20of%20DLR%20in%20EtherNetIP%20Networks_FINAL%20PPT.pdf
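P.S. To make the ask concrete: for IP multicast groups, iproute2 can
already express a static group entry per member port, roughly like this
(br0, swp1 and swp2 are placeholder names):

# bridge mdb add dev br0 port swp1 grp 239.1.1.1 permanent
# bridge mdb add dev br0 port swp2 grp 239.1.1.1 permanent

What we are after is an equivalent way to pin a non-IP L2 group address
such as 01:21:6C:00:00:01 to a set of ports.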
Andrew Lunn
2019-Jul-28 23:07 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
> Trying to get back to the original problem:
>
> We have a network which implements the ODVA/DLR ring protocol. This
> protocol sends out a beacon frame as often as every 3 us (as far as I
> recall; the default, I believe, is 400 us) to the MAC address
> 01:21:6C:00:00:01.
>
> Take a quick look at slide 10 in [1].
>
> If we assume that the SwitchDev driver is implemented such that all
> multicast traffic goes to the CPU, then we really need a way to
> install a HW offload path in the silicon, such that these packets do
> not go to the CPU (they are known not to be useful, and a frame every
> 3 us is a significant load on small DMA connections and CPU
> resources).
>
> If we assume that the SwitchDev driver is implemented such that only
> "needed" multicast packets go to the CPU, then we need a way to get
> these packets in case we want to implement the DLR protocol.
>
> I'm sure that both models can work, and I do not think that this is
> the main issue here.
>
> Our initial attempt was to allow installing static L2 MAC entries and
> appending multiple ports to such an entry in the MAC table. This was
> rejected, for several good reasons it seems. But I'm not sure it was
> clear what we wanted to achieve, and why we find it to be important.
> Hopefully this is clearer with a real-world use case.
>
> Any hints or ideas on a better way to solve this problem will be much
> appreciated.

I always try to think about how this would work if I had a bunch of
discrete network interfaces, not a switch. What APIs are involved in
configuring such a system? How does the Linux network stack perform
software DLR? How is the reception and blocking of the multicast group
performed?

Once you understand how it works in the software implementation, it
should be more obvious which switchdev hooks should be used to
accelerate this using hardware.

	Andrew
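P.S. As a software-only starting point for the reception half of that
question: on an ordinary NIC, receiving a non-IP L2 group boils down to
adding the link-layer address to the interface's multicast filter, e.g.
(eth0 is just a placeholder):

# ip maddress add 01:21:6c:00:00:01 dev eth0

An application would then read the frames through a packet socket
(typically joining the group via the PACKET_ADD_MEMBERSHIP socket
option rather than the command above); blocking the group again is the
reverse operation, ip maddress delete.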
Ido Schimmel
2019-Jul-29 06:09 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
On Sun, Jul 28, 2019 at 09:15:59PM +0200, Allan W. Nielsen wrote:
> If we assume that the SwitchDev driver is implemented such that all
> multicast traffic goes to the CPU, then we really need a way to
> install a HW offload path in the silicon, such that these packets do
> not go to the CPU (they are known not to be useful, and a frame every
> 3 us is a significant load on small DMA connections and CPU
> resources).
>
> If we assume that the SwitchDev driver is implemented such that only
> "needed" multicast packets go to the CPU, then we need a way to get
> these packets in case we want to implement the DLR protocol.

I'm not familiar with the HW you're working with, so the below might
not be relevant.

In case you don't want to send all multicast traffic to the CPU (I'll
refer to the alternative later), you can install an ingress tc filter
that traps to the CPU the packets you do want to receive. Something
like:

# tc qdisc add dev swp1 clsact
# tc filter add dev swp1 pref 1 ingress flower skip_sw dst_mac \
	01:21:6C:00:00:01 action trap

If your HW supports sharing the same filter among multiple ports, then
you can install your filter in a tc shared block and bind multiple
ports to it (see the sketch at the end of this mail).

Another option is to always send a *copy* of multicast packets to the
CPU, but make sure the HW uses a policer that prevents the CPU from
being overwhelmed. To avoid packets being forwarded twice (by HW and
SW), you will need to mark such packets in your driver with
'skb->offload_fwd_mark = 1'.

Now, in case the user wants to allow the CPU to receive certain packets
at a higher rate, a tc filter can be used. It will be identical to the
filter I mentioned earlier, but with a 'police' action chained before
'trap'. I don't think this is currently supported by any driver, but I
believe it's the right way to go: by default the CPU receives all the
traffic it should receive, and the user can fine-tune that using ACLs.
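For completeness, a rough sketch of the shared block variant mentioned
above (the block index 1 and the swp1/swp2 names are arbitrary, and the
HW/driver must support binding filters through a block for the rule to
be offloaded):

# tc qdisc add dev swp1 ingress_block 1 clsact
# tc qdisc add dev swp2 ingress_block 1 clsact
# tc filter add block 1 pref 1 flower skip_sw dst_mac \
	01:21:6C:00:00:01 action trap

The filter is then installed once and applies to every port bound to
the block.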