Andrew Lunn
2019-Jul-27 03:02 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
> As you properly guessed, this model is quite different from what we
> are used to.

Yes, it takes a while to get the idea that the hardware is just an
accelerator for what the Linux stack can already do. And if the switch
cannot do some feature, pass the frame to Linux so it can handle it.

You need to keep in mind that there could be other ports in the bridge
than switch ports, and those ports might be interested in the multicast
traffic. Hence the CPU needs to see the traffic.

IGMP snooping can be used to optimise this, but you still need to be
careful: IPv6 Neighbour Discovery has often been broken on mv88e6xxx
because we have been too aggressive with filtering multicast.

	Andrew
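P.S. For reference, snooping is a per-bridge setting. Something along
these lines should turn it on (br0 is just a placeholder bridge name),
and 'bridge mdb show' then lists the snooped groups:

# ip link set dev br0 type bridge mcast_snooping 1
# bridge mdb show dev br0

Whether flooding is actually pruned in hardware then depends on the
driver offloading the resulting MDB entries.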
Allan W. Nielsen
2019-Jul-28 19:15 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
The 07/27/2019 05:02, Andrew Lunn wrote:
> > As you properly guessed, this model is quite different from what we
> > are used to.
>
> Yes, it takes a while to get the idea that the hardware is just an
> accelerator for what the Linux stack can already do. And if the switch
> cannot do some feature, pass the frame to Linux so it can handle it.

This is understood, and not that different from what we are used to.
The surprise was that all multicast traffic is made to go to the CPU.

> You need to keep in mind that there could be other ports in the bridge
> than switch ports, and those ports might be interested in the
> multicast traffic. Hence the CPU needs to see the traffic.

This is a good argument, but I was under the impression that not all
HW/drivers support foreign interfaces (see ocelot_netdevice_dev_check
and mlxsw_sp_port_dev_check).

> But IGMP snooping can be used to optimise this.

Yes, IGMP snooping can limit the flood of multicast IP traffic, but not
of non-IP L2 multicast traffic. We could really use something similar
for non-IP multicast MAC addresses.

Trying to get back to the original problem:

We have a network which implements the ODVA/DLR ring protocol. This
protocol sends out a beacon frame as often as every 3 us (as far as I
recall; the default, I believe, is 400 us) to the MAC address
01:21:6C:00:00:01.

Take a quick look at slide 10 in [1].

If we assume that the SwitchDev driver is implemented such that all
multicast traffic goes to the CPU, then we really need a way to install
a HW offload path in the silicon, such that these packets do not go to
the CPU (they are known not to be useful, and a frame every 3 us is a
significant load on small DMA connections and CPU resources).

If we assume that the SwitchDev driver is implemented such that only
"needed" multicast packets go to the CPU, then we need a way to get
these packets in case we want to implement the DLR protocol.

I'm sure that both models can work, and I do not think that this is the
main issue here.

Our initial attempt was to allow installing static L2 MAC entries and
appending multiple ports to such an entry in the MAC table (see the
sketch at the end of this mail). This was rejected, for several good
reasons it seems. But I'm not sure it was clear what we wanted to
achieve, and why we find it to be important. Hopefully this is clearer
with a real-world use case.

Any hints or ideas on a better way to solve this problem will be much
appreciated.

/Allan

[1] https://www.odva.org/Portals/0/Library/Conference/2017-ODVA-Conference_Woods_High%20Availability_Guidelines%20for%20Use%20of%20DLR%20in%20EtherNetIP%20Networks_FINAL%20PPT.pdf
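P.S. To make the ask concrete: for IP multicast groups, iproute2 can
already express a static group entry per member port, roughly like this
(br0, swp1 and swp2 are placeholder names):

# bridge mdb add dev br0 port swp1 grp 239.1.1.1 permanent
# bridge mdb add dev br0 port swp2 grp 239.1.1.1 permanent

What we are after is an equivalent way to pin a non-IP L2 group address
such as 01:21:6C:00:00:01 to a set of ports.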
Andrew Lunn
2019-Jul-28 23:07 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
> Trying to get back to the original problem:
>
> We have a network which implements the ODVA/DLR ring protocol. This
> protocol sends out a beacon frame as often as every 3 us (as far as I
> recall; the default, I believe, is 400 us) to the MAC address
> 01:21:6C:00:00:01.
>
> Take a quick look at slide 10 in [1].
>
> If we assume that the SwitchDev driver is implemented such that all
> multicast traffic goes to the CPU, then we really need a way to
> install a HW offload path in the silicon, such that these packets do
> not go to the CPU (they are known not to be useful, and a frame every
> 3 us is a significant load on small DMA connections and CPU
> resources).
>
> If we assume that the SwitchDev driver is implemented such that only
> "needed" multicast packets go to the CPU, then we need a way to get
> these packets in case we want to implement the DLR protocol.
>
> I'm sure that both models can work, and I do not think that this is
> the main issue here.
>
> Our initial attempt was to allow installing static L2 MAC entries and
> appending multiple ports to such an entry in the MAC table. This was
> rejected, for several good reasons it seems. But I'm not sure it was
> clear what we wanted to achieve, and why we find it to be important.
> Hopefully this is clearer with a real-world use case.
>
> Any hints or ideas on a better way to solve this problem will be much
> appreciated.

I always try to think about how this would work if I had a bunch of
discrete network interfaces, not a switch. What APIs are involved in
configuring such a system? How does the Linux network stack perform
software DLR? How is the reception and blocking of the multicast group
performed?

Once you understand how it works in the software implementation, it
should be more obvious which switchdev hooks should be used to
accelerate this using hardware.

	Andrew
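P.S. As a software-only starting point for the reception half of that
question: on an ordinary NIC, receiving a non-IP L2 group boils down to
adding the link-layer address to the interface's multicast filter, e.g.
(eth0 is just a placeholder):

# ip maddress add 01:21:6c:00:00:01 dev eth0

An application would then read the frames through a packet socket
(typically joining the group via the PACKET_ADD_MEMBERSHIP socket
option rather than the command above); blocking the group again is the
reverse operation, ip maddress delete.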
Ido Schimmel
2019-Jul-29 06:09 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
On Sun, Jul 28, 2019 at 09:15:59PM +0200, Allan W. Nielsen wrote:
> If we assume that the SwitchDev driver is implemented such that all
> multicast traffic goes to the CPU, then we really need a way to
> install a HW offload path in the silicon, such that these packets do
> not go to the CPU (they are known not to be useful, and a frame every
> 3 us is a significant load on small DMA connections and CPU
> resources).
>
> If we assume that the SwitchDev driver is implemented such that only
> "needed" multicast packets go to the CPU, then we need a way to get
> these packets in case we want to implement the DLR protocol.

I'm not familiar with the HW you're working with, so the below might
not be relevant.

In case you don't want to send all multicast traffic to the CPU (I'll
refer to the alternative later), you can install an ingress tc filter
that traps to the CPU the packets you do want to receive. Something
like:

# tc qdisc add dev swp1 clsact
# tc filter add dev swp1 pref 1 ingress flower skip_sw dst_mac \
	01:21:6C:00:00:01 action trap

If your HW supports sharing the same filter among multiple ports, then
you can install your filter in a tc shared block and bind multiple
ports to it (see the sketch at the end of this mail).

Another option is to always send a *copy* of multicast packets to the
CPU, but make sure the HW uses a policer that prevents the CPU from
being overwhelmed. To avoid packets being forwarded twice (by HW and
SW), you will need to mark such packets in your driver with
'skb->offload_fwd_mark = 1'.

Now, in case the user wants to allow the CPU to receive certain packets
at a higher rate, a tc filter can be used. It will be identical to the
filter I mentioned earlier, but with a 'police' action chained before
'trap'. I don't think this is currently supported by any driver, but I
believe it's the right way to go: by default the CPU receives all the
traffic it should receive, and the user can fine-tune that using ACLs.
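For completeness, a rough sketch of the shared block variant mentioned
above (the block index 1 and the swp1/swp2 names are arbitrary, and the
HW/driver must support binding filters through a block for the rule to
be offloaded):

# tc qdisc add dev swp1 ingress_block 1 clsact
# tc qdisc add dev swp2 ingress_block 1 clsact
# tc filter add block 1 pref 1 flower skip_sw dst_mac \
	01:21:6C:00:00:01 action trap

The filter is then installed once and applies to every port bound to
the block.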