Nikolay Aleksandrov
2019-Jul-26 12:31 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
On 26/07/2019 15:02, Horatiu Vultur wrote:
> Hi Nikolay,
>
> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
>> External E-Mail
>>
>>
>> On 26/07/2019 11:41, Nikolay Aleksandrov wrote:
>>> On 25/07/2019 17:21, Horatiu Vultur wrote:
>>>> Hi Nikolay,
>>>>
>>>> The 07/25/2019 16:21, Nikolay Aleksandrov wrote:
>>>>> External E-Mail
>>>>>
>>>>>
>>>>> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>>>>> On 25/07/2019 14:44, Horatiu Vultur wrote:
>>>>>>> There is no way to configure the bridge, to receive only specific link
>>>>>>> layer multicast addresses. From the description of the command 'bridge
>>>>>>> fdb append' is supposed to do that, but there was no way to notify the
>>>>>>> network driver that the bridge joined a group, because LLADDR was added
>>>>>>> to the unicast netdev_hw_addr_list.
>>>>>>>
>>>>>>> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
>>>>>>> and if the source is NULL, which represent the bridge itself. Then add
>>>>>>> address to multicast netdev_hw_addr_list for each bridge interfaces.
>>>>>>> And then the .ndo_set_rx_mode function on the driver is called. To notify
>>>>>>> the driver that the list of multicast mac addresses changed.
>>>>>>>
>>>>>>> Signed-off-by: Horatiu Vultur <horatiu.vultur at microchip.com>
>>>>>>> ---
>>>>>>> net/bridge/br_fdb.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
>>>>>>> 1 file changed, 46 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> I'm sorry but this patch is wrong on many levels, some notes below. In general
>>>>>> NLM_F_APPEND is only used in vxlan, the bridge does not handle that flag at all.
>>>>>> FDB is only for *unicast*, nothing is joined and no multicast should be used with fdbs.
>>>>>> MDB is used for multicast handling, but both of these are used for forwarding.
>>>>>> The reason the static fdbs are added to the filter is for non-promisc ports, so they can
>>>>>> receive traffic destined for these FDBs for forwarding.
>>>>>> If you'd like to join any multicast group please use the standard way, if you'd like to join
>>>>>> it only on a specific port - join it only on that port (or ports) and the bridge and you'll
>>>>>
>>>>> And obviously this is for the case where you're not enabling port promisc mode (non-default).
>>>>> In general you'll only need to join the group on the bridge to receive traffic for it
>>>>> or add it as an mdb entry to forward it.
>>>>>
>>>>>> have the effect that you're describing. What do you mean there's no way ?
>>>>
>>>> Thanks for the explanation.
>>>> There are few things that are not 100% clear to me and maybe you can
>>>> explain them, not to go totally in the wrong direction. Currently I am
>>>> writing a network driver on which I added switchdev support. Then I was
>>>> looking for a way to configure the network driver to copy link layer
>>>> multicast address to the CPU port.
>>>>
>>>> If I am using bridge mdb I can do it only for IP multicast addresses,
>>>> but how should I do it if I want non IP frames with link layer multicast
>>>> address to be copied to CPU? For example: all frames with multicast
>>>> address '01-21-6C-00-00-01' to be copied to CPU. What is the user space
>>>> command for that?
>>>>
>>>
>>> Check SIOCADDMULTI (ip maddr from iproute2), f.e. add that mac to the port
>>> which needs to receive it and the bridge will send it up automatically since
>>> it's unknown mcast (note that if there's a querier, you'll have to make the
>>> bridge mcast router if it is not the querier itself). It would also flood it to all
>>
>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
>> Thus only the flooding may need to be controlled.
>
> OK, I see, but the part which is not clear to me is, which bridge
> command(from iproute2) to use so the bridge would notify the network
> driver(using switchdev or not) to configure the HW to copy all the frames
> with dmac '01-21-6C-00-00-01' to CPU? So that the bridge can receive
> those frames and then just to pass them up.
> Definitely I would not like to set the front ports in promisc mode, to
> copy all the frames to CPU because I think that would overkill it.
> Thanks,
>

For non-IP traffic there's no such command, if it was IP you could catch switchdev mdb
notifications, but just for a mac address you'll have to configure the port yourself,
e.g. via the ip maddr command if you don't want to put it in promisc mode.
You know that in order to not run in promisc mode you'll have to disable
port flooding and port learning, right ? Otherwise they're always put in promisc.
If you do that you could do a simple hack in your driver with the current situation:

$ bridge fdb add 01:21:6C:00:00:01 dev swpX master static

If swpX is not promisc you'll get dev_uc_add() which will call __dev_set_rx_mode() and the notifier
so you can catch that address being added to the device. Traffic for that mac will *not* be forwarded
to swpX later because it's considered multicast, and unknown at that, so it will be flooded to all
who have the FLOOD flag and will be sent up the bridge as well. But this is very hacky, I'd prefer
the ip maddr add command on the port where you wish to receive that traffic, the bridge will pass it
up always due to it being unknown.

>>
>>> other ports so you may want to control that. It really depends on the setup
>>> and how the hardware is configured.
>>>
>>>>>>
>>>>>> In addition you're allowing a mix of mcast functions to be called with unicast addresses
>>>>>> and vice versa, it is not that big of a deal because the kernel will simply return an error
>>>>>> but still makes no sense.
>>>>>>
>>>>>> Nacked-by: Nikolay Aleksandrov <nikolay at cumulusnetworks.com>
>>>>>>
>>>>>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>>>>>> index b1d3248..d93746d 100644
>>>>>>> --- a/net/bridge/br_fdb.c
>>>>>>> +++ b/net/bridge/br_fdb.c
>>>>>>> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>>>>  	}
>>>>>>>  }
>>>>>>>
>>>>>>> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>>>>>>> +{
>>>>>>> +	int err;
>>>>>>> +	struct net_bridge_port *p;
>>>>>>> +
>>>>>>> +	ASSERT_RTNL();
>>>>>>> +
>>>>>>> +	list_for_each_entry(p, &br->port_list, list) {
>>>>>>> +		if (!br_promisc_port(p)) {
>>>>>>> +			err = dev_mc_add(p->dev, addr);
>>>>>>> +			if (err)
>>>>>>> +				goto undo;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	return;
>>>>>>> +undo:
>>>>>>> +	list_for_each_entry_continue_reverse(p, &br->port_list, list) {
>>>>>>> +		if (!br_promisc_port(p))
>>>>>>> +			dev_mc_del(p->dev, addr);
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +
>>>>>>>  /* When a static FDB entry is deleted, the HW address from that entry is
>>>>>>>   * also removed from the bridge private HW address list and updates all
>>>>>>>   * the ports with needed information.
>>>>>>> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>>>>  	}
>>>>>>>  }
>>>>>>>
>>>>>>> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char *addr)
>>>>>>> +{
>>>>>>> +	struct net_bridge_port *p;
>>>>>>> +
>>>>>>> +	ASSERT_RTNL();
>>>>>>> +
>>>>>>> +	list_for_each_entry(p, &br->port_list, list) {
>>>>>>> +		if (!br_promisc_port(p))
>>>>>>> +			dev_mc_del(p->dev, addr);
>>>>>>> +	}
>>>>>>> +}
>>>>>>> +
>>>>>>>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>>>>>>>  		       bool swdev_notify)
>>>>>>>  {
>>>>>>>  	trace_fdb_delete(br, f);
>>>>>>>
>>>>>>> -	if (f->is_static)
>>>>>>> +	if (f->is_static) {
>>>>>>>  		fdb_del_hw_addr(br, f->key.addr.addr);
>>>>>>> +		fdb_del_hw_maddr(br, f->key.addr.addr);
>>>>>>
>>>>>> Walking over all ports again for each static delete is a no-go.
>>>>>>
>>>>>>> +	}
>>>>>>>
>>>>>>>  	hlist_del_init_rcu(&f->fdb_node);
>>>>>>>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
>>>>>>> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>>>>>>>  			fdb->is_local = 1;
>>>>>>>  			if (!fdb->is_static) {
>>>>>>>  				fdb->is_static = 1;
>>>>>>> -				fdb_add_hw_addr(br, addr);
>>>>>>> +				if (flags & NLM_F_APPEND && !source)
>>>>>>> +					fdb_add_hw_maddr(br, addr);
>>>>>>> +				else
>>>>>>> +					fdb_add_hw_addr(br, addr);
>>>>>>>  			}
>>>>>>>  		} else if (state & NUD_NOARP) {
>>>>>>>  			fdb->is_local = 0;
>>>>>>>  			if (!fdb->is_static) {
>>>>>>>  				fdb->is_static = 1;
>>>>>>> -				fdb_add_hw_addr(br, addr);
>>>>>>> +				if (flags & NLM_F_APPEND && !source)
>>>>>>> +					fdb_add_hw_maddr(br, addr);
>>>>>>> +				else
>>>>>>> +					fdb_add_hw_addr(br, addr);
>>>>>>>  			}
>>>>>>>  		} else {
>>>>>>>  			fdb->is_local = 0;
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Allan W. Nielsen
2019-Jul-29 12:14 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
Hi Nikolay,

First of all, as mentioned further down in this thread, I realized that our
implementation of the multicast floodmasks does not align with the existing SW
implementation. We will change this, such that all multicast packets go to the
SW bridge.

This changes things a bit, but not that much.

I actually think you summarized the issue we have (after changing to multicast
flood-masks) right here:

The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> >> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> >> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> >> Thus only the flooding may need to be controlled.

This seems to be exactly what we need.

Assume we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
on a network where we want to limit the flooding of frames with dmac
01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.

One way of doing this could potentially be to support the following commands:

bridge fdb add 01:21:6C:00:00:01 port eth0
bridge fdb append 01:21:6C:00:00:01 port eth1

On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
> >>>>>> In general NLM_F_APPEND is only used in vxlan, the bridge does not
> >>>>>> handle that flag at all. FDB is only for *unicast*, nothing is joined
> >>>>>> and no multicast should be used with fdbs. MDB is used for multicast
> >>>>>> handling, but both of these are used for forwarding.

This is true, and this should have been addressed in the patch. We were too
focused on setting up the offload path in the driver, and forgot to do the SW
implementation.

Do you see any issues in supporting this flag, and updating the SW
forwarding in br_handle_frame_finish such that it can support/allow a FDB entry
to be a multicast?

/Allan
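Editorial illustration of the behaviour proposed in the message above: a multicast address that maps to a set of egress ports, with everything else flooded. This is a toy user-space model of the proposal only. It does not describe how the bridge works today, and every name in it (`mc_fdb_entry`, `egress_mask`, `NPORTS`) is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NPORTS 4	/* eth0..eth3 from the example; bit i stands for eth<i> */

/* Hypothetical entry: one multicast dmac and the ports allowed to get it. */
struct mc_fdb_entry {
	unsigned char addr[6];
	uint32_t port_mask;
};

/* Return the egress port mask for a frame with the given dmac that arrived
 * on in_port. Known addresses use their entry's mask, unknown ones are
 * flooded to all ports. The ingress port is always excluded. */
uint32_t egress_mask(const struct mc_fdb_entry *tbl, size_t n,
		     const unsigned char dmac[6], unsigned int in_port)
{
	uint32_t mask = (1u << NPORTS) - 1;	/* default: flood everywhere */
	size_t i;

	for (i = 0; i < n; i++) {
		if (memcmp(tbl[i].addr, dmac, 6) == 0) {
			mask = tbl[i].port_mask;
			break;
		}
	}
	return mask & ~(1u << in_port);
}
```

In this model the two proposed commands would amount to one entry for 01:21:6C:00:00:01 with bits 0 and 1 set, so a frame arriving on eth2 leaves on eth0 and eth1 only, while any other multicast dmac is still flooded to the remaining three ports.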
Nikolay Aleksandrov
2019-Jul-29 12:22 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
Hi Allan,

On 29/07/2019 15:14, Allan W. Nielsen wrote:
> Hi Nikolay,
>
> First of all, as mentioned further down in this thread, I realized that our
> implementation of the multicast floodmasks does not align with the existing SW
> implementation. We will change this, such that all multicast packets go to the
> SW bridge.
>
> This changes things a bit, but not that much.
>
> I actually think you summarized the issue we have (after changing to multicast
> flood-masks) right here:
>
> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
>>>> Thus only the flooding may need to be controlled.
>
> This seems to be exactly what we need.
>
> Assume we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
> on a network where we want to limit the flooding of frames with dmac
> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
>
> One way of doing this could potentially be to support the following commands:
>
> bridge fdb add 01:21:6C:00:00:01 port eth0
> bridge fdb append 01:21:6C:00:00:01 port eth1
>
> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>>>>>>> In general NLM_F_APPEND is only used in vxlan, the bridge does not
>>>>>>>> handle that flag at all. FDB is only for *unicast*, nothing is joined
>>>>>>>> and no multicast should be used with fdbs. MDB is used for multicast
>>>>>>>> handling, but both of these are used for forwarding.
> This is true, and this should have been addressed in the patch. We were too
> focused on setting up the offload path in the driver, and forgot to do the SW
> implementation.
>
> Do you see any issues in supporting this flag, and updating the SW
> forwarding in br_handle_frame_finish such that it can support/allow a FDB entry
> to be a multicast?
>

Yes, all of the multicast code is handled differently, it doesn't go through the fdb
lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
multicast mac address, take a look at br_handle_frame_finish() and you'll notice that
when a multicast dmac is detected then we use the bridge mcast code for lookups and
forwarding. If you're trying to achieve Rx only on the bridge of these then
why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?

If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
possible to achieve via other methods, no need to go through the bridge.

> /Allan
>
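Editorial illustration of the point made above about br_handle_frame_finish(): the destination MAC is classified first, and only unicast destinations reach the FDB lookup. The sketch below is a simplified user-space model of that decision, not the kernel function. The enum and function names are invented for illustration, and the real code handles many more cases (VLANs, local delivery, snooping state).

```c
#include <stdbool.h>
#include <string.h>

/* Which lookup a frame would take in this simplified model. */
enum lookup_path {
	PATH_BROADCAST_FLOOD,	/* ff:ff:ff:ff:ff:ff: flooded */
	PATH_MCAST_CODE,	/* group address: bridge multicast (mdb) code */
	PATH_FDB		/* unicast: FDB lookup */
};

/* The I/G bit, the least significant bit of the first octet, marks a
 * group (multicast or broadcast) address. */
bool mac_is_multicast(const unsigned char mac[6])
{
	return (mac[0] & 0x01) != 0;
}

bool mac_is_broadcast(const unsigned char mac[6])
{
	static const unsigned char bcast[6] = {
		0xff, 0xff, 0xff, 0xff, 0xff, 0xff
	};

	return memcmp(mac, bcast, 6) == 0;
}

/* Classify a destination MAC. Broadcast is tested first because it also
 * has the I/G bit set. */
enum lookup_path classify_dmac(const unsigned char dmac[6])
{
	if (mac_is_broadcast(dmac))
		return PATH_BROADCAST_FLOOD;
	if (mac_is_multicast(dmac))
		return PATH_MCAST_CODE;
	return PATH_FDB;
}
```

For 01:21:6C:00:00:01 the first octet is 0x01, so the I/G bit is set and the frame takes the multicast path. That is why an FDB entry for this address is never consulted during forwarding.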
Allan W. Nielsen
2019-Aug-01 14:22 UTC
[Bridge] [PATCH] net: bridge: Allow bridge to joing multicast groups
The 07/26/2019 15:31, Nikolay Aleksandrov wrote:
...
> You know that in order to not run in promisc mode you'll have to disable
> port flooding and port learning, right ? Otherwise they're always put in promisc.

Yes, we have spent some time looking at nbp_update_port_count and trying to
understand the reasoning behind it. Our understanding is that this is to make it
work with a pure SW bridge implementation, and this is actually an optimization
to allow disabling promisc mode if all forwarding is static (no flooding and no
learning).

We also noticed that the Ocelot and the Rocker drivers avoid this "issue" by
not implementing promisc mode.

But promisc mode is a really nice feature for debugging, and we would actually
like to have it, and when the HW can do learning/flooding it does not seem to
be necessary.

I tried to understand how this is handled in the Mellanox drivers, but gave up.
Too big, and we lack the insight into their design.

Do you know if there are better ways to prevent switchdev-offloaded-slave
interfaces from going into promisc mode?

/Allan
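Editorial illustration of the promisc rule discussed above, exactly as it is stated in this thread: ports stay promiscuous unless flooding and learning are disabled, so that all forwarding is static. The model below encodes only that statement. The kernel's actual decision, which starts from nbp_update_port_count, takes further conditions into account and should be checked in the tree you are working on. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of the per-port settings that matter here. */
struct port_cfg {
	bool flood;	/* flooding of unknown destinations enabled */
	bool learning;	/* source address learning enabled */
};

/* A port whose forwarding is not fully static. */
bool port_is_auto(const struct port_cfg *p)
{
	return p->flood || p->learning;
}

/* Count the ports that still rely on flooding or learning. */
size_t count_auto_ports(const struct port_cfg *ports, size_t n)
{
	size_t i, cnt = 0;

	for (i = 0; i < n; i++)
		if (port_is_auto(&ports[i]))
			cnt++;
	return cnt;
}

/* Rule as stated in the thread: promisc can only be dropped once no port
 * floods or learns. This is a simplification of the kernel's logic. */
bool ports_must_be_promisc(const struct port_cfg *ports, size_t n)
{
	return count_auto_ports(ports, n) > 0;
}
```

Under this model a switchdev port with hardware learning left enabled always counts as an auto port, which is the situation the question above is about.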