Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 0/9] net: bridge: Forward offloading
## Overview

   vlan1   vlan2
       \   /
   .-----------.
   |    br0    |
   '-----------'
   /   /   \   \
swp0 swp1 swp2 eth0
  :    :    :
  (hwdom 1)

Up to this point, switchdevs have been trusted with offloading
forwarding between bridge ports, e.g. forwarding a unicast from swp0
to swp1 or flooding a broadcast from swp2 to swp1 and swp0. This
series extends forward offloading to include some new classes of
traffic:

- Locally originating flows, i.e. packets that ingress on br0 that are
  to be forwarded to one or several of the ports swp{0,1,2}. Notably
  this also includes routed flows, e.g. a packet ingressing swp0 on
  VLAN 1 which is then routed over to VLAN 2 by the CPU and then
  forwarded to swp1 is "locally originating" from br0's point of view.

- Flows originating from "foreign" interfaces, i.e. an interface that
  is not offloaded by a particular switchdev instance. This includes
  ports belonging to other switchdev instances. A typical example
  would be flows from eth0 towards swp{0,1,2}.

The bridge still looks up its FDB/MDB as usual and then notifies the
switchdev driver that a particular skb should be offloaded if it
matches one of the classes above. It does so by using the _accel
version of dev_queue_xmit, supplying its own netdev as the
"subordinate" device. The driver can react to the presence of the
subordinate in its .ndo_select_queue in whatever way it needs to make
sure to forward the skb in much the same way that it would for packets
ingressing on regular ports.

Hardware domains to which a particular skb has been forwarded are
recorded so that duplicates are avoided.

The main performance benefit is thus seen on multicast flows. Imagine
for example that:

- An IP camera is connected to swp0 (VLAN 1)

- The CPU is acting as a multicast router, routing the group from VLAN
  1 to VLAN 2.

- There are subscribers for the group in question behind both swp1 and
  swp2 (VLAN 2).

With this offloading in place, the bridge need only send a single skb
to the driver, which will send it to the hardware marked in such a way
that the switch will perform the multicast replication according to
the MDB configuration. Naturally, the number of saved skb_clones
increases linearly with the number of subscribed ports.

As an extra benefit, on mv88e6xxx, this also allows the switch to
perform source address learning on these flows. Without it, dynamic
FDB entries have to be synced over slow configuration interfaces like
MDIO to keep flows directed towards the CPU from being flooded as
unknown unicast by the switch.

## RFC

- In general, what do you think about this idea?

- hwdom. What do you think about this terminology? Personally I feel
  that we had too many things called offload_fwd_mark, and that as the
  use of the bridge internal ID (nbp->offload_fwd_mark) expands, it
  might be useful to have a separate term for it.

- .dfwd_{add,del}_station. Am I stretching this abstraction too far,
  and if so do you have any suggestion/preference on how to signal the
  offloading from the bridge down to the switchdev driver?

- The way that flooding is implemented in br_forward.c (lazily cloning
  skbs) means that you have to mark the forwarding as completed very
  early (right after should_deliver in maybe_deliver) in order to
  avoid duplicates. Is there some way to move this decision point to a
  later stage that I am missing?

- BR_MULTICAST_TO_UNICAST. Right now, I expect that this series is not
  compatible with multicast-to-unicast being used on a port. Then
  again, I think that this would also be broken for regular switchdev
  bridge offloading as this flag is not offloaded to the switchdev
  port, so there is no way for the driver to refuse it. Any ideas on
  how to handle this?

## mv88e6xxx Specifics

Since we are now only receiving a single skb for both unicast and
multicast flows, we can tag the packets with the FORWARD command
instead of FROM_CPU. The switch(es) will then forward the packet in
accordance with their ATU, VTU, STU, and PVT configuration - just like
for packets ingressing on user ports.

Crucially, FROM_CPU is still used for:

- Ports in standalone mode.

- Flows that are trapped to the CPU and software-forwarded by a
  bridge. Note that these flows match neither of the classes discussed
  in the overview.

- Packets that are sent directly to a port netdev without going
  through the bridge, e.g. lldpd sending out PDU via an AF_PACKET
  socket.

We thus have a pretty clean separation where the data plane uses
FORWARDs and the control plane uses TO_/FROM_CPU.

The barrier between different bridges is enforced by port based VLANs
on mv88e6xxx, which in essence is a mapping from a source device/port
pair to an allowed set of egress ports. In order to have a FORWARD
frame (which carries a _source_ device/port) correctly mapped by the
PVT, we must use a unique pair for each bridge.

Fortunately, there is typically lots of unused address space in most
switch trees. When was the last time you saw an mv88e6xxx product
using more than 4 chips? Even if you found one with 16 (!) devices,
you would still have room to allocate 16*16 virtual ports to software
bridges.

Therefore, the mv88e6xxx driver will allocate a virtual device/port
pair to each bridge that it offloads. All members of the same bridge
are then configured to allow packets from this virtual port in their
PVTs.
Tobias Waldekranz (9):
  net: dfwd: Constrain existing users to macvlan subordinates
  net: bridge: Disambiguate offload_fwd_mark
  net: bridge: switchdev: Recycle unused hwdoms
  net: bridge: switchdev: Forward offloading
  net: dsa: Track port PVIDs
  net: dsa: Forward offloading
  net: dsa: mv88e6xxx: Allocate a virtual DSA port for each bridge
  net: dsa: mv88e6xxx: Map virtual bridge port in PVT
  net: dsa: mv88e6xxx: Forward offloading

 MAINTAINERS                                   |   1 +
 drivers/net/dsa/mv88e6xxx/Makefile            |   1 +
 drivers/net/dsa/mv88e6xxx/chip.c              |  61 ++++++-
 drivers/net/dsa/mv88e6xxx/dst.c               | 160 ++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/dst.h               |  14 ++
 .../net/ethernet/intel/fm10k/fm10k_netdev.c   |   3 +
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   3 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   3 +
 include/linux/dsa/mv88e6xxx.h                 |  13 ++
 include/net/dsa.h                             |  13 ++
 net/bridge/br_forward.c                       |  11 +-
 net/bridge/br_if.c                            |   4 +-
 net/bridge/br_private.h                       |  54 +++++-
 net/bridge/br_switchdev.c                     | 141 +++++++++++----
 net/dsa/port.c                                |  16 +-
 net/dsa/slave.c                               |  36 +++-
 net/dsa/tag_dsa.c                             |  33 +++-
 17 files changed, 510 insertions(+), 57 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/dst.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/dst.h
 create mode 100644 include/linux/dsa/mv88e6xxx.h

--
2.25.1
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 1/9] net: dfwd: Constrain existing users to macvlan subordinates
The dfwd_add/del_station NDOs are currently only used by the macvlan
subsystem to request L2 forwarding offload from lower devices. In
order to add support for other types of devices (like bridges), we
constrain the current users to make sure that the subordinate
requesting the offload is in fact a macvlan.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 3 +++
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 3 +++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 2fb52bd6fc0e..4dba6e6a282d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1352,6 +1352,9 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 	int size, i;
 	u16 vid, glort;
 
+	if (!netif_is_macvlan(sdev))
+		return ERR_PTR(-EOPNOTSUPP);
+
 	/* The hardware supported by fm10k only filters on the destination MAC
 	 * address. In order to avoid issues we only support offloading modes
 	 * where the hardware can actually provide the functionality.
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index c2d145a56b5e..b90b79f7ee46 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7663,6 +7663,9 @@ static void *i40e_fwd_add(struct net_device *netdev, struct net_device *vdev)
 	struct i40e_fwd_adapter *fwd;
 	int avail_macvlan, ret;
 
+	if (!netif_is_macvlan(vdev))
+		return ERR_PTR(-EOPNOTSUPP);
+
 	if ((pf->flags & I40E_FLAG_DCB_ENABLED)) {
 		netdev_info(netdev, "Macvlans are not supported when DCB is enabled\n");
 		return ERR_PTR(-EINVAL);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c5ec17d19c59..ff5334faf6c5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9940,6 +9940,9 @@ static void *ixgbe_fwd_add(struct net_device *pdev, struct net_device *vdev)
 	int tcs = adapter->hw_tcs ? : 1;
 	int pool, err;
 
+	if (!netif_is_macvlan(vdev))
+		return ERR_PTR(-EOPNOTSUPP);
+
 	if (adapter->xdp_prog) {
 		e_warn(probe, "L2FW offload is not supported with XDP\n");
 		return ERR_PTR(-EINVAL);
--
2.25.1
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 2/9] net: bridge: Disambiguate offload_fwd_mark
Before this change, four related - but distinct - concepts were named
offload_fwd_mark:

- skb->offload_fwd_mark: Set by the switchdev driver if the underlying
  hardware has already forwarded this frame to the other ports in the
  same hardware domain.

- nbp->offload_fwd_mark: An identifier used to group ports that share
  the same hardware forwarding domain.

- br->offload_fwd_mark: Counter used to make sure that unique IDs are
  used in cases where a bridge contains ports from multiple hardware
  domains.

- skb->cb->offload_fwd_mark: The hardware domain on which the frame
  ingressed and was forwarded.

Introduce the term "hardware forwarding domain" ("hwdom") in the
bridge to denote a set of ports with the following property:

    If an skb with skb->offload_fwd_mark set is received on a port
    belonging to hwdom N, that frame has already been forwarded to all
    other ports in hwdom N.

By decoupling the name from "offload_fwd_mark", we can extend the
term's definition in the future - e.g. to add constraints that
describe expected egress behavior - without overloading the meaning of
"offload_fwd_mark".

- nbp->offload_fwd_mark thus becomes nbp->hwdom.

- br->offload_fwd_mark becomes br->last_hwdom.

- skb->cb->offload_fwd_mark becomes skb->cb->src_hwdom. There is a
  slight change here: Whereas previously this was only set for
  offloaded packets, we now always track the incoming hwdom. As all
  uses were already gated behind checks of skb->offload_fwd_mark, this
  will not introduce any functional change, but it paves the way for
  future changes where the ingressing hwdom must be known both for
  offloaded and non-offloaded frames.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 net/bridge/br_if.c        |  2 +-
 net/bridge/br_private.h   | 10 +++++-----
 net/bridge/br_switchdev.c | 16 ++++++++--------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index f7d2f472ae24..73fa703f8df5 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -643,7 +643,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
 	if (err)
 		goto err5;
 
-	err = nbp_switchdev_mark_set(p);
+	err = nbp_switchdev_hwdom_set(p);
 	if (err)
 		goto err6;
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 7ce8a77cc6b6..53248715f631 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -327,7 +327,7 @@ struct net_bridge_port {
 	struct netpoll *np;
 #endif
 #ifdef CONFIG_NET_SWITCHDEV
-	int offload_fwd_mark;
+	int hwdom;
 #endif
 	u16 group_fwd_mask;
 	u16 backup_redirected_cnt;
@@ -472,7 +472,7 @@ struct net_bridge {
 	u32 auto_cnt;
 
 #ifdef CONFIG_NET_SWITCHDEV
-	int offload_fwd_mark;
+	int last_hwdom;
 #endif
 	struct hlist_head fdb_list;
 
@@ -502,7 +502,7 @@ struct br_input_skb_cb {
 #endif
 
 #ifdef CONFIG_NET_SWITCHDEV
-	int offload_fwd_mark;
+	int src_hwdom;
 #endif
 };
 
@@ -1593,7 +1593,7 @@ static inline void br_sysfs_delbr(struct net_device *dev) { return; }
 
 /* br_switchdev.c */
 #ifdef CONFIG_NET_SWITCHDEV
-int nbp_switchdev_mark_set(struct net_bridge_port *p);
+int nbp_switchdev_hwdom_set(struct net_bridge_port *p);
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb);
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
@@ -1613,7 +1613,7 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
 	skb->offload_fwd_mark = 0;
 }
 #else
-static inline int nbp_switchdev_mark_set(struct net_bridge_port *p)
+static inline int nbp_switchdev_hwdom_set(struct net_bridge_port *p)
 {
 	return 0;
 }
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index a5e601e41cb9..bc085077ae71 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -8,20 +8,20 @@
 
 #include "br_private.h"
 
-static int br_switchdev_mark_get(struct net_bridge *br, struct net_device *dev)
+static int br_switchdev_hwdom_get(struct net_bridge *br, struct net_device *dev)
 {
 	struct net_bridge_port *p;
 
 	/* dev is yet to be added to the port list. */
 	list_for_each_entry(p, &br->port_list, list) {
 		if (netdev_port_same_parent_id(dev, p->dev))
-			return p->offload_fwd_mark;
+			return p->hwdom;
 	}
 
-	return ++br->offload_fwd_mark;
+	return ++br->last_hwdom;
 }
 
-int nbp_switchdev_mark_set(struct net_bridge_port *p)
+int nbp_switchdev_hwdom_set(struct net_bridge_port *p)
 {
 	struct netdev_phys_item_id ppid = { };
 	int err;
@@ -35,7 +35,7 @@ int nbp_switchdev_mark_set(struct net_bridge_port *p)
 		return err;
 	}
 
-	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev);
+	p->hwdom = br_switchdev_hwdom_get(p->br, p->dev);
 
 	return 0;
 }
@@ -43,15 +43,15 @@ int nbp_switchdev_mark_set(struct net_bridge_port *p)
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb)
 {
-	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
-		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
+	if (p->hwdom)
+		BR_INPUT_SKB_CB(skb)->src_hwdom = p->hwdom;
 }
 
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
 				  const struct sk_buff *skb)
 {
 	return !skb->offload_fwd_mark ||
-	       BR_INPUT_SKB_CB(skb)->offload_fwd_mark != p->offload_fwd_mark;
+	       BR_INPUT_SKB_CB(skb)->src_hwdom != p->hwdom;
 }
 
 /* Flags that can be offloaded to hardware */
--
2.25.1
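
The hwdom property described in this patch comes down to one egress predicate. The following is a minimal userspace sketch of that rule, not kernel code: `struct frame` and `struct port` are hypothetical stand-ins for the relevant fields of the skb control block and `struct net_bridge_port`, and `allowed_egress` only mirrors the logic of `nbp_switchdev_allowed_egress()` as it reads after the rename.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-frame state the bridge consults. */
struct frame {
	bool offload_fwd_mark;	/* set by the driver: hardware already forwarded */
	int src_hwdom;		/* hwdom of the ingress port, 0 if none */
};

/* Hypothetical stand-in for struct net_bridge_port. */
struct port {
	int hwdom;		/* 0: port is not part of any hardware domain */
};

/* A frame may egress in software unless the hardware has already
 * forwarded it within the ingress port's own hwdom.
 */
static bool allowed_egress(const struct port *p, const struct frame *f)
{
	return !f->offload_fwd_mark || f->src_hwdom != p->hwdom;
}
```

Under these assumptions, a marked frame that ingressed on a port in hwdom 1 is not software-forwarded to another port in hwdom 1, but is still sent to a foreign port such as eth0; an unmarked frame is sent to both.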
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 3/9] net: bridge: switchdev: Recycle unused hwdoms
Since hwdoms have thus far only been used for equality comparisons,
the bridge has used the simplest possible assignment policy; using a
counter to keep track of the last value handed out.

With the upcoming transmit offloading, we need to perform set
operations efficiently based on hwdoms, e.g. we want to answer
questions like "has this skb been forwarded to any port within this
hwdom?"

Move to a bitmap-based allocation scheme that recycles hwdoms once all
members leave the bridge. This means that we can use a single
unsigned long to keep track of the hwdoms that have received an skb.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 net/bridge/br_if.c        |  4 +-
 net/bridge/br_private.h   | 29 +++++++++---
 net/bridge/br_switchdev.c | 94 ++++++++++++++++++++++++++-------------
 3 files changed, 87 insertions(+), 40 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 73fa703f8df5..adaf78e45c23 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -349,6 +349,7 @@ static void del_nbp(struct net_bridge_port *p)
 	nbp_backup_clear(p);
 
 	nbp_update_port_count(br);
+	nbp_switchdev_del(p);
 
 	netdev_upper_dev_unlink(dev, br->dev);
 
@@ -643,7 +644,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
 	if (err)
 		goto err5;
 
-	err = nbp_switchdev_hwdom_set(p);
+	err = nbp_switchdev_add(p);
 	if (err)
 		goto err6;
 
@@ -704,6 +705,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
 	list_del_rcu(&p->list);
 	br_fdb_delete_by_port(br, p, 0, 1);
 	nbp_update_port_count(br);
+	nbp_switchdev_del(p);
 err6:
 	netdev_upper_dev_unlink(dev, br->dev);
 err5:
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 53248715f631..aba92864d285 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -29,6 +29,8 @@
 
 #define BR_MULTICAST_DEFAULT_HASH_MAX 4096
 
+#define BR_HWDOM_MAX BITS_PER_LONG
+
 #define BR_VERSION "2.3"
 
 /* Control of forwarding link local multicast */
@@ -54,6 +56,8 @@ typedef struct bridge_id bridge_id;
 typedef struct mac_addr mac_addr;
 typedef __u16 port_id;
 
+typedef DECLARE_BITMAP(br_hwdom_map_t, BR_HWDOM_MAX);
+
 struct bridge_id {
 	unsigned char prio[2];
 	unsigned char addr[ETH_ALEN];
@@ -472,7 +476,7 @@ struct net_bridge {
 	u32 auto_cnt;
 
 #ifdef CONFIG_NET_SWITCHDEV
-	int last_hwdom;
+	br_hwdom_map_t busy_hwdoms;
 #endif
 	struct hlist_head fdb_list;
 
@@ -1593,7 +1597,6 @@ static inline void br_sysfs_delbr(struct net_device *dev) { return; }
 
 /* br_switchdev.c */
 #ifdef CONFIG_NET_SWITCHDEV
-int nbp_switchdev_hwdom_set(struct net_bridge_port *p);
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb);
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
@@ -1607,17 +1610,15 @@ void br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb,
 int br_switchdev_port_vlan_add(struct net_device *dev, u16 vid, u16 flags,
 			       struct netlink_ext_ack *extack);
 int br_switchdev_port_vlan_del(struct net_device *dev, u16 vid);
+int nbp_switchdev_add(struct net_bridge_port *p);
+void nbp_switchdev_del(struct net_bridge_port *p);
+void br_switchdev_init(struct net_bridge *br);
 
 static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
 {
 	skb->offload_fwd_mark = 0;
 }
 #else
-static inline int nbp_switchdev_hwdom_set(struct net_bridge_port *p)
-{
-	return 0;
-}
-
 static inline void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 					    struct sk_buff *skb)
 {
@@ -1657,6 +1658,20 @@ br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb, int type)
 static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
 {
 }
+
+static inline int nbp_switchdev_add(struct net_bridge_port *p)
+{
+	return 0;
+}
+
+static inline void nbp_switchdev_del(struct net_bridge_port *p)
+{
+}
+
+static inline void br_switchdev_init(struct net_bridge *br)
+{
+}
+
 #endif /* CONFIG_NET_SWITCHDEV */
 
 /* br_arp_nd_proxy.c */
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index bc085077ae71..54bd7205bfb5 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -8,38 +8,6 @@
 
 #include "br_private.h"
 
-static int br_switchdev_hwdom_get(struct net_bridge *br, struct net_device *dev)
-{
-	struct net_bridge_port *p;
-
-	/* dev is yet to be added to the port list. */
-	list_for_each_entry(p, &br->port_list, list) {
-		if (netdev_port_same_parent_id(dev, p->dev))
-			return p->hwdom;
-	}
-
-	return ++br->last_hwdom;
-}
-
-int nbp_switchdev_hwdom_set(struct net_bridge_port *p)
-{
-	struct netdev_phys_item_id ppid = { };
-	int err;
-
-	ASSERT_RTNL();
-
-	err = dev_get_port_parent_id(p->dev, &ppid, true);
-	if (err) {
-		if (err == -EOPNOTSUPP)
-			return 0;
-		return err;
-	}
-
-	p->hwdom = br_switchdev_hwdom_get(p->br, p->dev);
-
-	return 0;
-}
-
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb)
 {
@@ -156,3 +124,65 @@ int br_switchdev_port_vlan_del(struct net_device *dev, u16 vid)
 
 	return switchdev_port_obj_del(dev, &v.obj);
 }
+
+static int nbp_switchdev_hwdom_set(struct net_bridge_port *joining)
+{
+	struct net_bridge *br = joining->br;
+	struct net_bridge_port *p;
+	int hwdom;
+
+	/* joining is yet to be added to the port list. */
+	list_for_each_entry(p, &br->port_list, list) {
+		if (netdev_port_same_parent_id(joining->dev, p->dev)) {
+			joining->hwdom = p->hwdom;
+			return 0;
+		}
+	}
+
+	hwdom = find_next_zero_bit(br->busy_hwdoms, BR_HWDOM_MAX, 1);
+	if (hwdom >= BR_HWDOM_MAX)
+		return -EBUSY;
+
+	set_bit(hwdom, br->busy_hwdoms);
+	joining->hwdom = hwdom;
+	return 0;
+}
+
+static void nbp_switchdev_hwdom_put(struct net_bridge_port *leaving)
+{
+	struct net_bridge *br = leaving->br;
+	struct net_bridge_port *p;
+
+	/* leaving is no longer in the port list. */
+	list_for_each_entry(p, &br->port_list, list) {
+		if (p->hwdom == leaving->hwdom)
+			return;
+	}
+
+	clear_bit(leaving->hwdom, br->busy_hwdoms);
+}
+
+int nbp_switchdev_add(struct net_bridge_port *p)
+{
+	struct netdev_phys_item_id ppid = { };
+	int err;
+
+	ASSERT_RTNL();
+
+	err = dev_get_port_parent_id(p->dev, &ppid, true);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+
+	return nbp_switchdev_hwdom_set(p);
+}
+
+void nbp_switchdev_del(struct net_bridge_port *p)
+{
+	ASSERT_RTNL();
+
+	if (p->hwdom)
+		nbp_switchdev_hwdom_put(p);
+}
--
2.25.1
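
To illustrate the recycling policy outside the kernel, here is a rough userspace sketch of a bitmap allocator with the same shape. It is an assumption-laden simplification: `struct hwdom_pool`, `hwdom_get`, `hwdom_join` and `hwdom_put` are hypothetical names, and the sketch keeps a per-domain member count where the patch walks the bridge's port list instead.

```c
#include <assert.h>
#include <limits.h>

/* One bit per hwdom in a single unsigned long, as with BR_HWDOM_MAX. */
#define HWDOM_MAX ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical stand-in for br->busy_hwdoms plus member counts. */
struct hwdom_pool {
	unsigned long busy;	/* bit N set: hwdom N is in use */
	int members[sizeof(unsigned long) * CHAR_BIT];
};

/* Allocate the lowest free hwdom >= 1, or return -1 if none is left.
 * ID 0 is never handed out; it means "no hardware domain".
 */
static int hwdom_get(struct hwdom_pool *pool)
{
	for (int id = 1; id < HWDOM_MAX; id++) {
		if (!(pool->busy & (1UL << id))) {
			pool->busy |= 1UL << id;
			pool->members[id] = 1;
			return id;
		}
	}
	return -1;
}

/* Another port of the same switch joins an existing domain. */
static void hwdom_join(struct hwdom_pool *pool, int id)
{
	pool->members[id]++;
}

/* Drop one member; recycle the ID once the last member has left. */
static void hwdom_put(struct hwdom_pool *pool, int id)
{
	if (--pool->members[id] == 0)
		pool->busy &= ~(1UL << id);
}
```

With a counter, an ID is lost for good once its ports leave; here the ID returns to the pool, which is what keeps the set of live hwdoms small enough to fit in one word.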
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 4/9] net: bridge: switchdev: Forward offloading
Allow switchdevs to forward frames from the CPU in accordance with the
bridge configuration in the same way as is done between bridge
ports. This means that the bridge will only send a single skb towards
one of the ports under the switchdev's control, and expects the driver
to deliver the packet to all eligible ports in its domain.

Primarily this improves the performance of multicast flows with
multiple subscribers, as it allows the hardware to perform the frame
replication.

The basic flow between the driver and the bridge is as follows:

- The switchdev accepts the offload by returning a non-null pointer
  from .ndo_dfwd_add_station when the port is added to the bridge.

- The bridge sends offloadable skbs to one of the ports under the
  switchdev's control using dev_queue_xmit_accel.

- The switchdev notices the offload by checking for a non-NULL
  "sb_dev" in the core's call to .ndo_select_queue.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 net/bridge/br_forward.c   | 11 +++++++-
 net/bridge/br_private.h   | 27 ++++++++++++++++++
 net/bridge/br_switchdev.c | 59 +++++++++++++++++++++++++++++++++++++--
 3 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 6e9b049ae521..b4fb3b0bb1ec 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -32,6 +32,8 @@ static inline int should_deliver(const struct net_bridge_port *p,
 
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
+	struct net_device *sb_dev = NULL;
+
 	skb_push(skb, ETH_HLEN);
 	if (!is_skb_forwardable(skb->dev, skb))
 		goto drop;
@@ -48,7 +50,10 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
 		skb_set_network_header(skb, depth);
 	}
 
-	dev_queue_xmit(skb);
+	if (br_switchdev_accels_skb(skb))
+		sb_dev = BR_INPUT_SKB_CB(skb)->brdev;
+
+	dev_queue_xmit_accel(skb, sb_dev);
 
 	return 0;
 
@@ -105,6 +110,8 @@ static void __br_forward(const struct net_bridge_port *to,
 		indev = NULL;
 	}
 
+	nbp_switchdev_frame_mark_accel(to, skb);
+
 	NF_HOOK(NFPROTO_BRIDGE, br_hook,
 		net, NULL, skb, indev, skb->dev,
 		br_forward_finish);
@@ -174,6 +181,8 @@ static struct net_bridge_port *maybe_deliver(
 	if (!should_deliver(p, skb))
 		return prev;
 
+	nbp_switchdev_frame_mark_fwd(p, skb);
+
 	if (!prev)
 		goto out;
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index aba92864d285..933e951b0d7a 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -332,6 +332,7 @@ struct net_bridge_port {
 #endif
 #ifdef CONFIG_NET_SWITCHDEV
 	int hwdom;
+	void *accel_priv;
 #endif
 	u16 group_fwd_mask;
 	u16 backup_redirected_cnt;
@@ -506,7 +507,9 @@ struct br_input_skb_cb {
 #endif
 
 #ifdef CONFIG_NET_SWITCHDEV
+	u8 fwd_accel:1;
 	int src_hwdom;
+	br_hwdom_map_t fwd_hwdoms;
 #endif
 };
 
@@ -1597,6 +1600,15 @@ static inline void br_sysfs_delbr(struct net_device *dev) { return; }
 
 /* br_switchdev.c */
 #ifdef CONFIG_NET_SWITCHDEV
+static inline bool br_switchdev_accels_skb(struct sk_buff *skb)
+{
+	return BR_INPUT_SKB_CB(skb)->fwd_accel;
+}
+
+void nbp_switchdev_frame_mark_accel(const struct net_bridge_port *p,
+				    struct sk_buff *skb);
+void nbp_switchdev_frame_mark_fwd(const struct net_bridge_port *p,
+				  struct sk_buff *skb);
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb);
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
@@ -1619,6 +1631,21 @@ static inline void br_switchdev_frame_unmark(struct sk_buff *skb)
 	skb->offload_fwd_mark = 0;
 }
 #else
+static inline bool br_switchdev_accels_skb(struct sk_buff *skb)
+{
+	return false;
+}
+
+static inline void nbp_switchdev_frame_mark_accel(const struct net_bridge_port *p,
+						  struct sk_buff *skb)
+{
+}
+
+static inline void nbp_switchdev_frame_mark_fwd(const struct net_bridge_port *p,
+						struct sk_buff *skb)
+{
+}
+
 static inline void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 					    struct sk_buff *skb)
 {
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index 54bd7205bfb5..c903171ad291 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -8,6 +8,26 @@
 
 #include "br_private.h"
 
+static bool nbp_switchdev_can_accel(const struct net_bridge_port *p,
+				    const struct sk_buff *skb)
+{
+	return p->accel_priv && (p->hwdom != BR_INPUT_SKB_CB(skb)->src_hwdom);
+}
+
+void nbp_switchdev_frame_mark_accel(const struct net_bridge_port *p,
+				    struct sk_buff *skb)
+{
+	if (nbp_switchdev_can_accel(p, skb))
+		BR_INPUT_SKB_CB(skb)->fwd_accel = true;
+}
+
+void nbp_switchdev_frame_mark_fwd(const struct net_bridge_port *p,
+				  struct sk_buff *skb)
+{
+	if (nbp_switchdev_can_accel(p, skb))
+		set_bit(p->hwdom, BR_INPUT_SKB_CB(skb)->fwd_hwdoms);
+}
+
 void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 			      struct sk_buff *skb)
 {
@@ -18,8 +38,10 @@ void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
 bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
 				  const struct sk_buff *skb)
 {
-	return !skb->offload_fwd_mark ||
-	       BR_INPUT_SKB_CB(skb)->src_hwdom != p->hwdom;
+	struct br_input_skb_cb *cb = BR_INPUT_SKB_CB(skb);
+
+	return !test_bit(p->hwdom, cb->fwd_hwdoms) &&
+		(!skb->offload_fwd_mark || cb->src_hwdom != p->hwdom);
 }
 
 /* Flags that can be offloaded to hardware */
@@ -125,6 +147,27 @@ int br_switchdev_port_vlan_del(struct net_device *dev, u16 vid)
 	return switchdev_port_obj_del(dev, &v.obj);
 }
 
+static void nbp_switchdev_fwd_offload_add(struct net_bridge_port *p)
+{
+	void *priv;
+
+	if (!(p->dev->features & NETIF_F_HW_L2FW_DOFFLOAD))
+		return;
+
+	priv = p->dev->netdev_ops->ndo_dfwd_add_station(p->dev, p->br->dev);
+	if (!IS_ERR_OR_NULL(priv))
+		p->accel_priv = priv;
+}
+
+static void nbp_switchdev_fwd_offload_del(struct net_bridge_port *p)
+{
+	if (!p->accel_priv)
+		return;
+
+	p->dev->netdev_ops->ndo_dfwd_del_station(p->dev, p->accel_priv);
+	p->accel_priv = NULL;
+}
+
 static int nbp_switchdev_hwdom_set(struct net_bridge_port *joining)
 {
 	struct net_bridge *br = joining->br;
@@ -176,13 +219,23 @@ int nbp_switchdev_add(struct net_bridge_port *p)
 		return err;
 	}
 
-	return nbp_switchdev_hwdom_set(p);
+	err = nbp_switchdev_hwdom_set(p);
+	if (err)
+		return err;
+
+	if (p->hwdom)
+		nbp_switchdev_fwd_offload_add(p);
+
+	return 0;
 }
 
 void nbp_switchdev_del(struct net_bridge_port *p)
 {
 	ASSERT_RTNL();
 
+	if (p->accel_priv)
+		nbp_switchdev_fwd_offload_del(p);
+
 	if (p->hwdom)
 		nbp_switchdev_hwdom_put(p);
 }
--
2.25.1
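
The effect of `fwd_hwdoms` on a flood can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the bridge's code path: `struct port`, `struct frame` and `flood` are hypothetical, the bitmap is a plain `unsigned long`, and every other check done by `should_deliver()` is left out. It only shows that once one port of an accelerating hwdom has been picked, the remaining ports of that hwdom are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_bridge_port. */
struct port {
	int hwdom;	/* 0: not in any hardware domain */
	bool accel;	/* driver accepted the offload (accel_priv set) */
};

/* Hypothetical stand-in for the per-skb control block. */
struct frame {
	bool offload_fwd_mark;		/* hardware already forwarded on ingress */
	int src_hwdom;			/* hwdom of the ingress port, 0 if none */
	unsigned long fwd_hwdoms;	/* bit N: frame already handed to hwdom N */
};

static bool can_accel(const struct port *p, const struct frame *f)
{
	return p->accel && p->hwdom != f->src_hwdom;
}

static bool allowed_egress(const struct port *p, const struct frame *f)
{
	return !(f->fwd_hwdoms & (1UL << p->hwdom)) &&
	       (!f->offload_fwd_mark || f->src_hwdom != p->hwdom);
}

/* Flood one frame to all ports and return how many transmissions the
 * software bridge has to perform itself.
 */
static int flood(const struct port *ports, size_t n, struct frame *f)
{
	int xmits = 0;

	for (size_t i = 0; i < n; i++) {
		if (!allowed_egress(&ports[i], f))
			continue;
		/* Record the hwdom as served at the point of the delivery
		 * decision, before the frame is actually sent.
		 */
		if (can_accel(&ports[i], f))
			f->fwd_hwdoms |= 1UL << ports[i].hwdom;
		xmits++;
	}
	return xmits;
}
```

For the topology in the cover letter (swp0-2 in hwdom 1, eth0 foreign), a locally originated frame costs two transmissions with the offload accepted, one towards the switch and one towards eth0, versus four without it.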
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 5/9] net: dsa: Track port PVIDs
In some scenarios a tagger must know which VLAN to assign to a packet,
even if the packet is set to egress untagged. Since the VLAN
information in the skb will be removed by the bridge in this case,
track each port's PVID such that the VID of an outgoing frame can
always be determined.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 include/net/dsa.h |  1 +
 net/dsa/port.c    | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 507082959aa4..1f9ba9889034 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -270,6 +270,7 @@ struct dsa_port {
 	unsigned int ageing_time;
 	bool vlan_filtering;
 	u8 stp_state;
+	u16 pvid;
 	struct net_device *bridge_dev;
 	struct devlink_port devlink_port;
 	bool devlink_port_setup;
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 6379d66a6bb3..02d96aebfcc6 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -651,8 +651,14 @@ int dsa_port_vlan_add(struct dsa_port *dp,
 		.vlan = vlan,
 		.extack = extack,
 	};
+	int err;
+
+	err = dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
 
-	return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
+	if (!err && (vlan->flags & BRIDGE_VLAN_INFO_PVID))
+		dp->pvid = vlan->vid;
+
+	return err;
 }
 
 int dsa_port_vlan_del(struct dsa_port *dp,
@@ -663,8 +669,14 @@ int dsa_port_vlan_del(struct dsa_port *dp,
 		.port = dp->index,
 		.vlan = vlan,
 	};
+	int err;
+
+	err = dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
 
-	return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_DEL, &info);
+	if (!err && vlan->vid == dp->pvid)
+		dp->pvid = 0;
+
+	return err;
 }
 
 int dsa_port_mrp_add(const struct dsa_port *dp,
--
2.25.1
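
As a rough illustration of the bookkeeping this patch adds, the following userspace sketch tracks a port's PVID across VLAN add/delete events. All names are hypothetical; `VLAN_FLAG_PVID` is an illustrative flag standing in for `BRIDGE_VLAN_INFO_PVID`, and `egress_vid` is only an assumption about how a tagger might consume the value, since the tagger side is not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_FLAG_PVID 0x2	/* illustrative stand-in for BRIDGE_VLAN_INFO_PVID */

/* Hypothetical stand-in for the pvid member added to struct dsa_port. */
struct port_state {
	uint16_t pvid;	/* 0: no PVID configured */
};

/* Remember the VID when a VLAN is installed as the port's PVID. */
static void vlan_add(struct port_state *p, uint16_t vid, unsigned int flags)
{
	if (flags & VLAN_FLAG_PVID)
		p->pvid = vid;
}

/* Forget the PVID when the VLAN that carried it is removed. */
static void vlan_del(struct port_state *p, uint16_t vid)
{
	if (vid == p->pvid)
		p->pvid = 0;
}

/* Assumed tagger-side use: fall back to the PVID when the bridge has
 * already stripped the VLAN information from the frame.
 */
static uint16_t egress_vid(const struct port_state *p, bool has_tag,
			   uint16_t tag_vid)
{
	return has_tag ? tag_vid : p->pvid;
}
```

Like the patch, the sketch updates the PVID only on an explicit add with the PVID flag or on deletion of that VID, so re-adding the same VID without the flag leaves the stored value untouched.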
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 6/9] net: dsa: Forward offloading
Allow DSA drivers to support forward offloading from a bridge by:

- Passing calls to .ndo_dfwd_{add,del}_station to the drivers.

- Recording the subordinate device of offloaded skbs in the control
  buffer so that the tagger can take the appropriate action.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 include/net/dsa.h |  7 +++++++
 net/dsa/slave.c   | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 1f9ba9889034..77d4df819299 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -119,6 +119,7 @@ struct dsa_netdevice_ops {
 
 struct dsa_skb_cb {
 	struct sk_buff *clone;
+	struct net_device *sb_dev;
 };
 
 struct __dsa_skb_cb {
@@ -828,6 +829,12 @@ struct dsa_switch_ops {
 					  const struct switchdev_obj_ring_role_mrp *mrp);
 	int	(*port_mrp_del_ring_role)(struct dsa_switch *ds, int port,
 					  const struct switchdev_obj_ring_role_mrp *mrp);
+
+	/* L2 forward offloading */
+	void *	(*dfwd_add_station)(struct dsa_switch *ds, int port,
+				    struct net_device *sb_dev);
+	void	(*dfwd_del_station)(struct dsa_switch *ds, int port,
+				    struct net_device *sb_dev);
 };
 
 #define DSA_DEVLINK_PARAM_DRIVER(_id, _name, _type, _cmodes) \
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 77b33bd161b8..3689ffa2dbb8 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -657,6 +657,13 @@ static netdev_tx_t dsa_slave_xmit(struct sk_buff *skb, struct net_device *dev)
 	return dsa_enqueue_skb(nskb, dev);
 }
 
+static u16 dsa_slave_select_queue(struct net_device *dev, struct sk_buff *skb,
+				  struct net_device *sb_dev)
+{
+	DSA_SKB_CB(skb)->sb_dev = sb_dev;
+	return netdev_pick_tx(dev, skb, sb_dev);
+}
+
 /* ethtool operations *******************************************************/
 
 static void dsa_slave_get_drvinfo(struct net_device *dev,
@@ -1708,10 +1715,33 @@ static int dsa_slave_fill_forward_path(struct net_device_path_ctx *ctx,
 	return 0;
 }
 
+static void *dsa_slave_dfwd_add_station(struct net_device *dev,
+					struct net_device *sb_dev)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = dp->ds;
+
+	if (ds->ops->dfwd_add_station)
+		return ds->ops->dfwd_add_station(ds, dp->index, sb_dev);
+
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static void dsa_slave_dfwd_del_station(struct net_device *dev,
+				       void *sb_dev)
+{
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_switch *ds = dp->ds;
+
+	if (ds->ops->dfwd_del_station)
+		ds->ops->dfwd_del_station(ds, dp->index, sb_dev);
+}
+
 static const struct net_device_ops dsa_slave_netdev_ops = {
 	.ndo_open = dsa_slave_open,
 	.ndo_stop = dsa_slave_close,
 	.ndo_start_xmit = dsa_slave_xmit,
+	.ndo_select_queue = dsa_slave_select_queue,
 	.ndo_change_rx_flags = dsa_slave_change_rx_flags,
 	.ndo_set_rx_mode = dsa_slave_set_rx_mode,
 	.ndo_set_mac_address = dsa_slave_set_mac_address,
@@ -1734,6 +1764,8 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
 	.ndo_get_devlink_port = dsa_slave_get_devlink_port,
 	.ndo_change_mtu = dsa_slave_change_mtu,
 	.ndo_fill_forward_path = dsa_slave_fill_forward_path,
+	.ndo_dfwd_add_station = dsa_slave_dfwd_add_station,
+	.ndo_dfwd_del_station = dsa_slave_dfwd_del_station,
 };
 
 static struct device_type dsa_type = {
@@ -1914,8 +1946,8 @@ int dsa_slave_create(struct dsa_port *port)
 	slave_dev->features = master->vlan_features | NETIF_F_HW_TC;
 	if (ds->ops->port_vlan_add && ds->ops->port_vlan_del)
 		slave_dev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
-	slave_dev->hw_features |= NETIF_F_HW_TC;
-	slave_dev->features |= NETIF_F_LLTX;
+	slave_dev->hw_features |= NETIF_F_HW_TC | NETIF_F_HW_L2FW_DOFFLOAD;
+	slave_dev->features |= NETIF_F_LLTX | NETIF_F_HW_L2FW_DOFFLOAD;
 	slave_dev->ethtool_ops = &dsa_slave_ethtool_ops;
 	if (!is_zero_ether_addr(port->mac))
 		ether_addr_copy(slave_dev->dev_addr, port->mac);
--
2.25.1
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 7/9] net: dsa: mv88e6xxx: Allocate a virtual DSA port for each bridge
In the near future we want to offload transmission of both unicasts
and multicasts from a bridge by sending a single FORWARD and using the
switches' configuration to determine the destination(s), much in the
same way as we have already relied on them to do between user ports.

As isolation between bridges must still be maintained, we need to pass
an identifier in the DSA tag that the switches can use to determine
the set of physical ports that make up a particular flooding domain.

Therefore: allocate a DSA device/port tuple that is not used by any
physical device to each bridge we are offloading. We can then in
upcoming changes use this tuple to set up cross-chip port based VLANs
to restrict the set of valid egress ports to only contain the ports
that are offloading the same bridge.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 drivers/net/dsa/mv88e6xxx/Makefile |   1 +
 drivers/net/dsa/mv88e6xxx/chip.c   |  11 +++
 drivers/net/dsa/mv88e6xxx/dst.c    | 127 +++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/dst.h    |  12 +++
 include/net/dsa.h                  |   5 ++
 5 files changed, 156 insertions(+)
 create mode 100644 drivers/net/dsa/mv88e6xxx/dst.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/dst.h

diff --git a/drivers/net/dsa/mv88e6xxx/Makefile b/drivers/net/dsa/mv88e6xxx/Makefile
index c8eca2b6f959..20e00695b28d 100644
--- a/drivers/net/dsa/mv88e6xxx/Makefile
+++ b/drivers/net/dsa/mv88e6xxx/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_NET_DSA_MV88E6XXX) += mv88e6xxx.o
 mv88e6xxx-objs := chip.o
 mv88e6xxx-objs += devlink.o
+mv88e6xxx-objs += dst.o
 mv88e6xxx-objs += global1.o
 mv88e6xxx-objs += global1_atu.o
 mv88e6xxx-objs += global1_vtu.o
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index eca285aaf72f..06ef654472b7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -33,6 +33,7 @@

 #include "chip.h"
 #include "devlink.h"
+#include "dst.h"
 #include "global1.h"
 #include "global2.h"
 #include "hwtstamp.h"
@@ -2371,6 +2372,10 @@ static int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port,
 	struct mv88e6xxx_chip *chip = ds->priv;
 	int err;

+	err = mv88e6xxx_dst_bridge_join(ds->dst, br);
+	if (err)
+		return err;
+
 	mv88e6xxx_reg_lock(chip);
 	err = mv88e6xxx_bridge_map(chip, br);
 	mv88e6xxx_reg_unlock(chip);
@@ -2388,6 +2393,8 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port,
 	    mv88e6xxx_port_vlan_map(chip, port))
 		dev_err(ds->dev, "failed to remap in-chip Port VLAN\n");
 	mv88e6xxx_reg_unlock(chip);
+
+	mv88e6xxx_dst_bridge_leave(ds->dst, br);
 }

 static int mv88e6xxx_crosschip_bridge_join(struct dsa_switch *ds,
@@ -3027,6 +3034,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)

 	mv88e6xxx_reg_lock(chip);

+	err = mv88e6xxx_dst_add_chip(chip);
+	if (err)
+		goto unlock;
+
 	if (chip->info->ops->setup_errata) {
 		err = chip->info->ops->setup_errata(chip);
 		if (err)
diff --git a/drivers/net/dsa/mv88e6xxx/dst.c b/drivers/net/dsa/mv88e6xxx/dst.c
new file mode 100644
index 000000000000..399a818063bf
--- /dev/null
+++ b/drivers/net/dsa/mv88e6xxx/dst.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mv88e6xxx global DSA switch tree state
+ */
+
+#include <linux/bitmap.h>
+#include <linux/dsa/mv88e6xxx.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <net/dsa.h>
+
+#include "chip.h"
+#include "dst.h"
+#include "global2.h"
+
+struct mv88e6xxx_br {
+	struct list_head list;
+
+	struct net_device *brdev;
+	u8 dev;
+	u8 port;
+};
+
+struct mv88e6xxx_dst {
+	struct list_head bridges;
+
+	DECLARE_BITMAP(busy_ports, MV88E6XXX_MAX_PVT_ENTRIES);
+
+#define DEV_PORT_TO_BIT(_dev, _port) \
+	((_dev) * MV88E6XXX_MAX_PVT_PORTS + (_port))
+#define DEV_FROM_BIT(_bit) ((_bit) / MV88E6XXX_MAX_PVT_PORTS)
+#define PORT_FROM_BIT(_bit) ((_bit) % (MV88E6XXX_MAX_PVT_PORTS))
+};
+
+int mv88e6xxx_dst_bridge_join(struct dsa_switch_tree *dst,
+			      struct net_device *brdev)
+{
+	struct mv88e6xxx_dst *mvdst = dst->priv;
+	struct mv88e6xxx_br *mvbr;
+	unsigned int bit;
+
+	list_for_each_entry(mvbr, &mvdst->bridges, list) {
+		if (mvbr->brdev == brdev)
+			return 0;
+	}
+
+	bit = find_first_zero_bit(mvdst->busy_ports,
+				  MV88E6XXX_MAX_PVT_ENTRIES);
+
+	if (bit >= MV88E6XXX_MAX_PVT_ENTRIES) {
+		pr_err("Unable to allocate virtual port for %s in DSA tree %d\n",
+		       netdev_name(brdev), dst->index);
+		return -ENOSPC;
+	}
+
+	mvbr = kzalloc(sizeof(*mvbr), GFP_KERNEL);
+	if (!mvbr)
+		return -ENOMEM;
+
+	mvbr->brdev = brdev;
+	mvbr->dev = DEV_FROM_BIT(bit);
+	mvbr->port = PORT_FROM_BIT(bit);
+
+	INIT_LIST_HEAD(&mvbr->list);
+	list_add_tail(&mvbr->list, &mvdst->bridges);
+	set_bit(bit, mvdst->busy_ports);
+	return 0;
+}
+
+void mv88e6xxx_dst_bridge_leave(struct dsa_switch_tree *dst,
+				struct net_device *brdev)
+{
+	struct mv88e6xxx_dst *mvdst = dst->priv;
+	struct mv88e6xxx_br *mvbr;
+	struct dsa_port *dp;
+
+	list_for_each_entry(dp, &dst->ports, list) {
+		if (dp->bridge_dev == brdev)
+			return;
+	}
+
+	list_for_each_entry(mvbr, &mvdst->bridges, list) {
+		if (mvbr->brdev == brdev) {
+			clear_bit(DEV_PORT_TO_BIT(mvbr->dev, mvbr->port),
+				  mvdst->busy_ports);
+			list_del(&mvbr->list);
+			kfree(mvbr);
+			return;
+		}
+	}
+}
+
+static struct mv88e6xxx_dst *mv88e6xxx_dst_get(struct dsa_switch_tree *dst)
+{
+	struct mv88e6xxx_dst *mvdst;
+
+	if (dst->priv)
+		return dst->priv;
+
+	mvdst = kzalloc(sizeof(*mvdst), GFP_KERNEL);
+	if (!mvdst)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&mvdst->bridges);
+
+	bitmap_set(mvdst->busy_ports,
+		   DEV_PORT_TO_BIT(MV88E6XXX_G2_PVT_ADDR_DEV_TRUNK, 0),
+		   MV88E6XXX_MAX_PVT_PORTS);
+
+	dst->priv = mvdst;
+	return mvdst;
+}
+
+int mv88e6xxx_dst_add_chip(struct mv88e6xxx_chip *chip)
+{
+	struct dsa_switch_tree *dst = chip->ds->dst;
+	struct mv88e6xxx_dst *mvdst;
+
+	mvdst = mv88e6xxx_dst_get(dst);
+	if (IS_ERR(mvdst))
+		return PTR_ERR(mvdst);
+
+	bitmap_set(mvdst->busy_ports, DEV_PORT_TO_BIT(chip->ds->index, 0),
+		   MV88E6XXX_MAX_PVT_PORTS);
+	return 0;
+}
diff --git a/drivers/net/dsa/mv88e6xxx/dst.h b/drivers/net/dsa/mv88e6xxx/dst.h
new file mode 100644
index 000000000000..3845a19192ef
--- /dev/null
+++ b/drivers/net/dsa/mv88e6xxx/dst.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _MV88E6XXX_DST_H
+#define _MV88E6XXX_DST_H
+
+int mv88e6xxx_dst_bridge_join(struct dsa_switch_tree *dst,
+			      struct net_device *brdev);
+void mv88e6xxx_dst_bridge_leave(struct dsa_switch_tree *dst,
+				struct net_device *brdev);
+int mv88e6xxx_dst_add_chip(struct mv88e6xxx_chip *chip);
+
+#endif /* _MV88E6XXX_DST_H */
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 77d4df819299..c01e74d6e134 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -172,6 +172,11 @@ struct dsa_switch_tree {
 	 */
 	struct net_device **lags;
 	unsigned int lags_len;
+
+	/* Give the switch driver somewhere to hang its tree-wide
+	 * private data structure.
+	 */
+	void *priv;
 };

 #define dsa_lags_foreach_id(_id, _dst)				\
-- 
2.25.1
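The allocation scheme in the patch above can be tried out in userspace. What follows is only a minimal sketch, not the driver code: the pool type, the function names and the sizes (32 device addresses with 16 ports each) are assumptions made for illustration, and a plain bool array stands in for the kernel bitmap. It shows the core idea: every (dev, port) tuple owned by a physical chip is marked busy up front, and each bridge then claims the lowest free tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sizes (assumed): 32 device addresses, 16 ports each. */
#define MAX_DEVS	32
#define MAX_PORTS	16
#define MAX_TUPLES	(MAX_DEVS * MAX_PORTS)

/* One busy flag per (device, port) tuple in the tree-wide address space. */
struct tuple_pool {
	bool busy[MAX_TUPLES];
};

static int tuple_to_bit(int dev, int port)
{
	assert(dev >= 0 && dev < MAX_DEVS);
	assert(port >= 0 && port < MAX_PORTS);
	return dev * MAX_PORTS + port;
}

/* Reserve all ports of a device address that a physical chip occupies. */
static void pool_reserve_dev(struct tuple_pool *pool, int dev)
{
	for (int port = 0; port < MAX_PORTS; port++)
		pool->busy[tuple_to_bit(dev, port)] = true;
}

/* Claim the lowest free tuple for a bridge. Returns 0, or -1 if full. */
static int pool_alloc(struct tuple_pool *pool, uint8_t *dev, uint8_t *port)
{
	for (int bit = 0; bit < MAX_TUPLES; bit++) {
		if (pool->busy[bit])
			continue;
		pool->busy[bit] = true;
		*dev = (uint8_t)(bit / MAX_PORTS);
		*port = (uint8_t)(bit % MAX_PORTS);
		return 0;
	}
	return -1;
}

/* Return a tuple to the pool once the last port has left the bridge. */
static void pool_free(struct tuple_pool *pool, uint8_t dev, uint8_t port)
{
	pool->busy[tuple_to_bit(dev, port)] = false;
}
```

With two chips at device addresses 0 and 1 reserved, the first bridge would get the tuple (2, 0) and the second (2, 1); freeing (2, 0) makes it the next tuple handed out again.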
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 8/9] net: dsa: mv88e6xxx: Map virtual bridge port in PVT
Now that each bridge has a unique DSA device/port tuple, make sure
that each chip limits forwarding from the bridge to only include
fabric ports and local ports that are members of the same bridge.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 MAINTAINERS                      |  1 +
 drivers/net/dsa/mv88e6xxx/chip.c | 33 ++++++++++++++++++++++++--------
 drivers/net/dsa/mv88e6xxx/dst.c  | 33 ++++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/dst.h  |  2 ++
 include/linux/dsa/mv88e6xxx.h    | 13 +++++++++++++
 5 files changed, 74 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/dsa/mv88e6xxx.h

diff --git a/MAINTAINERS b/MAINTAINERS
index c3c8fa572580..8794b05793b2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10647,6 +10647,7 @@ S:	Maintained
 F:	Documentation/devicetree/bindings/net/dsa/marvell.txt
 F:	Documentation/networking/devlink/mv88e6xxx.rst
 F:	drivers/net/dsa/mv88e6xxx/
+F:	include/linux/dsa/mv88e6xxx.h
 F:	include/linux/platform_data/mv88e6xxx.h

 MARVELL ARMADA 3700 PHY DRIVERS
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 06ef654472b7..6975cf16da65 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -12,6 +12,7 @@

 #include <linux/bitfield.h>
 #include <linux/delay.h>
+#include <linux/dsa/mv88e6xxx.h>
 #include <linux/etherdevice.h>
 #include <linux/ethtool.h>
 #include <linux/if_bridge.h>
@@ -1229,15 +1230,25 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
 		}
 	}

-	/* Prevent frames from unknown switch or port */
-	if (!found)
-		return 0;
+	if (found) {
+		/* Frames from DSA links and CPU ports can egress any
+		 * local port.
+		 */
+		if (dp->type == DSA_PORT_TYPE_CPU ||
+		    dp->type == DSA_PORT_TYPE_DSA)
+			return mv88e6xxx_port_mask(chip);

-	/* Frames from DSA links and CPU ports can egress any local port */
-	if (dp->type == DSA_PORT_TYPE_CPU || dp->type == DSA_PORT_TYPE_DSA)
-		return mv88e6xxx_port_mask(chip);
+		br = dp->bridge_dev;
+	} else {
+		br = mv88e6xxx_dst_bridge_from_dsa(dst, dev, port);
+
+		/* Reject frames from ports that are neither physical
+		 * nor virtual bridge ports.
+		 */
+		if (!br)
+			return 0;
+	}

-	br = dp->bridge_dev;
 	pvlan = 0;

 	/* Frames from user ports can egress any local DSA links and CPU ports,
@@ -2340,6 +2351,7 @@ static int mv88e6xxx_bridge_map(struct mv88e6xxx_chip *chip,
 	struct dsa_switch *ds = chip->ds;
 	struct dsa_switch_tree *dst = ds->dst;
 	struct dsa_port *dp;
+	u8 dev, port;
 	int err;

 	list_for_each_entry(dp, &dst->ports, list) {
@@ -2363,7 +2375,12 @@ static int mv88e6xxx_bridge_map(struct mv88e6xxx_chip *chip,
 		}
 	}

-	return 0;
+	/* Map the virtual bridge port if one is assigned. */
+	err = mv88e6xxx_dst_bridge_to_dsa(dst, br, &dev, &port);
+	if (!err)
+		err = mv88e6xxx_pvt_map(chip, dev, port);
+
+	return err;
 }

 static int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port,
diff --git a/drivers/net/dsa/mv88e6xxx/dst.c b/drivers/net/dsa/mv88e6xxx/dst.c
index 399a818063bf..a5f9077e5b3f 100644
--- a/drivers/net/dsa/mv88e6xxx/dst.c
+++ b/drivers/net/dsa/mv88e6xxx/dst.c
@@ -33,6 +33,39 @@ struct mv88e6xxx_dst {
 #define PORT_FROM_BIT(_bit) ((_bit) % (MV88E6XXX_MAX_PVT_PORTS))
 };

+struct net_device *mv88e6xxx_dst_bridge_from_dsa(struct dsa_switch_tree *dst,
+						 u8 dev, u8 port)
+{
+	struct mv88e6xxx_dst *mvdst = dst->priv;
+	struct mv88e6xxx_br *mvbr;
+
+	list_for_each_entry(mvbr, &mvdst->bridges, list) {
+		if (mvbr->dev == dev && mvbr->port == port)
+			return mvbr->brdev;
+	}
+
+	return NULL;
+}
+
+int mv88e6xxx_dst_bridge_to_dsa(const struct dsa_switch_tree *dst,
+				const struct net_device *brdev,
+				u8 *dev, u8 *port)
+{
+	struct mv88e6xxx_dst *mvdst = dst->priv;
+	struct mv88e6xxx_br *mvbr;
+
+	list_for_each_entry(mvbr, &mvdst->bridges, list) {
+		if (mvbr->brdev == brdev) {
+			*dev = mvbr->dev;
+			*port = mvbr->port;
+			return 0;
+		}
+	}
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(mv88e6xxx_dst_bridge_to_dsa);
+
 int mv88e6xxx_dst_bridge_join(struct dsa_switch_tree *dst,
 			      struct net_device *brdev)
 {
diff --git a/drivers/net/dsa/mv88e6xxx/dst.h b/drivers/net/dsa/mv88e6xxx/dst.h
index 3845a19192ef..911890ec4792 100644
--- a/drivers/net/dsa/mv88e6xxx/dst.h
+++ b/drivers/net/dsa/mv88e6xxx/dst.h
@@ -3,6 +3,8 @@
 #ifndef _MV88E6XXX_DST_H
 #define _MV88E6XXX_DST_H

+struct net_device *mv88e6xxx_dst_bridge_from_dsa(struct dsa_switch_tree *dst,
+						 u8 dev, u8 port);
 int mv88e6xxx_dst_bridge_join(struct dsa_switch_tree *dst,
 			      struct net_device *brdev);
 void mv88e6xxx_dst_bridge_leave(struct dsa_switch_tree *dst,
diff --git a/include/linux/dsa/mv88e6xxx.h b/include/linux/dsa/mv88e6xxx.h
new file mode 100644
index 000000000000..fa486dfe9808
--- /dev/null
+++ b/include/linux/dsa/mv88e6xxx.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _NET_DSA_MV88E6XXX_H
+#define _NET_DSA_MV88E6XXX_H
+
+#include <linux/netdevice.h>
+#include <net/dsa.h>
+
+int mv88e6xxx_dst_bridge_to_dsa(const struct dsa_switch_tree *dst,
+				const struct net_device *brdev,
+				u8 *dev, u8 *port);
+
+#endif /* _NET_DSA_MV88E6XXX_H */
-- 
2.25.1
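To make the new lookup order in mv88e6xxx_port_vlan() easier to follow, here is a rough userspace model of it. This is a sketch under assumptions, not the driver: the struct layouts, the integer bridge ids and the function name egress_mask() are all invented for illustration, and ports are limited to 16 per chip. The source (dev, port) is first matched against physical ports, then against the virtual bridge ports, and is rejected only if both lookups fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum port_type { PORT_USER, PORT_CPU, PORT_DSA };

struct phys_port {
	uint8_t dev;		/* switch (device) index */
	uint8_t index;		/* port number on that switch */
	enum port_type type;
	int bridge;		/* bridge id, or -1 when standalone */
};

struct virt_port {
	uint8_t dev, index;	/* tuple not used by any physical port */
	int bridge;		/* id (>= 0) of the bridge it represents */
};

/* Mask of ports on local_dev that a frame from (src_dev, src_port) may reach. */
static uint16_t egress_mask(const struct phys_port *ports, size_t n_ports,
			    const struct virt_port *vports, size_t n_vports,
			    uint8_t local_dev, uint8_t src_dev, uint8_t src_port)
{
	const struct phys_port *src = NULL;
	uint16_t all = 0, mask = 0;
	int bridge = -1;
	size_t i;

	for (i = 0; i < n_ports; i++) {
		assert(ports[i].index < 16);
		if (ports[i].dev == local_dev)
			all |= (uint16_t)(1u << ports[i].index);
		if (ports[i].dev == src_dev && ports[i].index == src_port)
			src = &ports[i];
	}

	if (src) {
		/* CPU ports and DSA links may reach any local port. */
		if (src->type != PORT_USER)
			return all;
		bridge = src->bridge;
	} else {
		/* Not a physical port: try the virtual bridge ports. */
		for (i = 0; i < n_vports; i++) {
			if (vports[i].dev == src_dev &&
			    vports[i].index == src_port) {
				bridge = vports[i].bridge;
				break;
			}
		}
		if (bridge < 0)
			return 0;	/* unknown source, reject */
	}

	/* Fabric ports are always allowed, user ports only within the bridge. */
	for (i = 0; i < n_ports; i++) {
		if (ports[i].dev != local_dev)
			continue;
		if (ports[i].type != PORT_USER ||
		    (bridge >= 0 && ports[i].bridge == bridge))
			mask |= (uint16_t)(1u << ports[i].index);
	}
	return mask;
}
```

In this model, a frame tagged with a bridge's virtual tuple ends up with the same egress set as a frame from any user port in that bridge, which is the property the PVT mapping is after.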
Tobias Waldekranz
2021-Apr-26 17:04 UTC
[Bridge] [RFC net-next 9/9] net: dsa: mv88e6xxx: Forward offloading
Allow the DSA tagger to generate FORWARD frames for offloaded skbs
sent from a bridge that we offload, allowing the switch to handle any
frame replication that may be required. This also means that source
address learning takes place on packets sent from the CPU, meaning
that return traffic no longer needs to be flooded as unknown unicast.

Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 17 ++++++++++++++++
 net/dsa/tag_dsa.c                | 33 ++++++++++++++++++++++++--------
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 6975cf16da65..00ed1aa2a55a 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -6077,6 +6077,21 @@ static int mv88e6xxx_crosschip_lag_leave(struct dsa_switch *ds, int sw_index,
 	return err_sync ? : err_pvt;
 }

+static void *mv88e6xxx_dfwd_add_station(struct dsa_switch *ds, int port,
+					struct net_device *sb_dev)
+{
+	struct dsa_port *dp = dsa_to_port(ds, port);
+	struct mv88e6xxx_chip *chip = ds->priv;
+
+	if (!mv88e6xxx_has_pvt(chip))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	if (sb_dev == dp->bridge_dev)
+		return sb_dev;
+
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.get_tag_protocol	= mv88e6xxx_get_tag_protocol,
 	.change_tag_protocol	= mv88e6xxx_change_tag_protocol,
@@ -6138,6 +6153,7 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.crosschip_lag_change	= mv88e6xxx_crosschip_lag_change,
 	.crosschip_lag_join	= mv88e6xxx_crosschip_lag_join,
 	.crosschip_lag_leave	= mv88e6xxx_crosschip_lag_leave,
+	.dfwd_add_station	= mv88e6xxx_dfwd_add_station,
 };

 static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
@@ -6156,6 +6172,7 @@ static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
 	ds->ops = &mv88e6xxx_switch_ops;
 	ds->ageing_time_min = chip->info->age_time_coeff;
 	ds->ageing_time_max = chip->info->age_time_coeff * U8_MAX;
+	ds->num_tx_queues = 2;

 	/* Some chips support up to 32, but that requires enabling the
 	 * 5-bit port mode, which we do not support. 640k^W16 ought to
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 7e7b7decdf39..09cdf77697b2 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -46,6 +46,7 @@
  */

 #include <linux/etherdevice.h>
+#include <linux/dsa/mv88e6xxx.h>
 #include <linux/list.h>
 #include <linux/slab.h>

@@ -126,7 +127,22 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 				   u8 extra)
 {
 	struct dsa_port *dp = dsa_slave_to_port(dev);
+	u16 pvid = dp->pvid;
+	enum dsa_cmd cmd;
 	u8 *dsa_header;
+	u8 tag_dev, tag_port;
+
+	if (DSA_SKB_CB(skb)->sb_dev) {
+		cmd = DSA_CMD_FORWARD;
+		if (mv88e6xxx_dst_bridge_to_dsa(dp->ds->dst,
+						DSA_SKB_CB(skb)->sb_dev,
+						&tag_dev, &tag_port))
+			return NULL;
+	} else {
+		cmd = DSA_CMD_FROM_CPU;
+		tag_dev = dp->ds->index;
+		tag_port = dp->index;
+	}

 	if (skb->protocol == htons(ETH_P_8021Q)) {
 		if (extra) {
@@ -134,10 +150,10 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 			memmove(skb->data, skb->data + extra, 2 * ETH_ALEN);
 		}

-		/* Construct tagged FROM_CPU DSA tag from 802.1Q tag. */
+		/* Construct tagged DSA tag from 802.1Q tag. */
 		dsa_header = skb->data + 2 * ETH_ALEN + extra;
-		dsa_header[0] = (DSA_CMD_FROM_CPU << 6) | 0x20 | dp->ds->index;
-		dsa_header[1] = dp->index << 3;
+		dsa_header[0] = (cmd << 6) | 0x20 | tag_dev;
+		dsa_header[1] = tag_port << 3;

 		/* Move CFI field from byte 2 to byte 1. */
 		if (dsa_header[2] & 0x10) {
@@ -148,12 +164,13 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 		skb_push(skb, DSA_HLEN + extra);
 		memmove(skb->data, skb->data + DSA_HLEN + extra, 2 * ETH_ALEN);

-		/* Construct untagged FROM_CPU DSA tag. */
+		/* Construct untagged DSA tag. */
 		dsa_header = skb->data + 2 * ETH_ALEN + extra;
-		dsa_header[0] = (DSA_CMD_FROM_CPU << 6) | dp->ds->index;
-		dsa_header[1] = dp->index << 3;
-		dsa_header[2] = 0x00;
-		dsa_header[3] = 0x00;
+
+		dsa_header[0] = (cmd << 6) | tag_dev;
+		dsa_header[1] = tag_port << 3;
+		dsa_header[2] = pvid >> 8;
+		dsa_header[3] = pvid & 0xff;
 	}

 	return skb;
-- 
2.25.1
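The byte layout used for the untagged case in the tagger change above can be checked in isolation. The following is a minimal userspace sketch, not the tagger itself: the helper name dsa_build_untagged() is made up for illustration, and the numeric command codes are restated here as an assumption about how tag_dsa.c defines enum dsa_cmd.

```c
#include <assert.h>
#include <stdint.h>

/* Command codes in the top two bits of byte 0 (assumed to match tag_dsa.c). */
enum dsa_cmd {
	DSA_CMD_TO_CPU		= 0,
	DSA_CMD_FROM_CPU	= 1,
	DSA_CMD_TO_SNIFFER	= 2,
	DSA_CMD_FORWARD		= 3,
};

/*
 * Fill a 4-byte untagged DSA header:
 *   byte 0: command in bits 7:6, device in bits 4:0
 *   byte 1: port in bits 7:3
 *   bytes 2-3: VID, most significant byte first
 */
static void dsa_build_untagged(uint8_t hdr[4], enum dsa_cmd cmd,
			       uint8_t dev, uint8_t port, uint16_t vid)
{
	assert(dev < 32 && port < 32 && vid < 4096);

	hdr[0] = (uint8_t)((cmd << 6) | dev);
	hdr[1] = (uint8_t)(port << 3);
	hdr[2] = (uint8_t)(vid >> 8);
	hdr[3] = (uint8_t)(vid & 0xff);
}
```

A FORWARD from a bridge's virtual tuple (2, 0) in VLAN 100 would give the bytes c2 00 00 64, while a FROM_CPU aimed at port 3 of device 0 with no VID gives 40 18 00 00. The only differences between the two modes are the command bits and whose device/port pair goes into the tag: the destination for FROM_CPU, the (virtual) source for FORWARD.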
Vladimir Oltean
2021-Apr-26 19:40 UTC
[Bridge] [RFC net-next 5/9] net: dsa: Track port PVIDs
Hi Tobias,

On Mon, Apr 26, 2021 at 07:04:07PM +0200, Tobias Waldekranz wrote:
> In some scenarios a tagger must know which VLAN to assign to a packet,
> even if the packet is set to egress untagged. Since the VLAN
> information in the skb will be removed by the bridge in this case,
> track each port's PVID such that the VID of an outgoing frame can
> always be determined.
>
> Signed-off-by: Tobias Waldekranz <tobias at waldekranz.com>
> ---

Let me give you this real-life example:

#!/bin/bash

ip link add br0 type bridge vlan_filtering 1
for eth in eth0 eth1 swp2 swp3 swp4 swp5; do
	ip link set $eth up
	ip link set $eth master br0
done
ip link set br0 up

bridge vlan add dev eth0 vid 100 pvid untagged
bridge vlan del dev swp2 vid 1
bridge vlan del dev swp3 vid 1
bridge vlan add dev swp2 vid 100
bridge vlan add dev swp3 vid 100 untagged

reproducible on the NXP LS1021A-TSN board.
The bridge receives an untagged packet on eth0 and floods it.
It should reach swp2 and swp3, and be tagged on swp2, and untagged on
swp3 respectively.

With your idea of sending untagged frames towards the port's pvid,
wouldn't we be leaking this packet to VLAN 1, therefore towards ports
swp4 and swp5, and the real destination ports would not get this packet?
Ido Schimmel
2021-May-02 14:58 UTC
[Bridge] [RFC net-next 0/9] net: bridge: Forward offloading
On Mon, Apr 26, 2021 at 07:04:02PM +0200, Tobias Waldekranz wrote:
> ## Overview
>
>    vlan1   vlan2
>        \   /
>    .-----------.
>    |    br0    |
>    '-----------'
>    /   /   \   \
> swp0 swp1 swp2 eth0
>   :    :    :
>   (hwdom 1)
>
> Up to this point, switchdevs have been trusted with offloading
> forwarding between bridge ports, e.g. forwarding a unicast from swp0
> to swp1 or flooding a broadcast from swp2 to swp1 and swp0. This
> series extends forward offloading to include some new classes of
> traffic:
>
> - Locally originating flows, i.e. packets that ingress on br0 that are
>   to be forwarded to one or several of the ports swp{0,1,2}. Notably
>   this also includes routed flows, e.g. a packet ingressing swp0 on
>   VLAN 1 which is then routed over to VLAN 2 by the CPU and then
>   forwarded to swp1 is "locally originating" from br0's point of view.
>
> - Flows originating from "foreign" interfaces, i.e. an interface that
>   is not offloaded by a particular switchdev instance. This includes
>   ports belonging to other switchdev instances. A typical example
>   would be flows from eth0 towards swp{0,1,2}.
>
> The bridge still looks up its FDB/MDB as usual and then notifies the
> switchdev driver that a particular skb should be offloaded if it
> matches one of the classes above. It does so by using the _accel
> version of dev_queue_xmit, supplying its own netdev as the
> "subordinate" device. The driver can react to the presence of the
> subordinate in its .ndo_select_queue in what ever way it needs to make
> sure to forward the skb in much the same way that it would for packets
> ingressing on regular ports.
>
> Hardware domains to which a particular skb has been forwarded are
> recorded so that duplicates are avoided.
>
> The main performance benefit is thus seen on multicast flows. Imagine
> for example that:
>
> - An IP camera is connected to swp0 (VLAN 1)
>
> - The CPU is acting as a multicast router, routing the group from VLAN
>   1 to VLAN 2.
>
> - There are subscribers for the group in question behind both swp1 and
>   swp2 (VLAN 2).

IIUC, this falls under the first use case ("Locally originating flows").
Do you have a need for this optimization in the forwarding case? Asking
because it might allow us to avoid unnecessary modifications to the
forwarding path. I have yet to look at the code, so maybe it's not a big
deal.

>
> With this offloading in place, the bridge need only send a single skb
> to the driver, which will send it to the hardware marked in such a way
> that the switch will perform the multicast replication according to
> the MDB configuration. Naturally, the number of saved skb_clones
> increase linearly with the number of subscribed ports.

Yes, this is clear. FWIW, Spectrum has something similar. You can send
packets as either "data" or "control". Data packets are injected via the
CPU port and forwarded according to the hardware database. Control
packets are sent as-is via the specified front panel port, bypassing the
hardware data path. mlxsw is always sending packets as "control".

>
> As an extra benefit, on mv88e6xxx, this also allows the switch to
> perform source address learning on these flows, which avoids having to
> sync dynamic FDB entries over slow configuration interfaces like MDIO
> to avoid flows directed towards the CPU being flooded as unknown
> unicast by the switch.

Since you are not syncing FDBs, it is possible that you are needlessly
flooding locally generated packets. This optimization avoids it.

>
>
> ## RFC
>
> - In general, what do you think about this idea?

Looks sane to me

>
> - hwdom. What do you think about this terminology? Personally I feel
>   that we had too many things called offload_fwd_mark, and that as the
>   use of the bridge internal ID (nbp->offload_fwd_mark) expands, it
>   might be useful to have a separate term for it.

Sounds OK

>
> - .dfwd_{add,del}_station. Am I stretching this abstraction too far,
>   and if so do you have any suggestion/preference on how to signal the
>   offloading from the bridge down to the switchdev driver?

I was not aware of this interface before the RFC, but your use case
seems to fit the kdoc: "Called by upper layer devices to accelerate
switching or other station functionality into hardware".

Do you expect this optimization to only work when physical netdevs are
enslaved to the bridge? What about LAG/VLANs?

>
> - The way that flooding is implemented in br_forward.c (lazily cloning
>   skbs) means that you have to mark the forwarding as completed very
>   early (right after should_deliver in maybe_deliver) in order to
>   avoid duplicates. Is there some way to move this decision point to a
>   later stage that I am missing?
>
> - BR_MULTICAST_TO_UNICAST. Right now, I expect that this series is not
>   compatible with unicast-to-multicast being used on a port. Then
>   again, I think that this would also be broken for regular switchdev
>   bridge offloading as this flag is not offloaded to the switchdev
>   port, so there is no way for the driver to refuse it. Any ideas on
>   how to handle this?
>
>
> ## mv88e6xxx Specifics
>
> Since we are now only receiving a single skb for both unicast and
> multicast flows, we can tag the packets with the FORWARD command
> instead of FROM_CPU. The swich(es) will then forward the packet in
> accordance with its ATU, VTU, STU, and PVT configuration - just like
> for packets ingressing on user ports.
>
> Crucially, FROM_CPU is still used for:
>
> - Ports in standalone mode.
>
> - Flows that are trapped to the CPU and software-forwarded by a
>   bridge. Note that these flows match neither of the classes discussed
>   in the overview.
>
> - Packets that are sent directly to a port netdev without going
>   through the bridge, e.g. lldpd sending out PDU via an AF_PACKET
>   socket.
>
> We thus have a pretty clean separation where the data plane uses
> FORWARDs and the control plane uses TO_/FROM_CPU.
>
> The barrier between different bridges is enforced by port based VLANs
> on mv88e6xxx, which in essence is a mapping from a source device/port
> pair to an allowed set of egress ports. In order to have a FORWARD
> frame (which carries a _source_ device/port) correctly mapped by the
> PVT, we must use a unique pair for each bridge.
>
> Fortunately, there is typically lots of unused address space in most
> switch trees. When was the last time you saw an mv88e6xxx product
> using more than 4 chips? Even if you found one with 16 (!) devices,
> you would still have room to allocate 16*16 virtual ports to software
> bridges.
>
> Therefore, the mv88e6xxx driver will allocate a virtual device/port
> pair to each bridge that it offloads. All members of the same bridge
> are then configured to allow packets from this virtual port in their
> PVTs.
>
> Tobias Waldekranz (9):
>   net: dfwd: Constrain existing users to macvlan subordinates
>   net: bridge: Disambiguate offload_fwd_mark
>   net: bridge: switchdev: Recycle unused hwdoms
>   net: bridge: switchdev: Forward offloading
>   net: dsa: Track port PVIDs
>   net: dsa: Forward offloading
>   net: dsa: mv88e6xxx: Allocate a virtual DSA port for each bridge
>   net: dsa: mv88e6xxx: Map virtual bridge port in PVT
>   net: dsa: mv88e6xxx: Forward offloading
>
>  MAINTAINERS                                   |   1 +
>  drivers/net/dsa/mv88e6xxx/Makefile            |   1 +
>  drivers/net/dsa/mv88e6xxx/chip.c              |  61 ++++++-
>  drivers/net/dsa/mv88e6xxx/dst.c               | 160 ++++++++++++++++++
>  drivers/net/dsa/mv88e6xxx/dst.h               |  14 ++
>  .../net/ethernet/intel/fm10k/fm10k_netdev.c   |   3 +
>  drivers/net/ethernet/intel/i40e/i40e_main.c   |   3 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   3 +
>  include/linux/dsa/mv88e6xxx.h                 |  13 ++
>  include/net/dsa.h                             |  13 ++
>  net/bridge/br_forward.c                       |  11 +-
>  net/bridge/br_if.c                            |   4 +-
>  net/bridge/br_private.h                       |  54 +++++-
>  net/bridge/br_switchdev.c                     | 141 +++++++++++----
>  net/dsa/port.c                                |  16 +-
>  net/dsa/slave.c                               |  36 +++-
>  net/dsa/tag_dsa.c                             |  33 +++-
>  17 files changed, 510 insertions(+), 57 deletions(-)
>  create mode 100644 drivers/net/dsa/mv88e6xxx/dst.c
>  create mode 100644 drivers/net/dsa/mv88e6xxx/dst.h
>  create mode 100644 include/linux/dsa/mv88e6xxx.h
>
> --
> 2.25.1
>
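The quoted overview says that hardware domains to which an skb has already been forwarded are recorded so that duplicates are avoided. A minimal userspace sketch of that bookkeeping is given below. It is an illustration under assumptions rather than the bridge code: the struct, the function name flood_tx_count() and the use of a 32-bit mask are invented here, with hwdom 0 taken to mean "not offloaded".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct br_port {
	const char *name;
	unsigned int hwdom;	/* hardware domain, 0 = not offloaded */
};

/*
 * Count how many skbs the bridge would hand to drivers when flooding
 * to every port. With offloading, one skb per hardware domain is
 * enough, since the switch replicates it to its own member ports.
 */
static size_t flood_tx_count(const struct br_port *ports, size_t n,
			     int offload)
{
	uint32_t fwd_hwdoms = 0;	/* domains already served */
	size_t tx = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int hwdom = ports[i].hwdom;

		assert(hwdom < 32);
		if (offload && hwdom) {
			if (fwd_hwdoms & (UINT32_C(1) << hwdom))
				continue;	/* hardware delivers this one */
			fwd_hwdoms |= UINT32_C(1) << hwdom;
		}
		tx++;
	}
	return tx;
}
```

For the topology in the overview (swp0, swp1 and swp2 in hwdom 1, eth0 outside any hardware domain), a flood from br0 needs four transmissions without offloading and two with it: one for the switch and one for eth0.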
Ido Schimmel
2021-May-02 15:04 UTC
[Bridge] [RFC net-next 4/9] net: bridge: switchdev: Forward offloading
On Mon, Apr 26, 2021 at 07:04:06PM +0200, Tobias Waldekranz wrote:
> +static void nbp_switchdev_fwd_offload_add(struct net_bridge_port *p)
> +{
> +	void *priv;
> +
> +	if (!(p->dev->features & NETIF_F_HW_L2FW_DOFFLOAD))
> +		return;
> +
> +	priv = p->dev->netdev_ops->ndo_dfwd_add_station(p->dev, p->br->dev);

Some changes to team/bond/8021q will be needed in order to get this
optimization to work when they are enslaved to the bridge instead of the
front panel port itself?

> +	if (!IS_ERR_OR_NULL(priv))
> +		p->accel_priv = priv;
> +}
> +
> +static void nbp_switchdev_fwd_offload_del(struct net_bridge_port *p)
> +{
> +	if (!p->accel_priv)
> +		return;
> +
> +	p->dev->netdev_ops->ndo_dfwd_del_station(p->dev, p->accel_priv);
> +	p->accel_priv = NULL;
> +}
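The quoted helper relies on the kernel convention of returning either a valid handle or an errno encoded in the pointer value. For readers unfamiliar with it, here is a small userspace imitation. Everything in it is an assumption made for illustration: err_ptr(), ptr_err() and is_err_or_null() only mimic the kernel's ERR_PTR() family, and struct port with its add_station() callback is a stand-in for the netdev and its ndo.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer, as the kernel's ERR_PTR() does. */
static void *err_ptr(intptr_t error)
{
	return (void *)error;
}

static intptr_t ptr_err(const void *ptr)
{
	return (intptr_t)ptr;
}

/* The top MAX_ERRNO addresses are reserved for encoded errors. */
static int is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct port {
	int offload_capable;	/* stand-in for NETIF_F_HW_L2FW_DOFFLOAD */
	void *accel_priv;	/* handle returned by the driver, if any */
};

/* Hypothetical driver callback: hand back the bridge as the handle. */
static void *add_station(struct port *p, void *bridge)
{
	if (!p->offload_capable)
		return err_ptr(-EOPNOTSUPP);
	return bridge;
}

/* Keep the handle only if the driver accepted the offload request. */
static void fwd_offload_add(struct port *p, void *bridge)
{
	void *priv;

	assert(p != NULL);
	priv = add_station(p, bridge);
	if (!is_err_or_null(priv))
		p->accel_priv = priv;
}

static void fwd_offload_del(struct port *p)
{
	if (!p->accel_priv)
		return;
	p->accel_priv = NULL;
}
```

A port whose driver declines simply keeps accel_priv as NULL and falls back to regular software forwarding, so the caller never has to tell the different failure reasons apart.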