Yoshiki Komachi
2020-Jul-31 04:44 UTC
[Bridge] [RFC PATCH bpf-next 0/3] Add a new bpf helper for FDB lookup
This series adds a new bpf helper for doing FDB lookups in the kernel tables from XDP programs. This helps users accelerate the Linux bridge with XDP.

In the past, XDP generally required users to reimplement networking functionality themselves in BPF, hindering its potential uses. IMO, bpf helpers that access the kernel networking stack help to mitigate the programming cost because users can reuse mature Linux networking features more easily.

The previous commit 87f5fc7e48dd ("bpf: Provide helper to do forwarding lookups in kernel FIB table") has already added a bpf helper to access the FIB in the kernel tables from XDP programs. As a next step, this series introduces the API for FDB lookup. In the future, other bpf helpers for learning and VLAN filtering will also be required in order to realize a fast XDP-based bridge, although these are not included in this series.

Patch 1 adds a new function to access the FDB in the kernel tables via the new bpf helper. Patch 2 adds the bpf helper, and patch 3 adds a sample program.

Yoshiki Komachi (3):
  net/bridge: Add new function to access FDB from XDP programs
  bpf: Add helper to do forwarding lookups in kernel FDB table
  samples/bpf: Add a simple bridge example accelerated with XDP

 include/linux/if_bridge.h      |  11 ++
 include/uapi/linux/bpf.h       |  28 ++++
 net/bridge/br_fdb.c            |  25 ++++
 net/core/filter.c              |  45 +++++++
 samples/bpf/Makefile           |   3 +
 samples/bpf/xdp_bridge_kern.c  | 129 ++++++++++++++++++
 samples/bpf/xdp_bridge_user.c  | 239 +++++++++++++++++++++++++++++++++
 scripts/bpf_helpers_doc.py     |   1 +
 tools/include/uapi/linux/bpf.h |  28 ++++
 9 files changed, 509 insertions(+)
 create mode 100644 samples/bpf/xdp_bridge_kern.c
 create mode 100644 samples/bpf/xdp_bridge_user.c

--
2.20.1 (Apple Git-117)
Yoshiki Komachi
2020-Jul-31 04:44 UTC
[Bridge] [RFC PATCH bpf-next 1/3] net/bridge: Add new function to access FDB from XDP programs
This patch adds a function to find the destination port from the FDB in the kernel tables, which mainly helps XDP programs to access the FDB in the kernel via a bpf helper. Note that, unlike the existing br_fdb_find_port(), this function takes an ingress device as an argument.

br_fdb_find_port() also enables us to access the FDB in the kernel, and rcu_read_lock()/rcu_read_unlock() must be called in the function. But these are unnecessary in this case because XDP programs already call such APIs under rcu_read_lock()/rcu_read_unlock(). Thus, the proposed function can be used without taking these locks in the function.

Signed-off-by: Yoshiki Komachi <komachi.yoshiki at gmail.com>
---
 include/linux/if_bridge.h | 11 +++++++++++
 net/bridge/br_fdb.c       | 25 +++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 6479a38e52fa..24d72d115d0b 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -127,6 +127,9 @@ static inline int br_vlan_get_info(const struct net_device *dev, u16 vid, struct net_device *br_fdb_find_port(const struct net_device *br_dev, const unsigned char *addr, __u16 vid); +struct net_device *br_fdb_find_port_xdp(const struct net_device *dev, + const unsigned char *addr, + __u16 vid); void br_fdb_clear_offload(const struct net_device *dev, u16 vid); bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag); #else @@ -138,6 +141,14 @@ br_fdb_find_port(const struct net_device *br_dev, return NULL; } +static inline struct net_device * +br_fdb_find_port_xdp(const struct net_device *dev, + const unsigned char *addr, + __u16 vid) +{ + return NULL; +} + static inline void br_fdb_clear_offload(const struct net_device *dev, u16 vid) { } diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index 9db504baa094..79bc3c2da668 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -141,6 +141,31 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev, }
EXPORT_SYMBOL_GPL(br_fdb_find_port); +struct net_device *br_fdb_find_port_xdp(const struct net_device *dev, + const unsigned char *addr, + __u16 vid) +{ + struct net_bridge_fdb_entry *f; + struct net_device *dst = NULL; + struct net_bridge *br = NULL; + struct net_bridge_port *p; + + p = br_port_get_check_rcu(dev); + if (!p) + return NULL; + + br = p->br; + if (!br) + return NULL; + + f = br_fdb_find_rcu(br, addr, vid); + if (f && f->dst) + dst = f->dst->dev; + + return dst; +} +EXPORT_SYMBOL_GPL(br_fdb_find_port_xdp); + struct net_bridge_fdb_entry *br_fdb_find_rcu(struct net_bridge *br, const unsigned char *addr, __u16 vid) -- 2.20.1 (Apple Git-117)
Yoshiki Komachi
2020-Jul-31 04:44 UTC
[Bridge] [RFC PATCH bpf-next 2/3] bpf: Add helper to do forwarding lookups in kernel FDB table
This patch adds a new bpf helper to access FDB in the kernel tables from XDP programs. The helper enables us to find the destination port of master bridge in XDP layer with high speed. If an entry in the tables is successfully found, egress device index will be returned. In cases of failure, packets will be dropped or forwarded to upper networking stack in the kernel by XDP programs. Multicast and broadcast packets are currently not supported. Thus, these will need to be passed to upper layer on the basis of XDP_PASS action. The API uses destination MAC and VLAN ID as keys, so XDP programs need to extract these from forwarded packets. Signed-off-by: Yoshiki Komachi <komachi.yoshiki at gmail.com> --- include/uapi/linux/bpf.h | 28 +++++++++++++++++++++ net/core/filter.c | 45 ++++++++++++++++++++++++++++++++++ scripts/bpf_helpers_doc.py | 1 + tools/include/uapi/linux/bpf.h | 28 +++++++++++++++++++++ 4 files changed, 102 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 54d0c886e3ba..f2e729dd1721 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2149,6 +2149,22 @@ union bpf_attr { * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the * packet is not forwarded or needs assist from full stack * + * long bpf_fdb_lookup(void *ctx, struct bpf_fdb_lookup *params, int plen, u32 flags) + * Description + * Do FDB lookup in kernel tables using parameters in *params*. + * If lookup is successful (ie., FDB lookup finds a destination entry), + * ifindex is set to the egress device index from the FDB lookup. + * Both multicast and broadcast packets are currently unsupported + * in XDP layer. + * + * *plen* argument is the size of the passed **struct bpf_fdb_lookup**. + * *ctx* is only **struct xdp_md** for XDP programs. 
+ * + * Return + * * < 0 if any input argument is invalid + * * 0 on success (destination port is found) + * * > 0 on failure (there is no entry) + * * long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags) * Description * Add an entry to, or update a sockhash *map* referencing sockets. @@ -3449,6 +3465,7 @@ union bpf_attr { FN(get_stack), \ FN(skb_load_bytes_relative), \ FN(fib_lookup), \ + FN(fdb_lookup), \ FN(sock_hash_update), \ FN(msg_redirect_hash), \ FN(sk_redirect_hash), \ @@ -4328,6 +4345,17 @@ struct bpf_fib_lookup { __u8 dmac[6]; /* ETH_ALEN */ }; +enum { + BPF_FDB_LKUP_RET_SUCCESS, /* lookup successful */ + BPF_FDB_LKUP_RET_NOENT, /* entry is not found */ +}; + +struct bpf_fdb_lookup { + unsigned char addr[6]; /* ETH_ALEN */ + __u16 vlan_id; + __u32 ifindex; +}; + enum bpf_task_fd_type { BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */ BPF_FD_TYPE_TRACEPOINT, /* tp name */ diff --git a/net/core/filter.c b/net/core/filter.c index 654c346b7d91..68800d1b8cd5 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -45,6 +45,7 @@ #include <linux/filter.h> #include <linux/ratelimit.h> #include <linux/seccomp.h> +#include <linux/if_bridge.h> #include <linux/if_vlan.h> #include <linux/bpf.h> #include <linux/btf.h> @@ -5084,6 +5085,46 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = { .arg4_type = ARG_ANYTHING, }; +#if IS_ENABLED(CONFIG_BRIDGE) +BPF_CALL_4(bpf_xdp_fdb_lookup, struct xdp_buff *, ctx, + struct bpf_fdb_lookup *, params, int, plen, u32, flags) +{ + struct net_device *src, *dst; + struct net *net; + + if (plen < sizeof(*params)) + return -EINVAL; + + net = dev_net(ctx->rxq->dev); + + if (is_multicast_ether_addr(params->addr) || + is_broadcast_ether_addr(params->addr)) + return BPF_FDB_LKUP_RET_NOENT; + + src = dev_get_by_index_rcu(net, params->ifindex); + if (unlikely(!src)) + return -ENODEV; + + dst = br_fdb_find_port_xdp(src, params->addr, params->vlan_id); + if (dst) { + params->ifindex = 
dst->ifindex; + return BPF_FDB_LKUP_RET_SUCCESS; + } + + return BPF_FDB_LKUP_RET_NOENT; +} + +static const struct bpf_func_proto bpf_xdp_fdb_lookup_proto = { + .func = bpf_xdp_fdb_lookup, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_PTR_TO_MEM, + .arg3_type = ARG_CONST_SIZE, + .arg4_type = ARG_ANYTHING, +}; +#endif + #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF) static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len) { @@ -6477,6 +6518,10 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_xdp_adjust_tail_proto; case BPF_FUNC_fib_lookup: return &bpf_xdp_fib_lookup_proto; +#if IS_ENABLED(CONFIG_BRIDGE) + case BPF_FUNC_fdb_lookup: + return &bpf_xdp_fdb_lookup_proto; +#endif #ifdef CONFIG_INET case BPF_FUNC_sk_lookup_udp: return &bpf_xdp_sk_lookup_udp_proto; diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py index 5bfa448b4704..49ebd2273614 100755 --- a/scripts/bpf_helpers_doc.py +++ b/scripts/bpf_helpers_doc.py @@ -448,6 +448,7 @@ class PrinterHelpers(Printer): '__wsum', 'struct bpf_fib_lookup', + 'struct bpf_fdb_lookup', 'struct bpf_perf_event_data', 'struct bpf_perf_event_value', 'struct bpf_pidns_info', diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 54d0c886e3ba..f2e729dd1721 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2149,6 +2149,22 @@ union bpf_attr { * * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the * packet is not forwarded or needs assist from full stack * + * long bpf_fdb_lookup(void *ctx, struct bpf_fdb_lookup *params, int plen, u32 flags) + * Description + * Do FDB lookup in kernel tables using parameters in *params*. + * If lookup is successful (ie., FDB lookup finds a destination entry), + * ifindex is set to the egress device index from the FDB lookup. + * Both multicast and broadcast packets are currently unsupported + * in XDP layer. 
+ * + * *plen* argument is the size of the passed **struct bpf_fdb_lookup**. + * *ctx* is only **struct xdp_md** for XDP programs. + * + * Return + * * < 0 if any input argument is invalid + * * 0 on success (destination port is found) + * * > 0 on failure (there is no entry) + * * long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags) * Description * Add an entry to, or update a sockhash *map* referencing sockets. @@ -3449,6 +3465,7 @@ union bpf_attr { FN(get_stack), \ FN(skb_load_bytes_relative), \ FN(fib_lookup), \ + FN(fdb_lookup), \ FN(sock_hash_update), \ FN(msg_redirect_hash), \ FN(sk_redirect_hash), \ @@ -4328,6 +4345,17 @@ struct bpf_fib_lookup { __u8 dmac[6]; /* ETH_ALEN */ }; +enum { + BPF_FDB_LKUP_RET_SUCCESS, /* lookup successful */ + BPF_FDB_LKUP_RET_NOENT, /* entry is not found */ +}; + +struct bpf_fdb_lookup { + unsigned char addr[6]; /* ETH_ALEN */ + __u16 vlan_id; + __u32 ifindex; +}; + enum bpf_task_fd_type { BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */ BPF_FD_TYPE_TRACEPOINT, /* tp name */ -- 2.20.1 (Apple Git-117)
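Editorial sketch: the return-code contract described in the helper documentation above (negative on invalid input, 0 on a hit, positive on a miss, with the ifindex field used as input and output) can be modelled in plain userspace C. Everything named toy_* below is hypothetical and exists only for illustration; the table is a fixed array rather than the bridge's real FDB, and the ingress ifindex is not checked against any bridge, unlike the proposed kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return codes mirroring the enum proposed in the patch. */
enum { FDB_LKUP_RET_SUCCESS, FDB_LKUP_RET_NOENT };

/* Same field layout as the proposed struct bpf_fdb_lookup. */
struct fdb_lookup {
	unsigned char addr[6];	/* destination MAC (lookup key) */
	uint16_t vlan_id;	/* VLAN ID (lookup key) */
	uint32_t ifindex;	/* in: ingress port, out: egress port */
};

/* Hypothetical stand-in for one bridge FDB entry. */
struct toy_fdb_entry {
	unsigned char addr[6];
	uint16_t vlan_id;
	uint32_t port_ifindex;
};

static const struct toy_fdb_entry toy_fdb[] = {
	{ {0x02, 0x00, 0x00, 0x00, 0x00, 0x01}, 10, 3 },
	{ {0x02, 0x00, 0x00, 0x00, 0x00, 0x02}, 0, 4 },
};

/* Model of the lookup: < 0 invalid input, 0 hit, > 0 miss. */
static int toy_fdb_lookup(struct fdb_lookup *p, size_t plen)
{
	size_t i;

	if (plen < sizeof(*p))
		return -EINVAL;

	/* The group bit covers both multicast and broadcast addresses. */
	if (p->addr[0] & 0x01)
		return FDB_LKUP_RET_NOENT;

	for (i = 0; i < sizeof(toy_fdb) / sizeof(toy_fdb[0]); i++) {
		if (!memcmp(toy_fdb[i].addr, p->addr, 6) &&
		    toy_fdb[i].vlan_id == p->vlan_id) {
			p->ifindex = toy_fdb[i].port_ifindex;
			return FDB_LKUP_RET_SUCCESS;
		}
	}
	return FDB_LKUP_RET_NOENT;
}
```

On a miss the caller's ifindex is left untouched, so a program can still tell which port the frame arrived on before handing it to the regular stack.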
Yoshiki Komachi
2020-Jul-31 04:44 UTC
[Bridge] [RFC PATCH bpf-next 3/3] samples/bpf: Add a simple bridge example accelerated with XDP
This patch adds a simple example of XDP-based bridge with the new bpf_fdb_lookup helper. This program simply forwards packets based on the destination port given by FDB in the kernel. Note that both vlan filtering and learning features are currently unsupported in this example.

There is another plan to recreate a userspace application (xdp_bridge_user.c) as a daemon process, which helps to automate not only detection of status changes in bridge port but also handling vlan protocol updates.

Note: David Ahern suggested a new bpf helper [1] to get master vlan/bonding devices in XDP programs attached to their slaves when the master vlan/bonding devices are bridge ports. If this idea is accepted and the helper is introduced in the future, we can handle interfaces slaved to vlan/bonding devices in this sample by calling the suggested bpf helper (I guess it can get vlan/bonding ifindex from their slave ifindex). Notice that we don't need to change bpf_fdb_lookup() API to use such a feature, but we just need to modify bpf programs like this sample.

[1]: http://vger.kernel.org/lpc-networking2018.html#session-1 Signed-off-by: Yoshiki Komachi <komachi.yoshiki at gmail.com> --- samples/bpf/Makefile | 3 + samples/bpf/xdp_bridge_kern.c | 129 ++++++++++++++++++ samples/bpf/xdp_bridge_user.c | 239 ++++++++++++++++++++++++++++++++++ 3 files changed, 371 insertions(+) create mode 100644 samples/bpf/xdp_bridge_kern.c create mode 100644 samples/bpf/xdp_bridge_user.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index f87ee02073ba..d470368fe8de 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -53,6 +53,7 @@ tprogs-y += task_fd_query tprogs-y += xdp_sample_pkts tprogs-y += ibumad tprogs-y += hbm +tprogs-y += xdp_bridge # Libbpf dependencies LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a @@ -109,6 +110,7 @@ task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS) xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS) ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS) hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS) +xdp_bridge-objs := xdp_bridge_user.o # Tell kbuild to always build the programs always-y := $(tprogs-y) @@ -170,6 +172,7 @@ always-y += ibumad_kern.o always-y += hbm_out_kern.o always-y += hbm_edt_kern.o always-y += xdpsock_kern.o +always-y += xdp_bridge_kern.o ifeq ($(ARCH), arm) # Strip all except -D__LINUX_ARM_ARCH__ option needed to handle linux diff --git a/samples/bpf/xdp_bridge_kern.c b/samples/bpf/xdp_bridge_kern.c new file mode 100644 index 000000000000..00f802503199 --- /dev/null +++ b/samples/bpf/xdp_bridge_kern.c @@ -0,0 +1,129 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2020 NTT Corp. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + */ +#define KBUILD_MODNAME "foo" +#include <uapi/linux/bpf.h> +#include <linux/in.h> +#include <linux/if_ether.h> +#include <linux/if_packet.h> +#include <linux/if_vlan.h> +#include <linux/ip.h> +#include <linux/ipv6.h> + +#include <bpf/bpf_helpers.h> + +struct { + __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); + __uint(key_size, sizeof(int)); + __uint(value_size, sizeof(int)); + __uint(max_entries, 64); +} xdp_tx_ports SEC(".maps"); + +static __always_inline int xdp_bridge_proto(struct xdp_md *ctx, u16 br_vlan_proto) +{ + void *data_end = (void *)(long)ctx->data_end; + void *data = (void *)(long)ctx->data; + struct bpf_fdb_lookup fdb_lookup_params; + struct vlan_hdr *vlan_hdr = NULL; + struct ethhdr *eth = data; + u16 h_proto; + u64 nh_off; + int rc; + + nh_off = sizeof(*eth); + if (data + nh_off > data_end) + return XDP_DROP; + + __builtin_memset(&fdb_lookup_params, 0, sizeof(fdb_lookup_params)); + + h_proto = eth->h_proto; + + if (unlikely(ntohs(h_proto) < ETH_P_802_3_MIN)) + return XDP_PASS; + + /* Handle VLAN tagged packet */ + if (h_proto == br_vlan_proto) { + vlan_hdr = (void *)eth + nh_off; + nh_off += sizeof(*vlan_hdr); + if ((void *)eth + nh_off > data_end) + return XDP_PASS; + + fdb_lookup_params.vlan_id = ntohs(vlan_hdr->h_vlan_TCI) & + VLAN_VID_MASK; + } + + /* FIXME: Although Linux bridge provides us with vlan filtering (contains + * PVID) at ingress, the feature is currently unsupported in this XDP program. + * + * Two ideas to realize the vlan filtering are below: + * 1. usespace daemon monitors bridge vlan events and notifies XDP programs + * of them through BPF maps + * 2. 
introduce another bpf helper to retrieve bridge vlan information + * + * + * FIXME: After the vlan filtering, learning feature is required here, but + * it is currently unsupported as well. If another bpf helper for learning + * is accepted, the processing could be implemented in the future. + */ + + memcpy(&fdb_lookup_params.addr, eth->h_dest, ETH_ALEN); + + /* Note: This program definitely takes ifindex of ingress interface as + * a bridge port. Linux networking devices can be stacked and physical + * interfaces are not necessarily slaves of bridges (e.g., bonding or + * vlan devices can be slaves of bridges), but stacked bridge ports are + * currently unsupported in this program. In such cases, XDP programs + * should be attached to a lower device in order to process packets with + * higher speed. Then, a new bpf helper to find upper devices will be + * required here in the future because they will be registered on FDB + * in the kernel. + */ + fdb_lookup_params.ifindex = ctx->ingress_ifindex; + + rc = bpf_fdb_lookup(ctx, &fdb_lookup_params, sizeof(fdb_lookup_params), 0); + if (rc != BPF_FDB_LKUP_RET_SUCCESS) { + /* In cases of flooding, XDP_PASS will be returned here */ + return XDP_PASS; + } + + /* FIXME: Although Linux bridge provides us with vlan filtering (contains + * untagged policy) at egress as well, the feature is currently unsupported + * in this XDP program. + * + * Two ideas to realize the vlan filtering are below: + * 1. usespace daemon monitors bridge vlan events and notifies XDP programs + * of them through BPF maps + * 2. 
introduce another bpf helper to retrieve bridge vlan information + */ + + return bpf_redirect_map(&xdp_tx_ports, fdb_lookup_params.ifindex, XDP_PASS); +} + +SEC("xdp_bridge") +int xdp_bridge_prog(struct xdp_md *ctx) +{ + return xdp_bridge_proto(ctx, 0); +} + +SEC("xdp_8021q_bridge") +int xdp_8021q_bridge_prog(struct xdp_md *ctx) +{ + return xdp_bridge_proto(ctx, htons(ETH_P_8021Q)); +} + +SEC("xdp_8021ad_bridge") +int xdp_8021ad_bridge_prog(struct xdp_md *ctx) +{ + return xdp_bridge_proto(ctx, htons(ETH_P_8021AD)); +} + +char _license[] SEC("license") = "GPL"; diff --git a/samples/bpf/xdp_bridge_user.c b/samples/bpf/xdp_bridge_user.c new file mode 100644 index 000000000000..6ed0a2ece6f4 --- /dev/null +++ b/samples/bpf/xdp_bridge_user.c @@ -0,0 +1,239 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2020 NTT Corp. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ */ + +#include <linux/bpf.h> +#include <linux/if_link.h> +#include <linux/limits.h> +#include <net/if.h> +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <stdbool.h> +#include <string.h> +#include <unistd.h> +#include <fcntl.h> +#include <libgen.h> + +#include <bpf/libbpf.h> +#include <bpf/bpf.h> + +#define STRERR_BUFSIZE 128 + +static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST; + +static int do_attach(int idx, int prog_fd, int map_fd, const char *name) +{ + int err; + + err = bpf_set_link_xdp_fd(idx, prog_fd, xdp_flags); + if (err < 0) { + printf("ERROR: failed to attach program to %s\n", name); + return err; + } + + /* Adding ifindex as a possible egress TX port */ + err = bpf_map_update_elem(map_fd, &idx, &idx, 0); + if (err) + printf("ERROR: failed using device %s as TX-port\n", name); + + return err; +} + +static int do_detach(int idx, const char *name) +{ + int err; + + err = bpf_set_link_xdp_fd(idx, -1, xdp_flags); + if (err < 0) + printf("ERROR: failed to detach program from %s\n", name); + + /* FIXME: Need to delete the corresponding entry in shared devmap + * with bpf_map_delete_elem((map_fd, &idx); + */ + return err; +} + +static int do_reuse_map(struct bpf_map *map, char *pin_path, bool *pinned) +{ + const char *path = "/sys/fs/bpf/xdp_bridge"; + char errmsg[STRERR_BUFSIZE]; + int err, len, pin_fd; + + len = snprintf(pin_path, PATH_MAX, "%s/%s", path, bpf_map__name(map)); + if (len < 0) + return -EINVAL; + else if (len >= PATH_MAX) + return -ENAMETOOLONG; + + pin_fd = bpf_obj_get(pin_path); + if (pin_fd < 0) { + err = -errno; + if (err == -ENOENT) { + *pinned = false; + return 0; + } + + libbpf_strerror(-err, errmsg, sizeof(errmsg)); + printf("couldn't retrieve pinned map: %s\n", errmsg); + return err; + } + + err = bpf_map__reuse_fd(map, pin_fd); + if (err) { + printf("failed to reuse map: %s\n", strerror(errno)); + close(pin_fd); + } + + return err; +} + +static void usage(const char *prog) +{ + fprintf(stderr, + "usage: %s 
[OPTS] interface-list\n" + "\nOPTS:\n" + " -Q enable vlan filtering (802.1Q)\n" + " -A enable vlan filtering (802.1ad)\n" + " -d detach program\n", + prog); +} + +int main(int argc, char **argv) +{ + struct bpf_object_open_attr attr = { + .prog_type = BPF_PROG_TYPE_XDP, + }; + char filename[PATH_MAX], pin_path[PATH_MAX]; + const char *prog_name = "xdp_bridge"; + int prog_fd = -1, map_fd = -1; + struct bpf_program *prog; + struct bpf_object *obj; + int opt, i, idx, err; + struct bpf_map *map; + bool pinned = true; + int attach = 1; + int ret = 0; + + while ((opt = getopt(argc, argv, ":dQASF")) != -1) { + switch (opt) { + case 'd': + attach = 0; + break; + case 'S': + xdp_flags |= XDP_FLAGS_SKB_MODE; + break; + case 'F': + xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST; + break; + case 'Q': + prog_name = "xdp_8021q_bridge"; + break; + case 'A': + prog_name = "xdp_8021ad_bridge"; + break; + default: + usage(basename(argv[0])); + return 1; + } + } + + if (!(xdp_flags & XDP_FLAGS_SKB_MODE)) + xdp_flags |= XDP_FLAGS_DRV_MODE; + + if (optind == argc) { + usage(basename(argv[0])); + return 1; + } + + if (attach) { + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); + attr.file = filename; + + if (access(filename, O_RDONLY) < 0) { + printf("error accessing file %s: %s\n", + filename, strerror(errno)); + return 1; + } + + obj = bpf_object__open_xattr(&attr); + if (libbpf_get_error(obj)) { + printf("cannot open xdp program: %s\n", strerror(errno)); + return 1; + } + + map = bpf_object__find_map_by_name(obj, "xdp_tx_ports"); + if (libbpf_get_error(map)) { + printf("map not found: %s\n", strerror(errno)); + goto err; + } + + err = do_reuse_map(map, pin_path, &pinned); + if (err) { + printf("error reusing map %s: %s\n", + bpf_map__name(map), strerror(errno)); + goto err; + } + + err = bpf_object__load(obj); + if (err) { + printf("cannot load xdp program: %s\n", strerror(errno)); + goto err; + } + + prog = bpf_object__find_program_by_title(obj, prog_name); + prog_fd = 
bpf_program__fd(prog); + if (prog_fd < 0) { + printf("program not found: %s\n", strerror(prog_fd)); + goto err; + } + + map_fd = bpf_map__fd(map); + if (map_fd < 0) { + printf("map not found: %s\n", strerror(map_fd)); + goto err; + } + + if (!pinned) { + err = bpf_map__pin(map, pin_path); + if (err) { + printf("failed to pin map: %s\n", strerror(errno)); + goto err; + } + } + } + + for (i = optind; i < argc; ++i) { + idx = if_nametoindex(argv[i]); + if (!idx) + idx = strtoul(argv[i], NULL, 0); + + if (!idx) { + fprintf(stderr, "Invalid arg\n"); + return 1; + } + if (attach) { + err = do_attach(idx, prog_fd, map_fd, argv[i]); + if (err) + ret = err; + } else { + err = do_detach(idx, argv[i]); + if (err) + ret = err; + } + } + + return ret; +err: + bpf_object__close(obj); + return 1; +} -- 2.20.1 (Apple Git-117)
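Editorial sketch: the sample's VLAN handling reads the EtherType at byte offset 12, compares it with the configured tag protocol, and masks the following TCI field down to the 12-bit VLAN ID. A userspace model of just that parsing step could look like the following. The names frame_vlan_id and TOY_* are hypothetical, and unlike the sample (which returns XDP_DROP or XDP_PASS on short frames) this model reports any truncated frame as -1.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN		14	/* dst MAC + src MAC + EtherType */
#define TOY_VLAN_HLEN		4	/* TCI + encapsulated EtherType */
#define TOY_VLAN_VID_MASK	0x0fff	/* low 12 bits of the TCI */
#define TOY_ETH_P_8021Q		0x8100
#define TOY_ETH_P_8021AD	0x88a8

/* Read a big-endian (network order) 16-bit value. */
static uint16_t read_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the VLAN ID of a frame tagged with vlan_proto,
 * 0 if the frame does not carry that tag, or -1 if it is truncated.
 */
static int frame_vlan_id(const unsigned char *data, size_t len,
			 uint16_t vlan_proto)
{
	if (len < TOY_ETH_HLEN)
		return -1;

	if (read_be16(data + 12) != vlan_proto)
		return 0;

	/* Bounds check before touching the tag, as the sample does. */
	if (len < TOY_ETH_HLEN + TOY_VLAN_HLEN)
		return -1;

	return read_be16(data + TOY_ETH_HLEN) & TOY_VLAN_VID_MASK;
}
```

A TCI of 0x600a, for example, carries priority bits in the top nibble and yields VLAN ID 10 after masking.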
Jesper Dangaard Brouer
2020-Jul-31 14:15 UTC
[Bridge] [RFC PATCH bpf-next 3/3] samples/bpf: Add a simple bridge example accelerated with XDP
I really appreciate that you are working on adding this helper. Some comments below.

On Fri, 31 Jul 2020 13:44:20 +0900 Yoshiki Komachi <komachi.yoshiki at gmail.com> wrote:
> diff --git a/samples/bpf/xdp_bridge_kern.c b/samples/bpf/xdp_bridge_kern.c > new file mode 100644 > index 000000000000..00f802503199 > --- /dev/null > +++ b/samples/bpf/xdp_bridge_kern.c > @@ -0,0 +1,129 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* Copyright (c) 2020 NTT Corp. All Rights Reserved. > + *
[...]
> + > +struct { > + __uint(type, BPF_MAP_TYPE_DEVMAP_HASH); > + __uint(key_size, sizeof(int)); > + __uint(value_size, sizeof(int)); > + __uint(max_entries, 64); > +} xdp_tx_ports SEC(".maps"); > + > +static __always_inline int xdp_bridge_proto(struct xdp_md *ctx, u16 br_vlan_proto) > +{ > + void *data_end = (void *)(long)ctx->data_end; > + void *data = (void *)(long)ctx->data; > + struct bpf_fdb_lookup fdb_lookup_params; > + struct vlan_hdr *vlan_hdr = NULL; > + struct ethhdr *eth = data; > + u16 h_proto; > + u64 nh_off; > + int rc; > + > + nh_off = sizeof(*eth); > + if (data + nh_off > data_end) > + return XDP_DROP; > + > + __builtin_memset(&fdb_lookup_params, 0, sizeof(fdb_lookup_params)); > + > + h_proto = eth->h_proto; > + > + if (unlikely(ntohs(h_proto) < ETH_P_802_3_MIN)) > + return XDP_PASS; > + > + /* Handle VLAN tagged packet */ > + if (h_proto == br_vlan_proto) { > + vlan_hdr = (void *)eth + nh_off; > + nh_off += sizeof(*vlan_hdr); > + if ((void *)eth + nh_off > data_end) > + return XDP_PASS; > + > + fdb_lookup_params.vlan_id = ntohs(vlan_hdr->h_vlan_TCI) & > + VLAN_VID_MASK; > + } > + > + /* FIXME: Although Linux bridge provides us with vlan filtering (contains > + * PVID) at ingress, the feature is currently unsupported in this XDP program. > + * > + * Two ideas to realize the vlan filtering are below: > + * 1. usespace daemon monitors bridge vlan events and notifies XDP programs

Typo: usespace -> userspace

> + * of them through BPF maps > + * 2. introduce another bpf helper to retrieve bridge vlan information

The comment appears two times in this file.

> + * > + * > + * FIXME: After the vlan filtering, learning feature is required here, but > + * it is currently unsupported as well. If another bpf helper for learning > + * is accepted, the processing could be implemented in the future. > + */ > + > + memcpy(&fdb_lookup_params.addr, eth->h_dest, ETH_ALEN); > + > + /* Note: This program definitely takes ifindex of ingress interface as > + * a bridge port. Linux networking devices can be stacked and physical > + * interfaces are not necessarily slaves of bridges (e.g., bonding or > + * vlan devices can be slaves of bridges), but stacked bridge ports are > + * currently unsupported in this program. In such cases, XDP programs > + * should be attached to a lower device in order to process packets with > + * higher speed. Then, a new bpf helper to find upper devices will be > + * required here in the future because they will be registered on FDB > + * in the kernel. > + */ > + fdb_lookup_params.ifindex = ctx->ingress_ifindex; > + > + rc = bpf_fdb_lookup(ctx, &fdb_lookup_params, sizeof(fdb_lookup_params), 0); > + if (rc != BPF_FDB_LKUP_RET_SUCCESS) { > + /* In cases of flooding, XDP_PASS will be returned here */ > + return XDP_PASS; > + } > + > + /* FIXME: Although Linux bridge provides us with vlan filtering (contains > + * untagged policy) at egress as well, the feature is currently unsupported > + * in this XDP program. > + * > + * Two ideas to realize the vlan filtering are below: > + * 1. usespace daemon monitors bridge vlan events and notifies XDP programs > + * of them through BPF maps > + * 2. introduce another bpf helper to retrieve bridge vlan information > + */

(2nd time the comment appears)

> +

A comment about the bpf_redirect_map() call below would be good, explaining that we depend on fallback behavior to let the normal bridge code handle other cases (e.g. flood/broadcast), and also that if the lookup fails, XDP_PASS/fallback also happens.

> + return bpf_redirect_map(&xdp_tx_ports, fdb_lookup_params.ifindex, XDP_PASS); > +} > + > +SEC("xdp_bridge") > +int xdp_bridge_prog(struct xdp_md *ctx) > +{ > + return xdp_bridge_proto(ctx, 0); > +} > + > +SEC("xdp_8021q_bridge") > +int xdp_8021q_bridge_prog(struct xdp_md *ctx) > +{ > + return xdp_bridge_proto(ctx, htons(ETH_P_8021Q)); > +} > + > +SEC("xdp_8021ad_bridge") > +int xdp_8021ad_bridge_prog(struct xdp_md *ctx) > +{ > + return xdp_bridge_proto(ctx, htons(ETH_P_8021AD)); > +} > + > +char _license[] SEC("license") = "GPL";

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
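Editorial sketch: the fallback behavior this review asks to have documented can be modelled in userspace. The model assumes, as the bpf_redirect_map() helper documentation describes, that the low bits of the flags argument become the return code when the map lookup fails. The devmap is replaced by a plain array of ifindexes, and all toy_* names are hypothetical; actual kernel behavior may differ between versions.

```c
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as the kernel's enum xdp_action. */
enum toy_xdp_action {
	TOY_XDP_ABORTED,
	TOY_XDP_DROP,
	TOY_XDP_PASS,
	TOY_XDP_TX,
	TOY_XDP_REDIRECT,
};

/* Model of bpf_redirect_map(): redirect on a hit, else fall back to flags. */
static int toy_redirect_map(const uint32_t *ports, size_t n,
			    uint32_t key, uint64_t flags)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ports[i] == key)
			return TOY_XDP_REDIRECT;

	/* Egress port not registered for XDP TX: use the caller's fallback. */
	return (int)(flags & 3);
}

/* Model of the sample's final decision after the FDB lookup. */
static int toy_bridge_verdict(int lookup_rc, const uint32_t *ports, size_t n,
			      uint32_t egress_ifindex)
{
	/* FDB miss, multicast or broadcast: let the kernel bridge flood. */
	if (lookup_rc != 0)
		return TOY_XDP_PASS;

	return toy_redirect_map(ports, n, egress_ifindex, TOY_XDP_PASS);
}
```

Both failure paths end in the pass verdict, so the regular bridge code still sees every frame the fast path cannot forward by itself.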
David Ahern
2020-Jul-31 17:15 UTC
[Bridge] [RFC PATCH bpf-next 2/3] bpf: Add helper to do forwarding lookups in kernel FDB table
On 7/30/20 10:44 PM, Yoshiki Komachi wrote:
> diff --git a/net/core/filter.c b/net/core/filter.c > index 654c346b7d91..68800d1b8cd5 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -5084,6 +5085,46 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = { > .arg4_type = ARG_ANYTHING, > }; > > +#if IS_ENABLED(CONFIG_BRIDGE) > +BPF_CALL_4(bpf_xdp_fdb_lookup, struct xdp_buff *, ctx, > + struct bpf_fdb_lookup *, params, int, plen, u32, flags) > +{ > + struct net_device *src, *dst; > + struct net *net; > + > + if (plen < sizeof(*params)) > + return -EINVAL;

I need to look at the details more closely, but on first reading 2 things caught my eye:

1. you need to make sure flags is 0 since there are no supported flags at the moment, and

> + > + net = dev_net(ctx->rxq->dev); > + > + if (is_multicast_ether_addr(params->addr) || > + is_broadcast_ether_addr(params->addr)) > + return BPF_FDB_LKUP_RET_NOENT; > + > + src = dev_get_by_index_rcu(net, params->ifindex); > + if (unlikely(!src)) > + return -ENODEV; > + > + dst = br_fdb_find_port_xdp(src, params->addr, params->vlan_id);

2. this needs to be done via netdev ops to avoid referencing bridge code which can be compiled as a module. I suspect the build robots will id this part soon.
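Editorial sketch: the two review points can be illustrated together in userspace C. The helper model below rejects any nonzero flags, and it reaches the lookup through a function pointer in a per-device ops table rather than calling bridge code directly, so nothing links against the bridge implementation. All toy_* names, the fdb_lookup callback and the choice of -EOPNOTSUPP for a missing callback are assumptions made for illustration; they are not the kernel's netdev ops API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct toy_dev;

/* Hypothetical per-device ops table; the callback may be absent. */
struct toy_dev_ops {
	struct toy_dev *(*fdb_lookup)(struct toy_dev *dev,
				      const unsigned char *addr, uint16_t vid);
};

struct toy_dev {
	uint32_t ifindex;
	const struct toy_dev_ops *ops;
};

/* Stand-in for what a loaded bridge module would provide. */
static struct toy_dev toy_egress = { 7, NULL };

static struct toy_dev *toy_bridge_fdb_lookup(struct toy_dev *dev,
					     const unsigned char *addr,
					     uint16_t vid)
{
	(void)dev;
	(void)vid;
	return addr[5] == 0x01 ? &toy_egress : NULL;
}

static const struct toy_dev_ops toy_bridge_ops = {
	.fdb_lookup = toy_bridge_fdb_lookup,
};

/* Model of the helper: < 0 on bad input, 0 on a hit, 1 on a miss. */
static int toy_helper(struct toy_dev *src, const unsigned char *addr,
		      uint16_t vid, uint32_t flags, uint32_t *out_ifindex)
{
	struct toy_dev *dst;

	/* No flags are defined yet, so reject all of them. */
	if (flags)
		return -EINVAL;

	/* Indirect call: works only if the device supplies the callback. */
	if (!src->ops || !src->ops->fdb_lookup)
		return -EOPNOTSUPP;

	dst = src->ops->fdb_lookup(src, addr, vid);
	if (!dst)
		return 1;

	*out_ifindex = dst->ifindex;
	return 0;
}
```

Rejecting unknown flags today is what lets a later version assign meaning to a bit without changing behavior for programs that already pass garbage in that argument.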
Daniel Borkmann
2020-Jul-31 21:12 UTC
[Bridge] [RFC PATCH bpf-next 2/3] bpf: Add helper to do forwarding lookups in kernel FDB table
On 7/31/20 6:44 AM, Yoshiki Komachi wrote:
> This patch adds a new bpf helper to access FDB in the kernel tables
> from XDP programs. The helper enables us to find the destination port
> of master bridge in XDP layer with high speed. If an entry in the
> tables is successfully found, egress device index will be returned.
>
> In cases of failure, packets will be dropped or forwarded to upper
> networking stack in the kernel by XDP programs. Multicast and broadcast
> packets are currently not supported. Thus, these will need to be
> passed to upper layer on the basis of XDP_PASS action.
>
> The API uses destination MAC and VLAN ID as keys, so XDP programs
> need to extract these from forwarded packets.
>
> Signed-off-by: Yoshiki Komachi <komachi.yoshiki at gmail.com>

Few initial comments below:

> ---
>  include/uapi/linux/bpf.h       | 28 +++++++++++++++++++++
>  net/core/filter.c              | 45 ++++++++++++++++++++++++++++++++++
>  scripts/bpf_helpers_doc.py     |  1 +
>  tools/include/uapi/linux/bpf.h | 28 +++++++++++++++++++++
>  4 files changed, 102 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 54d0c886e3ba..f2e729dd1721 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2149,6 +2149,22 @@ union bpf_attr {
>   *              * > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
>   *                packet is not forwarded or needs assist from full stack
>   *
> + * long bpf_fdb_lookup(void *ctx, struct bpf_fdb_lookup *params, int plen, u32 flags)
> + *      Description
> + *              Do FDB lookup in kernel tables using parameters in *params*.
> + *              If lookup is successful (ie., FDB lookup finds a destination entry),
> + *              ifindex is set to the egress device index from the FDB lookup.
> + *              Both multicast and broadcast packets are currently unsupported
> + *              in XDP layer.
> + *
> + *              *plen* argument is the size of the passed **struct bpf_fdb_lookup**.
> + *              *ctx* is only **struct xdp_md** for XDP programs.
> + *
> + *      Return
> + *              * < 0 if any input argument is invalid
> + *              * 0 on success (destination port is found)
> + *              * > 0 on failure (there is no entry)
> + *
>   * long bpf_sock_hash_update(struct bpf_sock_ops *skops, struct bpf_map *map, void *key, u64 flags)
>   *      Description
>   *              Add an entry to, or update a sockhash *map* referencing sockets.
> @@ -3449,6 +3465,7 @@ union bpf_attr {
>          FN(get_stack),                  \
>          FN(skb_load_bytes_relative),    \
>          FN(fib_lookup),                 \
> +        FN(fdb_lookup),                 \

This breaks UAPI. Needs to be added to the very end of the list.

>          FN(sock_hash_update),           \
>          FN(msg_redirect_hash),          \
>          FN(sk_redirect_hash),           \
> @@ -4328,6 +4345,17 @@ struct bpf_fib_lookup {
>          __u8    dmac[6];     /* ETH_ALEN */
>  };
>
> +enum {
> +        BPF_FDB_LKUP_RET_SUCCESS,      /* lookup successful */
> +        BPF_FDB_LKUP_RET_NOENT,        /* entry is not found */
> +};
> +
> +struct bpf_fdb_lookup {
> +        unsigned char addr[6];     /* ETH_ALEN */
> +        __u16 vlan_id;
> +        __u32 ifindex;
> +};
> +
>  enum bpf_task_fd_type {
>          BPF_FD_TYPE_RAW_TRACEPOINT,     /* tp name */
>          BPF_FD_TYPE_TRACEPOINT,         /* tp name */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 654c346b7d91..68800d1b8cd5 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -45,6 +45,7 @@
>  #include <linux/filter.h>
>  #include <linux/ratelimit.h>
>  #include <linux/seccomp.h>
> +#include <linux/if_bridge.h>
>  #include <linux/if_vlan.h>
>  #include <linux/bpf.h>
>  #include <linux/btf.h>
> @@ -5084,6 +5085,46 @@ static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {
>          .arg4_type      = ARG_ANYTHING,
>  };
>
> +#if IS_ENABLED(CONFIG_BRIDGE)
> +BPF_CALL_4(bpf_xdp_fdb_lookup, struct xdp_buff *, ctx,
> +           struct bpf_fdb_lookup *, params, int, plen, u32, flags)
> +{
> +        struct net_device *src, *dst;
> +        struct net *net;
> +
> +        if (plen < sizeof(*params))
> +                return -EINVAL;

Given flags are not used, this needs to reject anything invalid otherwise
you're not able to extend it in future.

> +        net = dev_net(ctx->rxq->dev);
> +
> +        if (is_multicast_ether_addr(params->addr) ||
> +            is_broadcast_ether_addr(params->addr))
> +                return BPF_FDB_LKUP_RET_NOENT;
> +
> +        src = dev_get_by_index_rcu(net, params->ifindex);
> +        if (unlikely(!src))
> +                return -ENODEV;
> +
> +        dst = br_fdb_find_port_xdp(src, params->addr, params->vlan_id);
> +        if (dst) {
> +                params->ifindex = dst->ifindex;
> +                return BPF_FDB_LKUP_RET_SUCCESS;
> +        }

Currently the helper description says nothing that this is /only/ limited to
bridges. I think it would be better to also name the helper bpf_br_fdb_lookup()
as well if so to avoid any confusion.

> +        return BPF_FDB_LKUP_RET_NOENT;
> +}
> +
> +static const struct bpf_func_proto bpf_xdp_fdb_lookup_proto = {
> +        .func           = bpf_xdp_fdb_lookup,
> +        .gpl_only       = true,
> +        .ret_type       = RET_INTEGER,
> +        .arg1_type      = ARG_PTR_TO_CTX,
> +        .arg2_type      = ARG_PTR_TO_MEM,
> +        .arg3_type      = ARG_CONST_SIZE,
> +        .arg4_type      = ARG_ANYTHING,
> +};
> +#endif

This should also have a tc pendant (similar as done in routing lookup helper)
in case native XDP is not available. This will be useful for those that have
the same code compilable for both tc/XDP.

>  #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
>  static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
>  {
> @@ -6477,6 +6518,10 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                  return &bpf_xdp_adjust_tail_proto;
>          case BPF_FUNC_fib_lookup:
>                  return &bpf_xdp_fib_lookup_proto;
> +#if IS_ENABLED(CONFIG_BRIDGE)
> +        case BPF_FUNC_fdb_lookup:
> +                return &bpf_xdp_fdb_lookup_proto;
> +#endif
>  #ifdef CONFIG_INET
>          case BPF_FUNC_sk_lookup_udp:
>                  return &bpf_xdp_sk_lookup_udp_proto;
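
For illustration only, the review comment about the unused flags argument could be addressed along the following lines. This is a userspace sketch, not kernel code and not the author's patch: fdb_lookup_check_args() is a hypothetical name, the struct copies the layout proposed above with plain C types standing in for __u16/__u32, and it assumes that no flag bits are defined yet, so any non-zero value is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct proposed in the patch; uint16_t/uint32_t
 * stand in for the kernel's __u16/__u32. */
struct bpf_fdb_lookup {
	unsigned char addr[6];	/* ETH_ALEN */
	uint16_t vlan_id;
	uint32_t ifindex;
};

/* Hypothetical name, not part of the patch: the argument checks the
 * helper might run before touching *params. */
int fdb_lookup_check_args(const struct bpf_fdb_lookup *params,
			  int plen, uint32_t flags)
{
	/* A NULL params is a caller bug in this sketch. */
	assert(params != NULL);

	/* Check the sign first: comparing a negative int against a
	 * size_t would convert it to a huge unsigned value. */
	if (plen < 0 || (size_t)plen < sizeof(*params))
		return -EINVAL;

	/* Assumption: no flags are defined yet, so reject every set bit.
	 * That keeps all bits free to be given a meaning later. */
	if (flags)
		return -EINVAL;

	return 0;
}
```

With this shape, a program that passes a non-zero flags value today gets -EINVAL instead of being silently accepted, so a later kernel can assign meaning to individual bits without changing behaviour for existing callers.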
John Fastabend
2020-Jul-31 21:52 UTC
[Bridge] [RFC PATCH bpf-next 0/3] Add a new bpf helper for FDB lookup
Yoshiki Komachi wrote:
> This series adds a new bpf helper for doing FDB lookup in the kernel
> tables from XDP programs. This helps users to accelerate Linux bridge
> with XDP.
>
> In the past, XDP generally required users to reimplement their own
> networking functionalities with specific manners of BPF programming
> by themselves, hindering its potential uses. IMO, bpf helpers to
> access networking stacks in kernel help to mitigate the programming
> costs because users reuse mature Linux networking feature more easily.
>
> The previous commit 87f5fc7e48dd ("bpf: Provide helper to do forwarding
> lookups in kernel FIB table") have already added a bpf helper for access
> FIB in the kernel tables from XDP programs. As a next step, this series
> introduces the API for FDB lookup. In the future, other bpf helpers for
> learning and VLAN filtering will also be required in order to realize
> fast XDP-based bridge although these are not included in this series.

Just to clarify for myself. I expect that with just the helpers here
we should only expect static configurations to work, e.g. any learning
and/or aging is not likely to work if we do redirects in the XDP path.

Then next to get a learning/filtering/aging we would need to have a
set of bridge helpers to get that functionality as well? I believe
I'm just repeating what you are saying above, but wanted to check.

Then my next question is can we see some performance numbers? These
things are always trade-off between performance and ease of
use, but would be good to know roughly what we are looking at vs
a native XDP bridge functionality.

Do you have a use case for a static bridge setup? Nothing wrong to
stage things IMO if the 'real' use case needs learning and filtering.

I guess to get STP working you would minimally need learning and
aging?

Thanks,
John
Yoshiki Komachi
2020-Aug-05 10:26 UTC
[Bridge] [RFC PATCH bpf-next 0/3] Add a new bpf helper for FDB lookup
Thanks for giving me a lot of comments! Please find my responses below.

> On 2020/08/01 6:52, John Fastabend <john.fastabend at gmail.com> wrote:
>
> Yoshiki Komachi wrote:
>> This series adds a new bpf helper for doing FDB lookup in the kernel
>> tables from XDP programs. This helps users to accelerate Linux bridge
>> with XDP.
>>
>> In the past, XDP generally required users to reimplement their own
>> networking functionalities with specific manners of BPF programming
>> by themselves, hindering its potential uses. IMO, bpf helpers to
>> access networking stacks in kernel help to mitigate the programming
>> costs because users reuse mature Linux networking feature more easily.
>>
>> The previous commit 87f5fc7e48dd ("bpf: Provide helper to do forwarding
>> lookups in kernel FIB table") have already added a bpf helper for access
>> FIB in the kernel tables from XDP programs. As a next step, this series
>> introduces the API for FDB lookup. In the future, other bpf helpers for
>> learning and VLAN filtering will also be required in order to realize
>> fast XDP-based bridge although these are not included in this series.
>
> Just to clarify for myself. I expect that with just the helpers here
> we should only expect static configurations to work, e.g. any learning
> and/or aging is not likely to work if we do redirects in the XDP path.

As you described above, learning and aging don't work at this point.

IMO, another helper for learning will be required to fulfill the
requirements. I guess that the helper will enable us to use the aging
feature as well, because aging is a function of the bridge fdb.

> Then next to get a learning/filtering/aging we would need to have a
> set of bridge helpers to get that functionality as well? I believe
> I'm just repeating what you are saying above, but wanted to check.

As for the vlan filtering, I think it doesn't necessarily have to be like
that. I have the following ideas to achieve it for now:

1. A userspace daemon monitors vlan events in bridges and notifies XDP
   programs of the events through BPF maps

2. Another bpf helper to retrieve bridge vlan information

The additional helper will be required only if the 2nd one is accepted.
I would like to discuss which is better because there are pros and cons.

On the other hand, the helper for the learning feature should be added,
IMO. But I guess that the learning helper alone is sufficient to get the
aging feature, because bridges with learning have the capability for aging
as well.

> Then my next question is can we see some performance numbers? These
> things are always trade-off between performance and ease of
> use, but would be good to know roughly what we are looking at vs
> a native XDP bridge functionality.

Sorry, I have not measured the performance numbers yet, so I will try it
later.

> Do you have a use case for a static bridge setup? Nothing wrong to
> stage things IMO if the 'real' use case needs learning and filtering.

For example, it is useful in libvirt with macTableManager. This feature
makes it possible for static bridges to process packets faster than other
ones with learning. However, it doesn't work properly if the vlan filtering
is not enabled.

> I guess to get STP working you would minimally need learning and
> aging?

I guess that STP is not related to learning and aging, but there may be
the following requirements if it is added in the future:

1. BPDU frames are transferred to normal bridges by the XDP_PASS action

2. Closing targeted ports based on the STP configurations

To meet the 2nd one, another bpf helper may be required. As another
approach, bpf maps may help to achieve this.

Thanks & Best regards,

> Thanks,
> John

--
Yoshiki Komachi
komachi.yoshiki at gmail.com
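
As a rough illustration of the per-packet decision discussed in this thread, here is a plain userspace C model of it. This is not BPF code and not the sample program from patch 3/3: the sketch_* names are hypothetical, SKETCH_PASS and SKETCH_REDIRECT merely stand in for the XDP_PASS and redirect outcomes, and lookup_ret is the value the proposed helper would have returned. It assumes that every failed lookup is handed to the regular bridge rather than dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XDP verdicts discussed in the thread. */
enum sketch_action { SKETCH_PASS, SKETCH_REDIRECT };

/* A destination MAC is a group (multicast or broadcast) address when
 * the least significant bit of its first octet is set. BPDU frames,
 * sent to 01:80:c2:00:00:00, fall into this class. */
bool sketch_is_group_addr(const unsigned char addr[6])
{
	return (addr[0] & 0x01) != 0;
}

/* lookup_ret follows the convention in the proposed helper text:
 * < 0 invalid argument, 0 entry found, > 0 no entry. */
enum sketch_action sketch_bridge_decide(const unsigned char dmac[6],
					long lookup_ret)
{
	assert(dmac != NULL);

	/* Group traffic, including BPDUs, is left to the normal bridge. */
	if (sketch_is_group_addr(dmac))
		return SKETCH_PASS;

	/* Known unicast destination: forward on the fast path. */
	if (lookup_ret == 0)
		return SKETCH_REDIRECT;

	/* Unknown destination or error: assumption, fall back to the
	 * kernel bridge instead of dropping. */
	return SKETCH_PASS;
}
```

Because the group-address test comes first, BPDU frames reach the kernel bridge without any extra rule, which matches the first STP requirement listed above. Closing ports based on STP state is not modelled here.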