Yuri Benditovich
2021-Jan-12 19:41 UTC
[RFC PATCH 0/7] Support for virtio-net hash reporting
Existing TUN module is able to use provided "steering eBPF" to calculate per-packet hash and derive the destination queue to place the packet to. The eBPF uses mapped configuration data containing a key for hash calculation and indirection table with array of queues' indices. This series of patches adds support for virtio-net hash reporting feature as defined in virtio specification. It extends the TUN module and the "steering eBPF" as follows: Extended steering eBPF calculates the hash value and hash type, keeps hash value in the skb->hash and returns index of destination virtqueue and the type of the hash. TUN module keeps returned hash type in (currently unused) field of the skb. skb->__unused renamed to 'hash_report_type'. When TUN module is called later to allocate and fill the virtio-net header and push it to destination virtqueue it populates the hash and the hash type into virtio-net header. VHOST driver is made aware of respective virtio-net feature that extends the virtio-net header to report the hash value and hash report type. Yuri Benditovich (7): skbuff: define field for hash report type vhost: support for hash report virtio-net feature tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy tun: add ioctl code TUNSETHASHPOPULATION tun: populate hash in virtio-net header when needed tun: report new tun feature IFF_HASH drivers/net/tun.c | 43 +++++++++++++++++++++++++++++++------ drivers/vhost/net.c | 37 ++++++++++++++++++++++++------- include/linux/skbuff.h | 7 +++++- include/uapi/linux/if_tun.h | 2 ++ 4 files changed, 74 insertions(+), 15 deletions(-) -- 2.17.1
Yuri Benditovich
2021-Jan-12 19:41 UTC
[RFC PATCH 1/7] skbuff: define field for hash report type
Used by virtio-net receive side scaling Signed-off-by: Yuri Benditovich <yuri.benditovich at daynix.com> --- include/linux/skbuff.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 416bf95cd5f2..36cf40ec0259 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -510,7 +510,7 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, * the end of the header data, ie. at skb->end. */ struct skb_shared_info { - __u8 __unused; + __u8 hash_report_type; /* virtio-net rss */ __u8 meta_len; __u8 nr_frags; __u8 tx_flags; @@ -1430,6 +1430,11 @@ static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb) return &skb_shinfo(skb)->hwtstamps; } +static inline __u8 *skb_hash_report_type(struct sk_buff *skb) +{ + return &skb_shinfo(skb)->hash_report_type; +} + static inline struct ubuf_info *skb_zcopy(struct sk_buff *skb) { bool is_zcopy = skb && skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY; -- 2.17.1
Yuri Benditovich
2021-Jan-12 19:41 UTC
[RFC PATCH 2/7] vhost: support for hash report virtio-net feature
According to the virtio specification if VIRTIO_NET_F_HASH_REPORT feature acked the virtio-net header is extended to hold the hash value and hash report type. Signed-off-by: Yuri Benditovich <yuri.benditovich at daynix.com> --- drivers/vhost/net.c | 37 +++++++++++++++++++++++++++++-------- 1 file changed, 29 insertions(+), 8 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 531a00d703cd..31a894b9a992 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -73,7 +73,8 @@ enum { VHOST_NET_FEATURES = VHOST_FEATURES | (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) | (1ULL << VIRTIO_NET_F_MRG_RXBUF) | - (1ULL << VIRTIO_F_ACCESS_PLATFORM) + (1ULL << VIRTIO_F_ACCESS_PLATFORM) | + (1ULL << VIRTIO_NET_F_HASH_REPORT) }; enum { @@ -1108,14 +1109,16 @@ static void handle_rx(struct vhost_net *net) .msg_controllen = 0, .msg_flags = MSG_DONTWAIT, }; - struct virtio_net_hdr hdr = { - .flags = 0, - .gso_type = VIRTIO_NET_HDR_GSO_NONE + struct virtio_net_hdr_v1_hash hdrv1 = { + { + .flags = 0, + .gso_type = VIRTIO_NET_HDR_GSO_NONE + } }; size_t total_len = 0; int err, mergeable; s16 headcount; - size_t vhost_hlen, sock_hlen; + size_t vhost_hlen, sock_hlen, extra_hlen; size_t vhost_len, sock_len; bool busyloop_intr = false; struct socket *sock; @@ -1137,9 +1140,12 @@ static void handle_rx(struct vhost_net *net) vhost_hlen = nvq->vhost_hlen; sock_hlen = nvq->sock_hlen; + vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ? vq->log : NULL; mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF); + extra_hlen = vhost_has_feature(vq, VIRTIO_NET_F_HASH_REPORT) ? + sizeof(hdrv1) - sizeof(hdrv1.hdr) : 0; do { sock_len = vhost_net_rx_peek_head_len(net, sock->sk, @@ -1201,8 +1207,8 @@ static void handle_rx(struct vhost_net *net) } /* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */ if (unlikely(vhost_hlen)) { - if (copy_to_iter(&hdr, sizeof(hdr), - &fixup) != sizeof(hdr)) { + if (copy_to_iter(&hdrv1, sizeof(struct virtio_net_hdr), + &fixup) != sizeof(struct virtio_net_hdr)) { vq_err(vq, "Unable to write vnet_hdr " "at addr %p\n", vq->iov->iov_base); goto out; @@ -1211,7 +1217,7 @@ static void handle_rx(struct vhost_net *net) /* Header came from socket; we'll need to patch * ->num_buffers over if VIRTIO_NET_F_MRG_RXBUF */ - iov_iter_advance(&fixup, sizeof(hdr)); + iov_iter_advance(&fixup, sizeof(struct virtio_net_hdr)); } /* TODO: Should check and handle checksum. */ @@ -1223,6 +1229,18 @@ static void handle_rx(struct vhost_net *net) vhost_discard_vq_desc(vq, headcount); goto out; } + if (unlikely(extra_hlen)) { + if (unlikely(vhost_hlen)) { + if (copy_to_iter(&hdrv1.hash_value, extra_hlen, + &fixup) != extra_hlen) { + vq_err(vq, "Unable to write extra_hdr " + "at addr %p\n", vq->iov->iov_base); + goto out; + } + } else { + iov_iter_advance(&fixup, extra_hlen); + } + } nvq->done_idx += headcount; if (nvq->done_idx > VHOST_NET_BATCH) vhost_net_signal_used(nvq); @@ -1624,6 +1642,9 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features) (1ULL << VIRTIO_F_VERSION_1))) ? sizeof(struct virtio_net_hdr_mrg_rxbuf) : sizeof(struct virtio_net_hdr); + if (features & (1ULL << VIRTIO_NET_F_HASH_REPORT)) { + hdr_len = sizeof(struct virtio_net_hdr_v1_hash); + } if (features & (1 << VHOST_NET_F_VIRTIO_NET_HDR)) { /* vhost provides vnet_hdr */ vhost_hlen = hdr_len; -- 2.17.1
Yuri Benditovich
2021-Jan-12 19:41 UTC
[RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
This program type can set skb hash value. It will be useful when the tun will support hash reporting feature if virtio-net. Signed-off-by: Yuri Benditovich <yuri.benditovich at daynix.com> --- drivers/net/tun.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 7959b5c2d11f..455f7afc1f36 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p, prog = NULL; } else { prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER); + if (IS_ERR(prog)) + prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS); if (IS_ERR(prog)) return PTR_ERR(prog); } -- 2.17.1
Yuri Benditovich
2021-Jan-12 19:41 UTC
[RFC PATCH 4/7] tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy
The module never creates the bpf program with bpf_prog_create so it shouldn't free it with bpf_prog_destroy. The program is obtained by bpf_prog_get and should be freed by bpf_prog_put. For BPF_PROG_TYPE_SOCKET_FILTER both methods do the same but for other program types they don't. Signed-off-by: Yuri Benditovich <yuri.benditovich at daynix.com> --- drivers/net/tun.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 455f7afc1f36..18c1baf1a6c1 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2218,7 +2218,7 @@ static void tun_prog_free(struct rcu_head *rcu) { struct tun_prog *prog = container_of(rcu, struct tun_prog, rcu); - bpf_prog_destroy(prog->prog); + bpf_prog_put(prog->prog); kfree(prog); } -- 2.17.1
Yuri Benditovich
2021-Jan-12 19:41 UTC
[RFC PATCH 5/7] tun: add ioctl code TUNSETHASHPOPULATION
User mode program calls this ioctl before loading of BPF program to inform the tun that the BPF program has extended functionality, i.e. sets hash value and returns the virtqueue number in the lower 16 bits and the type of the hash report in the upper 16 bits. Signed-off-by: Yuri Benditovich <yuri.benditovich at daynix.com> --- drivers/net/tun.c | 12 +++++++++++- include/uapi/linux/if_tun.h | 1 + 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 18c1baf1a6c1..45f4f04a4a3e 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -197,6 +197,7 @@ struct tun_struct { struct sock_fprog fprog; /* protected by rtnl lock */ bool filter_attached; + bool bpf_populates_hash; u32 msg_enable; spinlock_t lock; struct hlist_head flows[TUN_NUM_FLOW_ENTRIES]; @@ -2765,6 +2766,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr) tun->align = NET_SKB_PAD; tun->filter_attached = false; + tun->bpf_populates_hash = false; tun->sndbuf = tfile->socket.sk->sk_sndbuf; tun->rx_batched = 0; RCU_INIT_POINTER(tun->steering_prog, NULL); @@ -2997,7 +2999,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd, struct net *net = sock_net(&tfile->sk); struct tun_struct *tun; void __user* argp = (void __user*)arg; - unsigned int ifindex, carrier; + unsigned int ifindex, carrier, populate_hash; struct ifreq ifr; kuid_t owner; kgid_t group; @@ -3298,6 +3300,14 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd, ret = open_related_ns(&net->ns, get_net_ns); break; + case TUNSETHASHPOPULATION: + ret = -EFAULT; + if (copy_from_user(&populate_hash, argp, sizeof(populate_hash))) + goto unlock; + tun->bpf_populates_hash = !!populate_hash; + ret = 0; + break; + default: ret = -EINVAL; break; diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h index 454ae31b93c7..0fd43533da26 100644 --- a/include/uapi/linux/if_tun.h +++ b/include/uapi/linux/if_tun.h @@ -61,6 +61,7 @@ #define TUNSETFILTEREBPF _IOR('T', 225, int) #define TUNSETCARRIER _IOW('T', 226, int) #define TUNGETDEVNETNS _IO('T', 227) +#define TUNSETHASHPOPULATION _IOR('T', 228, int) /* TUNSETIFF ifr flags */ #define IFF_TUN 0x0001 -- 2.17.1
Yuri Benditovich
2021-Jan-12 19:41 UTC
[RFC PATCH 6/7] tun: populate hash in virtio-net header when needed
If the BPF program populated the hash in the skb the tun propagates the hash value and hash report type to the respective fields of virtio-net header. Signed-off-by: Yuri Benditovich <yuri.benditovich at daynix.com> --- drivers/net/tun.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 45f4f04a4a3e..214feb0b16fb 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -556,15 +556,20 @@ static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) { struct tun_prog *prog; u32 numqueues; - u16 ret = 0; + u32 ret = 0; numqueues = READ_ONCE(tun->numqueues); if (!numqueues) return 0; prog = rcu_dereference(tun->steering_prog); - if (prog) + if (prog) { ret = bpf_prog_run_clear_cb(prog->prog, skb); + if (tun->bpf_populates_hash) { + *skb_hash_report_type(skb) = (__u8)(ret >> 16); + ret &= 0xffff; + } + } return ret % numqueues; } @@ -2062,6 +2067,7 @@ static ssize_t tun_put_user(struct tun_struct *tun, if (vnet_hdr_sz) { struct virtio_net_hdr gso; + __u16 extra_copy = 0; if (iov_iter_count(iter) < vnet_hdr_sz) return -EINVAL; @@ -2085,7 +2091,20 @@ static ssize_t tun_put_user(struct tun_struct *tun, if (copy_to_iter(&gso, sizeof(gso), iter) != sizeof(gso)) return -EFAULT; - iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso)); + if (tun->bpf_populates_hash && + vnet_hdr_sz >= sizeof(struct virtio_net_hdr_v1_hash)) { + struct virtio_net_hdr_v1_hash hdr; + + hdr.hdr.num_buffers = 0; + hdr.hash_value = cpu_to_le32(skb_get_hash(skb)); + hdr.hash_report = cpu_to_le16(*skb_hash_report_type(skb)); + hdr.padding = 0; + extra_copy = sizeof(hdr) - sizeof(gso); + if (copy_to_iter(&hdr.hdr.num_buffers, extra_copy, iter) != extra_copy) + return -EFAULT; + } + + iov_iter_advance(iter, vnet_hdr_sz - sizeof(gso) - extra_copy); } if (vlan_hlen) { -- 2.17.1
Yuri Benditovich
2021-Jan-12 19:41 UTC
[RFC PATCH 7/7] tun: report new tun feature IFF_HASH
IFF_HASH feature indicates that the tun supports TUNSETHASHPOPULATION ioctl and can propagate the hash data to the virtio-net packet. Signed-off-by: Yuri Benditovich <yuri.benditovich at daynix.com> --- drivers/net/tun.c | 2 +- include/uapi/linux/if_tun.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 214feb0b16fb..b46aa8941a9d 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -88,7 +88,7 @@ static void tun_default_link_ksettings(struct net_device *dev, #define TUN_VNET_LE 0x80000000 #define TUN_VNET_BE 0x40000000 -#define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \ +#define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | IFF_HASH |\ IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS) #define GOODCOPY_LEN 128 diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h index 0fd43533da26..116b84ede3a0 100644 --- a/include/uapi/linux/if_tun.h +++ b/include/uapi/linux/if_tun.h @@ -73,6 +73,7 @@ #define IFF_ONE_QUEUE 0x2000 #define IFF_VNET_HDR 0x4000 #define IFF_TUN_EXCL 0x8000 +#define IFF_HASH 0x0080 #define IFF_MULTI_QUEUE 0x0100 #define IFF_ATTACH_QUEUE 0x0200 #define IFF_DETACH_QUEUE 0x0400 -- 2.17.1
Alexei Starovoitov
2021-Jan-12 19:46 UTC
[RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type
On Tue, Jan 12, 2021 at 11:42 AM Yuri Benditovich <yuri.benditovich at daynix.com> wrote:> > This program type can set skb hash value. It will be useful > when the tun will support hash reporting feature if virtio-net. > > Signed-off-by: Yuri Benditovich <yuri.benditovich at daynix.com> > --- > drivers/net/tun.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index 7959b5c2d11f..455f7afc1f36 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -2981,6 +2981,8 @@ static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog __rcu **prog_p, > prog = NULL; > } else { > prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SOCKET_FILTER); > + if (IS_ERR(prog)) > + prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_SCHED_CLS);You've ignored the feedback and just resend? what for?
Yuri Benditovich
2021-Jan-12 19:49 UTC
[RFC PATCH 0/7] Support for virtio-net hash reporting
On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich <yuri.benditovich at daynix.com> wrote:> > Existing TUN module is able to use provided "steering eBPF" to > calculate per-packet hash and derive the destination queue to > place the packet to. The eBPF uses mapped configuration data > containing a key for hash calculation and indirection table > with array of queues' indices. > > This series of patches adds support for virtio-net hash reporting > feature as defined in virtio specification. It extends the TUN module > and the "steering eBPF" as follows: > > Extended steering eBPF calculates the hash value and hash type, keeps > hash value in the skb->hash and returns index of destination virtqueue > and the type of the hash. TUN module keeps returned hash type in > (currently unused) field of the skb. > skb->__unused renamed to 'hash_report_type'. > > When TUN module is called later to allocate and fill the virtio-net > header and push it to destination virtqueue it populates the hash > and the hash type into virtio-net header. > > VHOST driver is made aware of respective virtio-net feature that > extends the virtio-net header to report the hash value and hash report > type.Comment from Willem de Bruijn: Skbuff fields are in short supply. I don't think we need to add one just for this narrow path entirely internal to the tun device. Instead, you could just run the flow_dissector in tun_put_user if the feature is negotiated. Indeed, the flow dissector seems more apt to me than BPF here. Note that the flow dissector internally can be overridden by a BPF program if the admin so chooses. This also hits on a deeper point with the choice of hash values, that I also noticed in my RFC patchset to implement the inverse [1][2]. It is much more detailed than skb->hash + skb->l4_hash currently offers, and that can be gotten for free from most hardware. In most practical cases, that information suffices. I added less specific fields VIRTIO_NET_HASH_REPORT_L4, VIRTIO_NET_HASH_REPORT_OTHER that work without explicit flow dissection. I understand that the existing fields are part of the standard. Just curious, what is their purpose beyond 4-tuple based flow hashing? [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=406859&state=* [2] https://github.com/wdebruij/linux/commit/0f77febf22cd6ffc242a575807fa8382a26e511e> > Yuri Benditovich (7): > skbuff: define field for hash report type > vhost: support for hash report virtio-net feature > tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type > tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy > tun: add ioctl code TUNSETHASHPOPULATION > tun: populate hash in virtio-net header when needed > tun: report new tun feature IFF_HASH > > drivers/net/tun.c | 43 +++++++++++++++++++++++++++++++------ > drivers/vhost/net.c | 37 ++++++++++++++++++++++++------- > include/linux/skbuff.h | 7 +++++- > include/uapi/linux/if_tun.h | 2 ++ > 4 files changed, 74 insertions(+), 15 deletions(-) > > -- > 2.17.1 >