Xuan Zhuo wrote:> This patch is used to construct skb based on page to save memory copy
> overhead.
>
> This has one problem:
>
> We construct the skb by fill the data page as a frag into the skb. In
> this way, the linear space is empty, and the header information is also
> in the frag, not in the linear space, which is not allowed for some
> network cards. For example, Mellanox Technologies MT27710 Family
> [ConnectX-4 Lx] will get the following error message:
>
> mlx5_core 0000:3b:00.1 eth1: Error cqe on cqn 0x817, ci 0x8, qn 0x1dbb,
opcode 0xd, syndrome 0x1, vendor syndrome 0x68
> 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000030: 00 00 00 00 60 10 68 01 0a 00 1d bb 00 0f 9f d2
> WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0xf, len: 64
> 00000000: 00 00 0f 0a 00 1d bb 03 00 00 00 08 00 00 00 00
> 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00000020: 00 00 00 2b 00 08 00 00 00 00 00 05 9e e3 08 00
> 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> mlx5_core 0000:3b:00.1 eth1: ERR CQE on SQ: 0x1dbb
>
> I also tried to use build_skb to construct skb, but because of the
> existence of skb_shinfo, it must be behind the linear space, so this
> method is not working. We can't put skb_shinfo on desc->addr, it
will be
> exposed to users, this is not safe.
>
> Finally, I added a feature NETIF_F_SKB_NO_LINEAR to identify whether the
> network card supports the header information of the packet in the frag
> and not in the linear space.
>
> ---------------- Performance Testing ------------
>
> The test environment is Aliyun ECS server.
> Test cmd:
> ```
> xdpsock -i eth0 -t -S -s <msg size>
> ```
>
> Test result data:
>
> size 64 512 1024 1500
> copy 1916747 1775988 1600203 1440054
> page 1974058 1953655 1945463 1904478
> percent 3.0% 10.0% 21.58% 32.3%
Looks like a good perf bump. Some easy suggestions below
> +static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> + struct xdp_desc *desc, int *err)
> +{
Passing a 'int *err' here is ugly IMO use the ERR_PTR/PTR_ERR macros
and roll it into the return value.
or maybe use the out: pattern used in the kernel, but just doing direct
returns like now but with ERR_PTR() would also be fine.
> + struct sk_buff *skb ;
struct sk_buff *skb = NULL;
err = -ENOMEM;> +
> + if (xs->dev->features & NETIF_F_SKB_NO_LINEAR) {
> + skb = xsk_build_skb_zerocopy(xs, desc);
> + if (unlikely(!skb)) {
goto out
> + *err = -ENOMEM;
> + return NULL;
> + }
> + } else {
> + char *buffer;
> + u64 addr;
> + u32 len;
> + int err;
> +
> + len = desc->len;
> + skb = sock_alloc_send_skb(&xs->sk, len, 1, &err);
> + if (unlikely(!skb)) {
goto out;> + *err = -ENOMEM;
> + return NULL;
> + }
> +
> + skb_put(skb, len);
> + addr = desc->addr;
> + buffer = xsk_buff_raw_get_data(xs->pool, desc->addr);
> + err = skb_store_bits(skb, 0, buffer, len);
> +
> + if (unlikely(err)) {
> + kfree_skb(skb);
err = -EINVAL;
goto out
> + *err = -EINVAL;
> + return NULL;
> + }
> + }
> +
> + skb->dev = xs->dev;
> + skb->priority = xs->sk.sk_priority;
> + skb->mark = xs->sk.sk_mark;
> + skb_shinfo(skb)->destructor_arg = (void *)(long)desc->addr;
> + skb->destructor = xsk_destruct_skb;
> +
> + return skb;
out:
kfree_skb(skb)
return ERR_PTR(err);
> +}
> +
Otherwise looks good thanks.