## About DMA APIs

Currently, virtio cannot work with the DMA APIs when the virtio device does
not offer VIRTIO_F_ACCESS_PLATFORM.

1. I tried to let the DMA APIs return the physical address for a virtio
   device. But the DMA APIs only work with "real" devices.
2. I tried to let xsk support callbacks to get the physical address from the
   virtio-net driver as the dma address. But the maintainers of xsk may want
   to use dma-buf to replace the DMA APIs. I think that would be a larger
   effort and we would wait too long.

So, rethinking this: first, we can support premapped DMA only for devices
with VIRTIO_F_ACCESS_PLATFORM. In the case of AF_XDP, if users want to use
it, they have to update the device to support VIRTIO_F_RING_RESET, and they
can also enable the device's VIRTIO_F_ACCESS_PLATFORM feature.

Thanks for the help from Christoph.

================

XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
zero copy feature of xsk (XDP socket) needs to be supported by the driver.
The performance of zero copy is very good.

ENV: Qemu with vhost.

                   vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
-----------------------------|---------------|------------------|------------
xmit by sockperf:    90%     |      100%     |                  |   318967
xmit by xsk:        100%     |       30%     |        33%       |  1192064
recv by sockperf:   100%     |       68%     |       100%       |   692288
recv by xsk:        100%     |       33%     |        43%       |   771670

Before implementing this function in virtio-net, we also have to let the
virtio core support these features:

1. virtio core supports premapped buffers
2. virtio core supports per-queue reset
3. introduce DMA APIs to the virtio core

Please review.

Thanks.

v10:
 1. support setting a vq to premapped mode; the vq then only handles
    premapped requests
 2. virtio-net supports doing the dma mapping in advance

v9:
 1. use a flag to distinguish the premapped operations, instead of judging
    by the sg

v8:
 1. vring_sg_address: check by sg_page(sg), not dma_address, because 0 is a
    valid dma address
 2. remove unused code from vring_map_one_sg()

v7:
 1. virtqueue_dma_dev() returns NULL when virtio is without the DMA API

v6:
 1. change the size of the flags to u32

v5:
 1. fix the error handler
 2. add flags to record the internal dma mapping

v4:
 1. rename map_inter to dma_map_internal
 2. fix: excess function parameter 'vq' description in 'virtqueue_dma_dev'

v3:
 1. add map_inter to struct desc state to record whether the virtio core did
    the dma map

v2:
 1. based on sgs[0]->dma_address, judge whether the buffer is premapped
 2. based on extra.addr, judge whether to do unmap for a non-indirect desc
 3. based on indir_desc, judge whether to do unmap for an indirect desc
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose the dma device; do NOT introduce APIs for dma and sync
 2. split some commits for review

Xuan Zhuo (10):
  virtio_ring: put mapping error check in vring_map_one_sg
  virtio_ring: introduce virtqueue_set_premapped()
  virtio_ring: split: support add premapped buf
  virtio_ring: packed: support add premapped buf
  virtio_ring: split-detach: support return dma info to driver
  virtio_ring: packed-detach: support return dma info to driver
  virtio_ring: introduce helpers for premapped
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: introduce virtqueue_add_sg()
  virtio_net: support dma premapped

 drivers/net/virtio_net.c     | 163 ++++++++++--
 drivers/virtio/virtio_ring.c | 493 +++++++++++++++++++++++++++++++----
 include/linux/virtio.h       |  34 +++
 3 files changed, 612 insertions(+), 78 deletions(-)

--
2.32.0.3.g01195cf9f
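To make the intended flow concrete, here is a minimal driver-side sketch of
submitting one premapped buffer, using only the APIs this series introduces
(virtqueue_dma_dev() from patch 08, virtqueue_add_sg() from patch 09); the
surrounding function and variable names are hypothetical:

        /* Sketch: submit one premapped RX buffer. Only the virtqueue_*
         * calls come from this series; everything else is illustrative. */
        static int add_premapped_inbuf(struct virtqueue *vq, void *buf, u32 len)
        {
                struct device *dev = virtqueue_dma_dev(vq);
                struct scatterlist sg;
                int err;

                sg_init_one(&sg, buf, len);

                /* The driver maps the buffer itself; dma_map_sg_attrs() fills
                 * in sg_dma_address(), which a premapped vq reads directly. */
                if (dma_map_sg_attrs(dev, &sg, 1, DMA_FROM_DEVICE, 0) != 1)
                        return -ENOMEM;

                err = virtqueue_add_sg(vq, &sg, 1, false, buf, NULL, GFP_ATOMIC);
                if (err < 0)
                        dma_unmap_sg_attrs(dev, &sg, 1, DMA_FROM_DEVICE, 0);

                return err;
        }

The completion side, getting the recorded dma info back so the driver can
unmap, is what patches 05-07 provide.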
Xuan Zhuo
2023-Jun-02 09:21 UTC
[PATCH vhost v10 01/10] virtio_ring: put mapping error check in vring_map_one_sg
This patch moves the dma addr error check into vring_map_one_sg().

The benefits of doing this:

1. removes one check of vq->use_dma_api at the call sites.
2. makes the callers of vring_map_one_sg() simpler: they no longer call
   vring_mapping_error() to check the return value, which simplifies the
   subsequent code.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 37 +++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c5310eaf8b46..72ed07a604d4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 }
 
 /* Map one sg entry. */
-static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
-                                   struct scatterlist *sg,
-                                   enum dma_data_direction direction)
+static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
+                            enum dma_data_direction direction, dma_addr_t *addr)
 {
         if (!vq->use_dma_api) {
                 /*
@@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
                  * depending on the direction.
                  */
                 kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
-                return (dma_addr_t)sg_phys(sg);
+                *addr = (dma_addr_t)sg_phys(sg);
+                return 0;
         }
 
         /*
@@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
          * the way it expects (we don't guarantee that the scatterlist
          * will exist for the lifetime of the mapping).
          */
-        return dma_map_page(vring_dma_dev(vq),
+        *addr = dma_map_page(vring_dma_dev(vq),
                             sg_page(sg),
                             sg->offset, sg->length,
                             direction);
+
+        if (dma_mapping_error(vring_dma_dev(vq), *addr))
+                return -ENOMEM;
+
+        return 0;
 }
 
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
@@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
         for (n = 0; n < out_sgs; n++) {
                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-                        dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
-                        if (vring_mapping_error(vq, addr))
+                        dma_addr_t addr;
+
+                        if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
                                 goto unmap_release;
 
                         prev = i;
@@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
         }
         for (; n < (out_sgs + in_sgs); n++) {
                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-                        dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
-                        if (vring_mapping_error(vq, addr))
+                        dma_addr_t addr;
+
+                        if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
                                 goto unmap_release;
 
                         prev = i;
@@ -1279,9 +1286,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 
         for (n = 0; n < out_sgs + in_sgs; n++) {
                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-                        addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-                                        DMA_TO_DEVICE : DMA_FROM_DEVICE);
-                        if (vring_mapping_error(vq, addr))
+                        if (vring_map_one_sg(vq, sg, n < out_sgs ?
+                                             DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
                                 goto unmap_release;
 
                         desc[i].flags = cpu_to_le16(n < out_sgs ?
@@ -1426,9 +1432,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
         c = 0;
         for (n = 0; n < out_sgs + in_sgs; n++) {
                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-                        dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-                                        DMA_TO_DEVICE : DMA_FROM_DEVICE);
-                        if (vring_mapping_error(vq, addr))
+                        dma_addr_t addr;
+
+                        if (vring_map_one_sg(vq, sg, n < out_sgs ?
+                                             DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
                                 goto unmap_release;
 
                         flags = cpu_to_le16(vq->packed.avail_used_flags |

--
2.32.0.3.g01195cf9f
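To see the simplification at the call sites, compare the two patterns (a
schematic excerpt based on the hunks above, assuming addr is declared as
dma_addr_t in scope):

        /* Before: map, then check with a second helper. */
        addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
        if (vring_mapping_error(vq, addr))
                goto unmap_release;

        /* After: one call maps and reports failure, so the callers no
         * longer need vring_mapping_error() at all. */
        if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
                goto unmap_release;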
Xuan Zhuo
2023-Jun-02 09:21 UTC
[PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped()
This helper allows the driver to switch the vq into premapped mode. In
premapped mode, the virtio core does not do the dma mapping internally.

This only works when use_dma_api is true. If use_dma_api is false, the dma
operations do not go through the DMA APIs, which is not the standard way in
the Linux kernel.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 40 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 42 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 72ed07a604d4..2afdfb9e3e30 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -172,6 +172,9 @@ struct vring_virtqueue {
         /* Host publishes avail event idx */
         bool event;
 
+        /* Do DMA mapping by driver */
+        bool premapped;
+
         /* Head of free buffer list. */
         unsigned int free_head;
         /* Number we've added since last sync. */
@@ -2059,6 +2062,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
         vq->packed_ring = true;
         vq->dma_dev = dma_dev;
         vq->use_dma_api = vring_use_dma_api(vdev);
+        vq->premapped = false;
 
         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
                 !context;
@@ -2548,6 +2552,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
 #endif
         vq->dma_dev = dma_dev;
         vq->use_dma_api = vring_use_dma_api(vdev);
+        vq->premapped = false;
 
         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
                 !context;
@@ -2691,6 +2696,41 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_set_premapped - set the vring premapped mode
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Enable the premapped mode of the vq.
+ *
+ * The vring in premapped mode does not do dma internally, so the driver must
+ * do the dma mapping in advance. The driver must pass the dma address through
+ * the dma_address of the scatterlist. When the driver gets a used buffer from
+ * the vring, it has to unmap the dma address. So the driver must call
+ * virtqueue_get_buf_premapped()/virtqueue_detach_unused_buf_premapped().
+ *
+ * This must be called before adding any buf to the vring.
+ * So this should be called immediately after vq init or vq reset.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EINVAL: the vring does not use the dma api, so we cannot enable the premapped mode.
+ */
+int virtqueue_set_premapped(struct virtqueue *_vq)
+{
+        struct vring_virtqueue *vq = to_vvq(_vq);
+
+        if (!vq->use_dma_api)
+                return -EINVAL;
+
+        vq->premapped = true;
+
+        return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
                                       unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b93238db94e3..1fc0e1023bd4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+int virtqueue_set_premapped(struct virtqueue *_vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);

--
2.32.0.3.g01195cf9f
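A minimal usage sketch (hypothetical driver code; the only API from this
patch is virtqueue_set_premapped(), and the pattern mirrors what patch 10
later does in virtnet_find_vqs()):

        bool premapped = false;

        /* Opt in right after the vq is created (or after a vq reset),
         * before any buffer has been added. */
        if (!virtqueue_set_premapped(vq))
                premapped = true;       /* driver maps/unmaps buffers itself */
        /* else: -EINVAL, the vq does not use the DMA API (e.g. the device
         * lacks VIRTIO_F_ACCESS_PLATFORM); keep the normal mode. */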
Xuan Zhuo
2023-Jun-02 09:21 UTC
[PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf
If the vq is in premapped mode, use sg_dma_address() directly.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2afdfb9e3e30..18212c3e056b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -598,8 +598,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
                         dma_addr_t addr;
 
-                        if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
-                                goto unmap_release;
+                        if (vq->premapped) {
+                                addr = sg_dma_address(sg);
+                        } else {
+                                if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
+                                        goto unmap_release;
+                        }
 
                         prev = i;
                         /* Note that we trust indirect descriptor
@@ -614,8 +618,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
                         dma_addr_t addr;
 
-                        if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
-                                goto unmap_release;
+                        if (vq->premapped) {
+                                addr = sg_dma_address(sg);
+                        } else {
+                                if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
+                                        goto unmap_release;
+                        }
 
                         prev = i;
                         /* Note that we trust indirect descriptor
@@ -689,21 +697,23 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
         return 0;
 
 unmap_release:
-        err_idx = i;
+        if (!vq->premapped) {
+                err_idx = i;
 
-        if (indirect)
-                i = 0;
-        else
-                i = head;
-
-        for (n = 0; n < total_sg; n++) {
-                if (i == err_idx)
-                        break;
-                if (indirect) {
-                        vring_unmap_one_split_indirect(vq, &desc[i]);
-                        i = virtio16_to_cpu(_vq->vdev, desc[i].next);
-                } else
-                        i = vring_unmap_one_split(vq, i);
+                if (indirect)
+                        i = 0;
+                else
+                        i = head;
+
+                for (n = 0; n < total_sg; n++) {
+                        if (i == err_idx)
+                                break;
+                        if (indirect) {
+                                vring_unmap_one_split_indirect(vq, &desc[i]);
+                                i = virtio16_to_cpu(_vq->vdev, desc[i].next);
+                        } else
+                                i = vring_unmap_one_split(vq, i);
+                }
         }
 
         if (indirect)

--
2.32.0.3.g01195cf9f
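The driver-side contract for a premapped vq is simply that sg_dma_address()
is already valid when the buffer is added; a rough sketch (assuming addr came
from an earlier dma_map_*() call, all other names hypothetical):

        struct scatterlist sg;
        int err;

        /* sg_init_one() sets page/offset/length; the dma address is filled
         * in by the driver, because a premapped vq will not map anything. */
        sg_init_one(&sg, buf, len);
        sg.dma_address = addr;

        err = virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_ATOMIC);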
Xuan Zhuo
2023-Jun-02 09:22 UTC
[PATCH vhost v10 04/10] virtio_ring: packed: support add premapped buf
If the vq is in premapped mode, use sg_dma_address() directly.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 18212c3e056b..dc109fbc05a5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1299,9 +1299,13 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 
         for (n = 0; n < out_sgs + in_sgs; n++) {
                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-                        if (vring_map_one_sg(vq, sg, n < out_sgs ?
-                                             DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
-                                goto unmap_release;
+                        if (vq->premapped) {
+                                addr = sg_dma_address(sg);
+                        } else {
+                                if (vring_map_one_sg(vq, sg, n < out_sgs ?
+                                                     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
+                                        goto unmap_release;
+                        }
 
                         desc[i].flags = cpu_to_le16(n < out_sgs ?
                                                 0 : VRING_DESC_F_WRITE);
@@ -1369,10 +1373,12 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
         return 0;
 
 unmap_release:
-        err_idx = i;
+        if (!vq->premapped) {
+                err_idx = i;
 
-        for (i = 0; i < err_idx; i++)
-                vring_unmap_desc_packed(vq, &desc[i]);
+                for (i = 0; i < err_idx; i++)
+                        vring_unmap_desc_packed(vq, &desc[i]);
+        }
 
         kfree(desc);
 
@@ -1447,9 +1453,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
                         dma_addr_t addr;
 
-                        if (vring_map_one_sg(vq, sg, n < out_sgs ?
-                                             DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
-                                goto unmap_release;
+                        if (vq->premapped) {
+                                addr = sg_dma_address(sg);
+                        } else {
+                                if (vring_map_one_sg(vq, sg, n < out_sgs ?
+                                                     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
+                                        goto unmap_release;
+                        }
 
                         flags = cpu_to_le16(vq->packed.avail_used_flags |
                                     (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
@@ -1512,11 +1522,17 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
         return 0;
 
 unmap_release:
+        vq->packed.avail_used_flags = avail_used_flags;
+
+        if (vq->premapped) {
+                END_USE(vq);
+                return -EIO;
+        }
+
         err_idx = i;
         i = head;
         curr = vq->free_head;
 
-        vq->packed.avail_used_flags = avail_used_flags;
 
         for (n = 0; n < total_sg; n++) {
                 if (i == err_idx)

--
2.32.0.3.g01195cf9f
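Note the error-path asymmetry this introduces: in premapped mode the core
cannot unmap what it never mapped, so virtqueue_add_packed() now just
restores avail_used_flags and returns -EIO, and the cleanup falls to the
caller. A hedged sketch of the driver side (mirroring what patch 10 later
does in virtnet_add_sg(); the names sg, num and data are hypothetical):

        err = virtqueue_add_sg(vq, sg, num, true, data, NULL, GFP_ATOMIC);
        if (err < 0) {
                /* The premapped core did not touch the mapping, so the
                 * driver must undo its own dma_map_sg_attrs() on failure. */
                dma_unmap_sg_attrs(virtqueue_dma_dev(vq), sg, num,
                                   DMA_TO_DEVICE, 0);
                return err;
        }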
Xuan Zhuo
2023-Jun-02 09:22 UTC
[PATCH vhost v10 05/10] virtio_ring: split-detach: support return dma info to driver
In premapped mode, the driver needs to unmap the DMA address after receiving
the buffer. The virtio core records the DMA address, so the driver needs a
way to get this dma info back from the virtio core.

A straightforward approach is to pass an array to the virtio core when
calling virtqueue_get_buf(). However, that is not feasible when there are
multiple DMA addresses in the descriptor chain and the array size is unknown.

To solve this problem, a helper is introduced. After calling
virtqueue_get_buf(), the driver can call the helper to retrieve one dma info
at a time. If the helper returns -EAGAIN, there are more DMA addresses to be
processed and the driver should call the helper again. To keep track of the
current position in the chain, a cursor must be passed to the helper; it is
initialized by virtqueue_get_buf().

Some bookkeeping is done inside this helper, so this helper MUST only be
called in premapped mode.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 118 ++++++++++++++++++++++++++++++++---
 include/linux/virtio.h       |  11 ++++
 2 files changed, 119 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dc109fbc05a5..cdc4349f6066 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -754,8 +754,95 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
         return needs_kick;
 }
 
-static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
-                             void **ctx)
+static void detach_cursor_init_split(struct vring_virtqueue *vq,
+                                     struct virtqueue_detach_cursor *cursor, u16 head)
+{
+        struct vring_desc_extra *extra;
+
+        extra = &vq->split.desc_extra[head];
+
+        /* Clear data ptr. */
+        vq->split.desc_state[head].data = NULL;
+
+        cursor->head = head;
+        cursor->done = 0;
+
+        if (extra->flags & VRING_DESC_F_INDIRECT) {
+                cursor->num = extra->len / sizeof(struct vring_desc);
+                cursor->indirect = true;
+                cursor->pos = 0;
+
+                vring_unmap_one_split(vq, head);
+
+                extra->next = vq->free_head;
+
+                vq->free_head = head;
+
+                /* Plus final descriptor */
+                vq->vq.num_free++;
+
+        } else {
+                cursor->indirect = false;
+                cursor->pos = head;
+        }
+}
+
+static int virtqueue_detach_split(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
+                                  dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
+{
+        struct vring_virtqueue *vq = to_vvq(_vq);
+        __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+        int rc = -EAGAIN;
+
+        if (unlikely(cursor->done))
+                return -EINVAL;
+
+        if (!cursor->indirect) {
+                struct vring_desc_extra *extra;
+                unsigned int i;
+
+                i = cursor->pos;
+
+                extra = &vq->split.desc_extra[i];
+
+                if (vq->split.vring.desc[i].flags & nextflag) {
+                        cursor->pos = extra->next;
+                } else {
+                        extra->next = vq->free_head;
+                        vq->free_head = cursor->head;
+                        cursor->done = true;
+                        rc = 0;
+                }
+
+                *addr = extra->addr;
+                *len = extra->len;
+                *dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+                vq->vq.num_free++;
+
+        } else {
+                struct vring_desc *indir_desc, *desc;
+                u16 flags;
+
+                indir_desc = vq->split.desc_state[cursor->head].indir_desc;
+                desc = &indir_desc[cursor->pos];
+
+                flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+                *addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+                *len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+                *dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+                if (++cursor->pos == cursor->num) {
+                        kfree(indir_desc);
+                        cursor->done = true;
+                        return 0;
+                }
+        }
+
+        return rc;
+}
+
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head)
 {
         unsigned int i, j;
         __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
@@ -799,8 +886,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 
                 kfree(indir_desc);
                 vq->split.desc_state[head].indir_desc = NULL;
-        } else if (ctx) {
-                *ctx = vq->split.desc_state[head].indir_desc;
         }
 }
 
@@ -812,7 +897,8 @@ static bool more_used_split(const struct vring_virtqueue *vq)
 
 static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
                                          unsigned int *len,
-                                         void **ctx)
+                                         void **ctx,
+                                         struct virtqueue_detach_cursor *cursor)
 {
         struct vring_virtqueue *vq = to_vvq(_vq);
         void *ret;
@@ -852,7 +938,15 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 
         /* detach_buf_split clears data, so grab it now. */
         ret = vq->split.desc_state[i].data;
-        detach_buf_split(vq, i, ctx);
+
+        if (!vq->indirect && ctx)
+                *ctx = vq->split.desc_state[i].indir_desc;
+
+        if (vq->premapped)
+                detach_cursor_init_split(vq, cursor, i);
+        else
+                detach_buf_split(vq, i);
+
         vq->last_used_idx++;
         /* If we expect an interrupt for the next entry, tell host
          * by writing event index and flush out the write before
@@ -961,7 +1055,8 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
         return true;
 }
 
-static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq,
+                                               struct virtqueue_detach_cursor *cursor)
 {
         struct vring_virtqueue *vq = to_vvq(_vq);
         unsigned int i;
@@ -974,7 +1069,10 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
                         continue;
                 /* detach_buf_split clears data, so grab it now. */
                 buf = vq->split.desc_state[i].data;
-                detach_buf_split(vq, i, NULL);
+                if (vq->premapped)
+                        detach_cursor_init_split(vq, cursor, i);
+                else
+                        detach_buf_split(vq, i);
                 vq->split.avail_idx_shadow--;
                 vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
                                 vq->split.avail_idx_shadow);
@@ -2361,7 +2459,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
         struct vring_virtqueue *vq = to_vvq(_vq);
 
         return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
-                                 virtqueue_get_buf_ctx_split(_vq, len, ctx);
+                                 virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
 
@@ -2493,7 +2591,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
         struct vring_virtqueue *vq = to_vvq(_vq);
 
         return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
-                                 virtqueue_detach_unused_buf_split(_vq);
+                                 virtqueue_detach_unused_buf_split(_vq, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 1fc0e1023bd4..eb4a4e4329aa 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -38,6 +38,17 @@ struct virtqueue {
         void *priv;
 };
 
+struct virtqueue_detach_cursor {
+        unsigned indirect:1;
+        unsigned done:1;
+        unsigned hole:14;
+
+        /* for split head */
+        unsigned head:16;
+        unsigned num:16;
+        unsigned pos:16;
+};
+
 int virtqueue_add_outbuf(struct virtqueue *vq,
                          struct scatterlist sg[], unsigned int num,
                          void *data,

--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jun-02 09:22 UTC
[PATCH vhost v10 06/10] virtio_ring: packed-detach: support return dma info to driver
In premapped mode, the driver needs to unmap the DMA address after receiving
the buffer. The virtio core records the DMA address, so the driver needs a
way to get this dma info back from the virtio core.

A straightforward approach is to pass an array to the virtio core when
calling virtqueue_get_buf(). However, that is not feasible when there are
multiple DMA addresses in the descriptor chain and the array size is unknown.

To solve this problem, a helper is introduced. After calling
virtqueue_get_buf(), the driver can call the helper to retrieve one dma info
at a time. If the helper returns -EAGAIN, there are more DMA addresses to be
processed and the driver should call the helper again. To keep track of the
current position in the chain, a cursor must be passed to the helper; it is
initialized by virtqueue_get_buf().

Some bookkeeping is done inside this helper, so this helper MUST only be
called in premapped mode.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 105 ++++++++++++++++++++++++++++++++---
 include/linux/virtio.h       |   9 ++-
 2 files changed, 103 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cdc4349f6066..cbc22daae7e1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1695,8 +1695,85 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
         return needs_kick;
 }
 
+static void detach_cursor_init_packed(struct vring_virtqueue *vq,
+                                      struct virtqueue_detach_cursor *cursor, u16 id)
+{
+        struct vring_desc_state_packed *state = NULL;
+        u32 len;
+
+        state = &vq->packed.desc_state[id];
+
+        /* Clear data ptr. */
+        state->data = NULL;
+
+        vq->packed.desc_extra[state->last].next = vq->free_head;
+        vq->free_head = id;
+        vq->vq.num_free += state->num;
+
+        /* init cursor */
+        cursor->curr = id;
+        cursor->done = 0;
+        cursor->pos = 0;
+
+        if (vq->packed.desc_extra[id].flags & VRING_DESC_F_INDIRECT) {
+                len = vq->split.desc_extra[id].len;
+
+                cursor->num = len / sizeof(struct vring_packed_desc);
+                cursor->indirect = true;
+
+                vring_unmap_extra_packed(vq, &vq->packed.desc_extra[id]);
+        } else {
+                cursor->num = state->num;
+                cursor->indirect = false;
+        }
+}
+
+static int virtqueue_detach_packed(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
+                                   dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
+{
+        struct vring_virtqueue *vq = to_vvq(_vq);
+
+        if (unlikely(cursor->done))
+                return -EINVAL;
+
+        if (!cursor->indirect) {
+                struct vring_desc_extra *extra;
+
+                extra = &vq->packed.desc_extra[cursor->curr];
+                cursor->curr = extra->next;
+
+                *addr = extra->addr;
+                *len = extra->len;
+                *dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+                if (++cursor->pos == cursor->num) {
+                        cursor->done = true;
+                        return 0;
+                }
+        } else {
+                struct vring_packed_desc *indir_desc, *desc;
+                u16 flags;
+
+                indir_desc = vq->packed.desc_state[cursor->curr].indir_desc;
+                desc = &indir_desc[cursor->pos];
+
+                flags = le16_to_cpu(desc->flags);
+                *addr = le64_to_cpu(desc->addr);
+                *len = le32_to_cpu(desc->len);
+                *dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+                if (++cursor->pos == cursor->num) {
+                        kfree(indir_desc);
+                        cursor->done = true;
+                        return 0;
+                }
+        }
+
+        return -EAGAIN;
+}
+
 static void detach_buf_packed(struct vring_virtqueue *vq,
-                              unsigned int id, void **ctx)
+                              unsigned int id)
 {
         struct vring_desc_state_packed *state = NULL;
         struct vring_packed_desc *desc;
@@ -1736,8 +1813,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
                 }
                 kfree(desc);
                 state->indir_desc = NULL;
-        } else if (ctx) {
-                *ctx = state->indir_desc;
         }
 }
 
@@ -1768,7 +1843,8 @@ static bool more_used_packed(const struct vring_virtqueue *vq)
 
 static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
                                           unsigned int *len,
-                                          void **ctx)
+                                          void **ctx,
+                                          struct virtqueue_detach_cursor *cursor)
 {
         struct vring_virtqueue *vq = to_vvq(_vq);
         u16 last_used, id, last_used_idx;
@@ -1808,7 +1884,14 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 
         /* detach_buf_packed clears data, so grab it now. */
         ret = vq->packed.desc_state[id].data;
-        detach_buf_packed(vq, id, ctx);
+
+        if (!vq->indirect && ctx)
+                *ctx = vq->packed.desc_state[id].indir_desc;
+
+        if (vq->premapped)
+                detach_cursor_init_packed(vq, cursor, id);
+        else
+                detach_buf_packed(vq, id);
 
         last_used += vq->packed.desc_state[id].num;
         if (unlikely(last_used >= vq->packed.vring.num)) {
@@ -1960,7 +2043,8 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
         return true;
 }
 
-static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq,
+                                                struct virtqueue_detach_cursor *cursor)
 {
         struct vring_virtqueue *vq = to_vvq(_vq);
         unsigned int i;
@@ -1973,7 +2057,10 @@ static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
                         continue;
                 /* detach_buf clears data, so grab it now. */
                 buf = vq->packed.desc_state[i].data;
-                detach_buf_packed(vq, i, NULL);
+                if (vq->premapped)
+                        detach_cursor_init_packed(vq, cursor, i);
+                else
+                        detach_buf_packed(vq, i);
                 END_USE(vq);
                 return buf;
         }
@@ -2458,7 +2545,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 {
         struct vring_virtqueue *vq = to_vvq(_vq);
 
-        return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
+        return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, NULL) :
                                  virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
@@ -2590,7 +2677,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 {
         struct vring_virtqueue *vq = to_vvq(_vq);
 
-        return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
+        return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, NULL) :
                                  virtqueue_detach_unused_buf_split(_vq, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index eb4a4e4329aa..7f137c7a9034 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -43,8 +43,13 @@ struct virtqueue_detach_cursor {
         unsigned done:1;
         unsigned hole:14;
 
-        /* for split head */
-        unsigned head:16;
+        union {
+                /* for split head */
+                unsigned head:16;
+
+                /* for packed id */
+                unsigned curr:16;
+        };
         unsigned num:16;
         unsigned pos:16;
 };

--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jun-02 09:22 UTC
[PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped
This patch introduces three helpers for premapped mode:

* virtqueue_get_buf_premapped
* virtqueue_detach_unused_buf_premapped
* virtqueue_detach

The first two work like their non-premapped counterparts, but a cursor is
passed in. virtqueue_detach() then uses the cursor to return the dma info of
the buf that was just returned, one entry at a time.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 83 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       | 10 +++++
 2 files changed, 93 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cbc22daae7e1..6771b9661798 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2555,6 +2555,66 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
         return virtqueue_get_buf_ctx(_vq, len, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);
+
+/**
+ * virtqueue_get_buf_premapped - get the next used buffer
+ * @_vq: the struct virtqueue we're talking about.
+ * @len: the length written into the buffer
+ * @ctx: extra context for the token
+ * @cursor: detach cursor
+ *
+ * If the device wrote data into the buffer, @len will be set to the
+ * amount written. This means you don't need to clear the buffer
+ * beforehand to ensure there's no data leakage in the case of short
+ * writes.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * This is used for the premapped vq. The cursor is passed by the driver
+ * and is used by virtqueue_detach(). It is initialized by the virtio core
+ * internally.
+ *
+ * Returns NULL if there are no used buffers, or the "data" token
+ * handed to virtqueue_add_*().
+ */
+void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
+                                  void **ctx,
+                                  struct virtqueue_detach_cursor *cursor)
+{
+        struct vring_virtqueue *vq = to_vvq(_vq);
+
+        return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, cursor) :
+                                 virtqueue_get_buf_ctx_split(_vq, len, ctx, cursor);
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_buf_premapped);
+
+/**
+ * virtqueue_detach - get the dma info of the last buf
+ * @_vq: the struct virtqueue we're talking about.
+ * @cursor: detach cursor
+ * @addr: the dma address
+ * @len: the length of the dma address
+ * @dir: the direction of the dma address
+ *
+ * This is used for the premapped vq. The cursor is initialized by
+ * virtqueue_get_buf_premapped() or virtqueue_detach_unused_buf_premapped().
+ *
+ * Returns:
+ * -EAGAIN: there is more dma info; this function should be called again.
+ * -EINVAL: the walk is already done; this function should not be called.
+ * 0: this is the last dma info.
+ */
+int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
+                     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
+{
+        struct vring_virtqueue *vq = to_vvq(_vq);
+
+        return vq->packed_ring ? virtqueue_detach_packed(_vq, cursor, addr, len, dir) :
+                                 virtqueue_detach_split(_vq, cursor, addr, len, dir);
+}
+EXPORT_SYMBOL_GPL(virtqueue_detach);
+
 /**
  * virtqueue_disable_cb - disable callbacks
  * @_vq: the struct virtqueue we're talking about.
@@ -2682,6 +2742,29 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
+/**
+ * virtqueue_detach_unused_buf_premapped - detach first unused buffer
+ * @_vq: the struct virtqueue we're talking about.
+ * @cursor: detach cursor
+ *
+ * This is used for the premapped vq. The cursor is passed by the driver
+ * and is used by virtqueue_detach(). It is initialized by the virtio core
+ * internally.
+ *
+ * Returns NULL or the "data" token handed to virtqueue_add_*().
+ * This is not valid on an active queue; it is useful for device
+ * shutdown or the reset queue.
+ */
+void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
+                                            struct virtqueue_detach_cursor *cursor)
+{
+        struct vring_virtqueue *vq = to_vvq(_vq);
+
+        return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, cursor) :
+                                 virtqueue_detach_unused_buf_split(_vq, cursor);
+}
+EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_premapped);
+
 static inline bool more_used(const struct vring_virtqueue *vq)
 {
         return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7f137c7a9034..0a11c5b32fe5 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -3,6 +3,7 @@
 #define _LINUX_VIRTIO_H
 /* Everything a virtio driver needs to work with any particular virtio
  * implementation. */
+#include <linux/dma-mapping.h>
 #include <linux/types.h>
 #include <linux/scatterlist.h>
 #include <linux/spinlock.h>
@@ -88,6 +89,10 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
                             void **ctx);
 
+void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
+                                  void **ctx,
+                                  struct virtqueue_detach_cursor *cursor);
+
 void virtqueue_disable_cb(struct virtqueue *vq);
 
 bool virtqueue_enable_cb(struct virtqueue *vq);
@@ -101,6 +106,8 @@ bool virtqueue_poll(struct virtqueue *vq, unsigned);
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
+void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
+                                            struct virtqueue_detach_cursor *cursor);
 
 unsigned int virtqueue_get_vring_size(const struct virtqueue *vq);
 
@@ -114,6 +121,9 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
 int virtqueue_resize(struct virtqueue *vq, u32 num,
                      void (*recycle)(struct virtqueue *vq, void *buf));
 
+int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
+                     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir);
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus

--
2.32.0.3.g01195cf9f
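Putting patches 05-07 together, the completion path for a premapped vq looks
roughly like this (the same loop appears as virtnet_generic_unmap() in
patch 10; the other names here are hypothetical):

        struct virtqueue_detach_cursor cursor;
        enum dma_data_direction dir;
        dma_addr_t addr;
        unsigned int used_len;
        u32 len;
        void *buf;
        int err;

        buf = virtqueue_get_buf_premapped(vq, &used_len, NULL, &cursor);
        if (buf) {
                /* Walk the recorded chain: -EAGAIN means more entries
                 * follow, 0 means this was the last one (a further call
                 * would return -EINVAL). */
                do {
                        err = virtqueue_detach(vq, &cursor, &addr, &len, &dir);
                        if (!err || err == -EAGAIN)
                                dma_unmap_page_attrs(virtqueue_dma_dev(vq),
                                                     addr, len, dir, 0);
                } while (err == -EAGAIN);
        }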
Xuan Zhuo
2023-Jun-02 09:22 UTC
[PATCH vhost v10 08/10] virtio_ring: introduce virtqueue_dma_dev()
Added virtqueue_dma_dev() to get the DMA device for virtio, so that the
caller can do dma operations in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6771b9661798..56444f872967 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2459,6 +2459,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev, which can be used with the dma api.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+        struct vring_virtqueue *vq = to_vvq(_vq);
+
+        if (vq->use_dma_api)
+                return vring_dma_dev(vq);
+        else
+                return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 0a11c5b32fe5..b24f0a665390 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
                       void *data,
                       gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);

--
2.32.0.3.g01195cf9f
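A short sketch of the intended use (hypothetical driver code; pool and
pool_size are illustrative): map once, then reuse the mapping across many
add/get cycles.

        struct device *dev = virtqueue_dma_dev(vq);
        dma_addr_t addr;

        if (!dev)
                return -EINVAL; /* no DMA API: cannot keep premapped buffers */

        /* Map once; the same dma address can then be handed to the vq many
         * times without remapping per packet. */
        addr = dma_map_single(dev, pool, pool_size, DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, addr))
                return -ENOMEM;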
Xuan Zhuo
2023-Jun-02 09:22 UTC
[PATCH vhost v10 09/10] virtio_ring: introduce virtqueue_add_sg()
Introduce virtqueue_add_sg(), so that virtio-net can use a unified API for
the rq and the sq.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 23 +++++++++++++++++++++++
 include/linux/virtio.h       |  4 ++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 56444f872967..a00f049ea442 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2356,6 +2356,29 @@ static inline int virtqueue_add(struct virtqueue *_vq,
                              out_sgs, in_sgs, data, ctx, gfp);
 }
 
+/**
+ * virtqueue_add_sg - expose buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: a scatterlist
+ * @num: the number of entries in @sg
+ * @out: whether the sg is readable by the other side
+ * @data: the token identifying the buffer.
+ * @ctx: extra context for the token
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_sg(struct virtqueue *vq, struct scatterlist *sg,
+                     unsigned int num, bool out, void *data,
+                     void *ctx, gfp_t gfp)
+{
+        return virtqueue_add(vq, &sg, num, (int)out, (int)!out, data, ctx, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_sg);
+
 /**
  * virtqueue_add_sgs - expose buffers to other end
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b24f0a665390..1a4aa4382c53 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -55,6 +55,10 @@ struct virtqueue_detach_cursor {
         unsigned pos:16;
 };
 
+int virtqueue_add_sg(struct virtqueue *vq, struct scatterlist *sg,
+                     unsigned int num, bool out, void *data,
+                     void *ctx, gfp_t gfp);
+
 int virtqueue_add_outbuf(struct virtqueue *vq,
                          struct scatterlist sg[], unsigned int num,
                          void *data,

--
2.32.0.3.g01195cf9f
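Usage is the same for both directions; only the out flag differs. A hedged
snippet (hdr, data and skb are hypothetical):

        struct scatterlist sgs[2];
        int err;

        sg_init_table(sgs, 2);
        sg_set_buf(&sgs[0], hdr, hdr_len);
        sg_set_buf(&sgs[1], data, data_len);

        /* out == true: readable by the device (a TX buffer). For RX, pass
         * out == false and the core marks the descriptors device-writable. */
        err = virtqueue_add_sg(vq, sgs, 2, true, skb, NULL, GFP_ATOMIC);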
Xuan Zhuo
2023-Jun-02 09:22 UTC
[PATCH vhost v10 10/10] virtio_net: support dma premapped
Introduce the module parameter "experiment_premapped" to enable the mode in
which virtio-net does the dma mapping itself.

If it is true, the vqs of virtio-net are put in premapped mode: a vq then
only handles sgs whose dma_address is set, and the driver must get the dma
address of a buffer back from the virtio core in order to unmap it after
getting the buffer.

This will be useful once AF_XDP is enabled: AF_XDP tx and the kernel packet
xmit will share the tx queue, so the skb xmit path must also support the
premapped mode.

Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
 drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
 1 file changed, 141 insertions(+), 22 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2396c28c0122..5898212fcb3c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,10 +26,11 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
-static bool csum = true, gso = true, napi_tx = true;
+static bool csum = true, gso = true, napi_tx = true, experiment_premapped;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
+module_param(experiment_premapped, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -142,6 +143,9 @@ struct send_queue {
 
         /* Record whether sq is in reset state. */
         bool reset;
+
+        /* The vq is in premapped mode. */
+        bool premapped;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -174,6 +178,9 @@ struct receive_queue {
         char name[16];
 
         struct xdp_rxq_info xdp_rxq;
+
+        /* The vq is in premapped mode. */
+        bool premapped;
 };
 
 /* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -546,6 +553,105 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
         return skb;
 }
 
+static int virtnet_generic_unmap(struct virtqueue *vq, struct virtqueue_detach_cursor *cursor)
+{
+        enum dma_data_direction dir;
+        dma_addr_t addr;
+        u32 len;
+        int err;
+
+        do {
+                err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
+                if (!err || err == -EAGAIN)
+                        dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
+
+        } while (err == -EAGAIN);
+
+        return err;
+}
+
+static void *virtnet_detach_unused_buf(struct virtqueue *vq, bool premapped)
+{
+        struct virtqueue_detach_cursor cursor;
+        void *buf;
+
+        if (!premapped)
+                return virtqueue_detach_unused_buf(vq);
+
+        buf = virtqueue_detach_unused_buf_premapped(vq, &cursor);
+        if (buf)
+                virtnet_generic_unmap(vq, &cursor);
+
+        return buf;
+}
+
+static void *virtnet_get_buf_ctx(struct virtqueue *vq, bool premapped, u32 *len, void **ctx)
+{
+        struct virtqueue_detach_cursor cursor;
+        void *buf;
+
+        if (!premapped)
+                return virtqueue_get_buf_ctx(vq, len, ctx);
+
+        buf = virtqueue_get_buf_premapped(vq, len, ctx, &cursor);
+        if (buf)
+                virtnet_generic_unmap(vq, &cursor);
+
+        return buf;
+}
+
+#define virtnet_rq_get_buf(rq, plen, pctx) \
+({ \
+        typeof(rq) _rq = (rq); \
+        virtnet_get_buf_ctx(_rq->vq, _rq->premapped, plen, pctx); \
+})
+
+#define virtnet_sq_get_buf(sq, plen, pctx) \
+({ \
+        typeof(sq) _sq = (sq); \
+        virtnet_get_buf_ctx(_sq->vq, _sq->premapped, plen, pctx); \
+})
+
+static int virtnet_add_sg(struct virtqueue *vq, bool premapped,
+                          struct scatterlist *sg, unsigned int num, bool out,
+                          void *data, void *ctx, gfp_t gfp)
+{
+        enum dma_data_direction dir;
+        struct device *dev;
+        int err, ret;
+
+        if (!premapped)
+                return virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
+
+        dir = out ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+        dev = virtqueue_dma_dev(vq);
+
+        ret = dma_map_sg_attrs(dev, sg, num, dir, 0);
+        if (ret != num)
+                goto err;
+
+        err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
+        if (err < 0)
+                goto err;
+
+        return 0;
+
+err:
+        dma_unmap_sg_attrs(dev, sg, num, dir, 0);
+        return -ENOMEM;
+}
+
+static int virtnet_add_outbuf(struct send_queue *sq, unsigned int num, void *data)
+{
+        return virtnet_add_sg(sq->vq, sq->premapped, sq->sg, num, true, data, NULL, GFP_ATOMIC);
+}
+
+static int virtnet_add_inbuf(struct receive_queue *rq, unsigned int num, void *data,
+                             void *ctx, gfp_t gfp)
+{
+        return virtnet_add_sg(rq->vq, rq->premapped, rq->sg, num, false, data, ctx, gfp);
+}
+
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 {
         unsigned int len;
@@ -553,7 +659,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
         unsigned int bytes = 0;
         void *ptr;
 
-        while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+        while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
                 if (likely(!is_xdp_frame(ptr))) {
                         struct sk_buff *skb = ptr;
 
@@ -667,8 +773,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
                             skb_frag_size(frag), skb_frag_off(frag));
         }
 
-        err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
-                                   xdp_to_ptr(xdpf), GFP_ATOMIC);
+        err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
         if (unlikely(err))
                 return -ENOSPC; /* Caller handle free/refcnt */
 
@@ -744,7 +849,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
         }
 
         /* Free up any pending old buffers before queueing new ones. */
-        while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+        while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
                 if (likely(is_xdp_frame(ptr))) {
                         struct xdp_frame *frame = ptr_to_xdp(ptr);
 
@@ -828,7 +933,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
                 void *buf;
                 int off;
 
-                buf = virtqueue_get_buf(rq->vq, &buflen);
+                buf = virtnet_rq_get_buf(rq, &buflen, NULL);
                 if (unlikely(!buf))
                         goto err_buf;
 
@@ -1119,7 +1224,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
                 return -EINVAL;
 
         while (--*num_buf > 0) {
-                buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+                buf = virtnet_rq_get_buf(rq, &len, &ctx);
                 if (unlikely(!buf)) {
                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
                                  dev->name, *num_buf,
@@ -1344,7 +1449,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
         while (--num_buf) {
                 int num_skb_frags;
 
-                buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+                buf = virtnet_rq_get_buf(rq, &len, &ctx);
                 if (unlikely(!buf)) {
                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
                                  dev->name, num_buf,
@@ -1407,7 +1512,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 err_skb:
         put_page(page);
         while (num_buf-- > 1) {
-                buf = virtqueue_get_buf(rq->vq, &len);
+                buf = virtnet_rq_get_buf(rq, &len, NULL);
                 if (unlikely(!buf)) {
                         pr_debug("%s: rx error: %d buffers missing\n",
                                  dev->name, num_buf);
@@ -1534,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
         alloc_frag->offset += len;
         sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
                     vi->hdr_len + GOOD_PACKET_LEN);
-        err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+        err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
         if (err < 0)
                 put_page(virt_to_head_page(buf));
         return err;
@@ -1581,8 +1686,8 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 
         /* chain first in list head */
         first->private = (unsigned long)list;
-        err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
-                                  first, gfp);
+        err = virtnet_add_inbuf(rq, vi->big_packets_num_skbfrags + 2,
+                                first, NULL, gfp);
         if (err < 0)
                 give_pages(rq, first);
 
@@ -1645,7 +1750,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 
         sg_init_one(rq->sg, buf, len);
         ctx = mergeable_len_to_ctx(len + room, headroom);
-        err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+        err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
         if (err < 0)
                 put_page(virt_to_head_page(buf));
 
@@ -1768,13 +1873,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
                 void *ctx;
 
                 while (stats.packets < budget &&
-                       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
+                       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
                         stats.packets++;
                 }
         } else {
                 while (stats.packets < budget &&
-                       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+                       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
                         stats.packets++;
                 }
@@ -1984,7 +2089,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
                         return num_sg;
                 num_sg++;
         }
-        return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+        return virtnet_add_outbuf(sq, num_sg, skb);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -3552,15 +3657,17 @@ static void free_unused_bufs(struct virtnet_info *vi)
         int i;
 
         for (i = 0; i < vi->max_queue_pairs; i++) {
-                struct virtqueue *vq = vi->sq[i].vq;
-                while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
-                        virtnet_sq_free_unused_buf(vq, buf);
+                struct send_queue *sq = &vi->sq[i];
+
+                while ((buf = virtnet_detach_unused_buf(sq->vq, sq->premapped)) != NULL)
+                        virtnet_sq_free_unused_buf(sq->vq, buf);
         }
 
         for (i = 0; i < vi->max_queue_pairs; i++) {
-                struct virtqueue *vq = vi->rq[i].vq;
-                while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
-                        virtnet_rq_free_unused_buf(vq, buf);
+                struct receive_queue *rq = &vi->rq[i];
+
+                while ((buf = virtnet_detach_unused_buf(rq->vq, rq->premapped)) != NULL)
+                        virtnet_rq_free_unused_buf(rq->vq, buf);
         }
 }
 
@@ -3658,6 +3765,18 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
                 vi->rq[i].vq = vqs[rxq2vq(i)];
                 vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
                 vi->sq[i].vq = vqs[txq2vq(i)];
+
+                if (experiment_premapped) {
+                        if (!virtqueue_set_premapped(vi->rq[i].vq))
+                                vi->rq[i].premapped = true;
+                        else
+                                netdev_warn(vi->dev, "RXQ (%d) enable premapped failure.\n", i);
+
+                        if (!virtqueue_set_premapped(vi->sq[i].vq))
+                                vi->sq[i].premapped = true;
+                        else
+                                netdev_warn(vi->dev, "TXQ (%d) enable premapped failure.\n", i);
+                }
         }
 
         /* run here: ret == 0. */

--
2.32.0.3.g01195cf9f
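Note for testers: since this is a module parameter, the premapped path can be
enabled at load time, e.g. with `modprobe virtio_net experiment_premapped=1`
(assuming the driver is built as a module). It takes effect in
virtnet_find_vqs(), i.e. when the queues are created, so flipping it later
through /sys/module/virtio_net/parameters/ only affects devices probed
afterwards.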
Hi Jason,

Do you have a plan to review this?

Thanks.
Michael S. Tsirkin
2023-Jun-22 19:38 UTC
[PATCH vhost v10 00/10] virtio core prepares for AF_XDP
On Fri, Jun 02, 2023 at 05:21:56PM +0800, Xuan Zhuo wrote:
> ## About DMA APIs
>
> Currently, virtio cannot work with the DMA APIs when the virtio device does
> not offer VIRTIO_F_ACCESS_PLATFORM.
>
> 1. I tried to let the DMA APIs return the physical address for a virtio
>    device. But the DMA APIs only work with "real" devices.
> 2. I tried to let xsk support callbacks to get the physical address from the
>    virtio-net driver as the dma address. But the maintainers of xsk may want
>    to use dma-buf to replace the DMA APIs. I think that would be a larger
>    effort and we would wait too long.
>
> So, rethinking this: first, we can support premapped DMA only for devices
> with VIRTIO_F_ACCESS_PLATFORM. In the case of AF_XDP, if users want to use
> it, they have to update the device to support VIRTIO_F_RING_RESET, and they
> can also enable the device's VIRTIO_F_ACCESS_PLATFORM feature.
>
> Thanks for the help from Christoph.
>
> ================
>
> XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
> zero copy feature of xsk (XDP socket) needs to be supported by the driver.
> The performance of zero copy is very good.
>
> ENV: Qemu with vhost.
>
>                    vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
> -----------------------------|---------------|------------------|------------
> xmit by sockperf:    90%     |      100%     |                  |   318967
> xmit by xsk:        100%     |       30%     |        33%       |  1192064
> recv by sockperf:   100%     |       68%     |       100%       |   692288
> recv by xsk:        100%     |       33%     |        43%       |   771670
>
> Before implementing this function in virtio-net, we also have to let the
> virtio core support these features:

So by itself, this doesn't do this. But what effect does all this overhead
have on performance?

> 1. virtio core supports premapped buffers
> 2. virtio core supports per-queue reset
> 3. introduce DMA APIs to the virtio core
>
> Please review.
>
> Thanks.
>
> v10:
>  1. support setting a vq to premapped mode; the vq then only handles
>     premapped requests
>  2. virtio-net supports doing the dma mapping in advance
>
> v9:
>  1. use a flag to distinguish the premapped operations, instead of judging
>     by the sg
>
> v8:
>  1. vring_sg_address: check by sg_page(sg), not dma_address, because 0 is a
>     valid dma address
>  2. remove unused code from vring_map_one_sg()
>
> v7:
>  1. virtqueue_dma_dev() returns NULL when virtio is without the DMA API
>
> v6:
>  1. change the size of the flags to u32
>
> v5:
>  1. fix the error handler
>  2. add flags to record the internal dma mapping
>
> v4:
>  1. rename map_inter to dma_map_internal
>  2. fix: excess function parameter 'vq' description in 'virtqueue_dma_dev'
>
> v3:
>  1. add map_inter to struct desc state to record whether the virtio core did
>     the dma map
>
> v2:
>  1. based on sgs[0]->dma_address, judge whether the buffer is premapped
>  2. based on extra.addr, judge whether to do unmap for a non-indirect desc
>  3. based on indir_desc, judge whether to do unmap for an indirect desc
>  4. rename virtqueue_get_dma_dev to virtqueue_dma_dev
>
> v1:
>  1. expose the dma device; do NOT introduce APIs for dma and sync
>  2. split some commits for review
>
> Xuan Zhuo (10):
>   virtio_ring: put mapping error check in vring_map_one_sg
>   virtio_ring: introduce virtqueue_set_premapped()
>   virtio_ring: split: support add premapped buf
>   virtio_ring: packed: support add premapped buf
>   virtio_ring: split-detach: support return dma info to driver
>   virtio_ring: packed-detach: support return dma info to driver
>   virtio_ring: introduce helpers for premapped
>   virtio_ring: introduce virtqueue_dma_dev()
>   virtio_ring: introduce virtqueue_add_sg()
>   virtio_net: support dma premapped
>
>  drivers/net/virtio_net.c     | 163 ++++++++++--
>  drivers/virtio/virtio_ring.c | 493 +++++++++++++++++++++++++++++++----
>  include/linux/virtio.h       |  34 +++
>  3 files changed, 612 insertions(+), 78 deletions(-)
>
> --
> 2.32.0.3.g01195cf9f