## About DMA APIs
Now, virtio may not work with the DMA APIs when the device does not offer
VIRTIO_F_ACCESS_PLATFORM.
1. I tried to make the DMA APIs return the physical address for virtio
devices, but the DMA APIs only work with "real" devices.
2. I tried to let xsk support callbacks to get the physical address from the
virtio-net driver as the dma address. But the xsk maintainers may want to use
dma-buf to replace the DMA APIs. I think that would be a larger effort, and we
would wait too long.
So, rethinking this, we can first support premapped dma only for devices with
VIRTIO_F_ACCESS_PLATFORM. In the case of AF_XDP, if users want to use it, they
have to update the device to support VIRTIO_F_RING_RESET, and they can also
enable the device's VIRTIO_F_ACCESS_PLATFORM feature.
Thanks for the help from Christoph.
================
XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
zero-copy feature of xsk (XDP socket) needs to be supported by the driver, and
its performance is very good.
ENV: Qemu with vhost.
                 | vhost cpu | Guest APP CPU | Guest Softirq CPU | PPS
-----------------|-----------|---------------|-------------------|---------
xmit by sockperf |       90% |          100% |                   |  318967
xmit by xsk      |      100% |           30% |               33% | 1192064
recv by sockperf |      100% |           68% |              100% |  692288
recv by xsk      |      100% |           33% |               43% |  771670
Before implementing this in virtio-net, we also have to let the virtio core
support these features:
1. virtio core supports premapped buffers
2. virtio core supports per-queue reset
================
After introducing premapping, I added an example to virtio-net. virtio-net can
merge dma mappings through this feature. @Jason
Please review.
Thanks.
v11:
1. virtio-net merges dma operations based on the premapped feature
2. a better way to handle map errors with premapped
v10:
1. support setting a vq to premapped mode; then the vq only handles premapped
requests
2. virtio-net supports doing dma mapping in advance
v9:
1. use a flag to distinguish the premapped operations; do not judge by sg.
v8:
1. vring_sg_address: check by sg_page(sg), not dma_address, because 0 is a
valid dma address
2. remove unused code from vring_map_one_sg()
v7:
1. virtqueue_dma_dev() returns NULL when virtio does not use the DMA API.
v6:
1. change the size of the flags to u32.
v5:
1. fix error handling
2. add flags to record internal dma mapping
v4:
1. rename map_inter to dma_map_internal
2. fix: Excess function parameter 'vq' description in
'virtqueue_dma_dev'
v3:
1. add map_inter to struct desc state to record whether the virtio core did the dma map
v2:
1. judge whether a buffer is premapped based on sgs[0]->dma_address
2. judge whether to unmap a non-indirect desc based on extra.addr
3. judge whether to unmap an indirect desc based on indir_desc
4. rename virtqueue_get_dma_dev to virtqueue_dma_dev
v1:
1. expose the dma device; do not introduce APIs for dma map and sync
2. split some commit for review.
Xuan Zhuo (10):
virtio_ring: check use_dma_api before unmap desc for indirect
virtio_ring: put mapping error check in vring_map_one_sg
virtio_ring: introduce virtqueue_set_premapped()
virtio_ring: support add premapped buf
virtio_ring: introduce virtqueue_dma_dev()
virtio_ring: skip unmap for premapped
virtio_ring: correct the expression of the description of
virtqueue_resize()
virtio_ring: separate the logic of reset/enable from virtqueue_resize
virtio_ring: introduce virtqueue_reset()
virtio_net: merge dma operation for one page
drivers/net/virtio_net.c | 283 +++++++++++++++++++++++++++++++++--
drivers/virtio/virtio_ring.c | 257 ++++++++++++++++++++++++-------
include/linux/virtio.h | 6 +
3 files changed, 478 insertions(+), 68 deletions(-)
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 01/10] virtio_ring: check use_dma_api before unmap desc for indirect
Inside detach_buf_split(), if use_dma_api is false,
vring_unmap_one_split_indirect() will be called many times, but actually
nothing is done. So this patch checks use_dma_api first.
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
Acked-by: Jason Wang <jasowang at redhat.com>
---
drivers/virtio/virtio_ring.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c5310eaf8b46..f8754f1d64d3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -774,8 +774,10 @@ static void detach_buf_split(struct vring_virtqueue *vq,
unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
- for (j = 0; j < len / sizeof(struct vring_desc); j++)
- vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+ if (vq->use_dma_api) {
+ for (j = 0; j < len / sizeof(struct vring_desc); j++)
+ vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+ }
kfree(indir_desc);
vq->split.desc_state[head].indir_desc = NULL;
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 02/10] virtio_ring: put mapping error check in vring_map_one_sg
This patch puts the dma addr error check in vring_map_one_sg().
The benefits of doing this:
1. reduce one judgment of vq->use_dma_api.
2. make vring_map_one_sg() simpler, without calling vring_mapping_error()
to check the return value. This simplifies subsequent code.
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
Acked-by: Jason Wang <jasowang at redhat.com>
---
drivers/virtio/virtio_ring.c | 37 +++++++++++++++++++++---------------
1 file changed, 22 insertions(+), 15 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f8754f1d64d3..87d7ceeecdbd 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct
vring_virtqueue *vq)
}
/* Map one sg entry. */
-static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
- struct scatterlist *sg,
- enum dma_data_direction direction)
+static int vring_map_one_sg(const struct vring_virtqueue *vq, struct
scatterlist *sg,
+ enum dma_data_direction direction, dma_addr_t *addr)
{
if (!vq->use_dma_api) {
/*
@@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct
vring_virtqueue *vq,
* depending on the direction.
*/
kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
- return (dma_addr_t)sg_phys(sg);
+ *addr = (dma_addr_t)sg_phys(sg);
+ return 0;
}
/*
@@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct
vring_virtqueue *vq,
* the way it expects (we don't guarantee that the scatterlist
* will exist for the lifetime of the mapping).
*/
- return dma_map_page(vring_dma_dev(vq),
+ *addr = dma_map_page(vring_dma_dev(vq),
sg_page(sg), sg->offset, sg->length,
direction);
+
+ if (dma_mapping_error(vring_dma_dev(vq), *addr))
+ return -ENOMEM;
+
+ return 0;
}
static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
@@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
- dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
- if (vring_mapping_error(vq, addr))
+ dma_addr_t addr;
+
+ if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
goto unmap_release;
prev = i;
@@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
- dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
- if (vring_mapping_error(vq, addr))
+ dma_addr_t addr;
+
+ if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
goto unmap_release;
prev = i;
@@ -1281,9 +1288,8 @@ static int virtqueue_add_indirect_packed(struct
vring_virtqueue *vq,
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
- addr = vring_map_one_sg(vq, sg, n < out_sgs ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE);
- if (vring_mapping_error(vq, addr))
+ if (vring_map_one_sg(vq, sg, n < out_sgs ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
goto unmap_release;
desc[i].flags = cpu_to_le16(n < out_sgs ?
@@ -1428,9 +1434,10 @@ static inline int virtqueue_add_packed(struct virtqueue
*_vq,
c = 0;
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
- dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
- DMA_TO_DEVICE : DMA_FROM_DEVICE);
- if (vring_mapping_error(vq, addr))
+ dma_addr_t addr;
+
+ if (vring_map_one_sg(vq, sg, n < out_sgs ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
goto unmap_release;
flags = cpu_to_le16(vq->packed.avail_used_flags |
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
This helper allows the driver to change the dma mode to premapped mode.
In premapped mode, the virtio core does not do dma mapping internally.
This only works when use_dma_api is true. If use_dma_api is false, the
dma operations do not go through the DMA APIs, which is not the standard
way in the linux kernel.
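For reference, here is a minimal usage sketch (not part of this patch; the
function name and error handling are hypothetical) of how a driver might
enable premapped mode right after the vq is created:

/* Hypothetical driver setup path: switch the queue to premapped mode
 * immediately after creation and before any buffer is added.
 */
static int demo_enable_premapped(struct virtqueue *vq)
{
	int err;

	err = virtqueue_set_premapped(vq);
	if (err) {
		/* -EINVAL: the vring does not use the DMA API, or buffers
		 * are already queued; fall back to the normal mode.
		 */
		return err;
	}

	/* From now on the driver must pass dma addresses via
	 * sg->dma_address and unmap buffers itself.
	 */
	return 0;
}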
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 45 ++++++++++++++++++++++++++++++++++++
include/linux/virtio.h | 2 ++
2 files changed, 47 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 87d7ceeecdbd..5ace4539344c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -172,6 +172,9 @@ struct vring_virtqueue {
/* Host publishes avail event idx */
bool event;
+ /* Do DMA mapping by driver */
+ bool premapped;
+
/* Head of free buffer list. */
unsigned int free_head;
/* Number we've added since last sync. */
@@ -2061,6 +2064,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->packed_ring = true;
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
+ vq->premapped = false;
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC)
&&
!context;
@@ -2550,6 +2554,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned
int index,
#endif
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
+ vq->premapped = false;
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC)
&&
!context;
@@ -2693,6 +2698,46 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
}
EXPORT_SYMBOL_GPL(virtqueue_resize);
+/**
+ * virtqueue_set_premapped - set the vring premapped mode
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Enable the premapped mode of the vq.
+ *
+ * The vring in premapped mode does not do dma internally, so the driver must
+ * do dma mapping in advance. The driver must pass the dma_address through
+ * dma_address of scatterlist. When the driver got a used buffer from
+ * the vring, it has to unmap the dma address.
+ *
+ * This function must be called immediately after creating the vq, or after vq
+ * reset, and before adding any buffers to it.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EINVAL: vring does not use the dma api, so we can not enable premapped
mode.
+ */
+int virtqueue_set_premapped(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ u32 num;
+
+ num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
+
+ if (num != vq->vq.num_free)
+ return -EINVAL;
+
+ if (!vq->use_dma_api)
+ return -EINVAL;
+
+ vq->premapped = true;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
+
/* Only available for split ring */
struct virtqueue *vring_new_virtqueue(unsigned int index,
unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index de6041deee37..2efd07b79ecf 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
+int virtqueue_set_premapped(struct virtqueue *_vq);
+
bool virtqueue_poll(struct virtqueue *vq, unsigned);
bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 04/10] virtio_ring: support add premapped buf
If the vq is in premapped mode, use sg_dma_address() directly.
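A minimal sketch of what a driver does under this mode (hypothetical helper;
dev is the dma device the driver obtained for this vq, e.g. via
virtqueue_dma_dev() introduced later in this series, and the buffer is
assumed to come from a kmalloc'ed region):

/* Sketch: add one premapped receive buffer. The driver maps the buffer
 * itself and hands the dma address to the vring via sg->dma_address.
 */
static int demo_add_premapped_inbuf(struct virtqueue *vq, struct device *dev,
				    void *buf, unsigned int len)
{
	struct scatterlist sg;
	dma_addr_t addr;
	int err;

	addr = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;

	sg_init_table(&sg, 1);
	sg.dma_address = addr;	/* consumed directly by the premapped vq */
	sg.length = len;

	err = virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_ATOMIC);
	if (err < 0)
		dma_unmap_single(dev, addr, len, DMA_FROM_DEVICE);

	return err;
}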
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5ace4539344c..d471dee3f4f7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -361,6 +361,11 @@ static struct device *vring_dma_dev(const struct
vring_virtqueue *vq)
static int vring_map_one_sg(const struct vring_virtqueue *vq, struct
scatterlist *sg,
enum dma_data_direction direction, dma_addr_t *addr)
{
+ if (vq->premapped) {
+ *addr = sg_dma_address(sg);
+ return 0;
+ }
+
if (!vq->use_dma_api) {
/*
* If DMA is not used, KMSAN doesn't know that the scatterlist
@@ -639,8 +644,12 @@ static inline int virtqueue_add_split(struct virtqueue
*_vq,
dma_addr_t addr = vring_map_single(
vq, desc, total_sg * sizeof(struct vring_desc),
DMA_TO_DEVICE);
- if (vring_mapping_error(vq, addr))
+ if (vring_mapping_error(vq, addr)) {
+ if (vq->premapped)
+ goto free_indirect;
+
goto unmap_release;
+ }
virtqueue_add_desc_split(_vq, vq->split.vring.desc,
head, addr,
@@ -706,6 +715,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
i = vring_unmap_one_split(vq, i);
}
+free_indirect:
if (indirect)
kfree(desc);
@@ -1307,8 +1317,12 @@ static int virtqueue_add_indirect_packed(struct
vring_virtqueue *vq,
addr = vring_map_single(vq, desc,
total_sg * sizeof(struct vring_packed_desc),
DMA_TO_DEVICE);
- if (vring_mapping_error(vq, addr))
+ if (vring_mapping_error(vq, addr)) {
+ if (vq->premapped)
+ goto free_desc;
+
goto unmap_release;
+ }
vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
@@ -1366,6 +1380,7 @@ static int virtqueue_add_indirect_packed(struct
vring_virtqueue *vq,
for (i = 0; i < err_idx; i++)
vring_unmap_desc_packed(vq, &desc[i]);
+free_desc:
kfree(desc);
END_USE(vq);
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
Add virtqueue_dma_dev() to return the DMA device of a virtqueue. The
caller can then do dma operations in advance. The purpose is to keep
memory mapped across multiple add/get buf operations.
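A minimal sketch of the intended use (hypothetical helper; assumes a
driver-owned region that stays mapped while buffers from it are posted
repeatedly):

/* Sketch: map a long-lived region once with the vq's dma device and keep
 * it mapped across many add/get cycles.
 */
static dma_addr_t demo_map_region(struct virtqueue *vq, void *region,
				  size_t size)
{
	struct device *dev = virtqueue_dma_dev(vq);
	dma_addr_t addr;

	if (!dev)
		return DMA_MAPPING_ERROR; /* vq does not use the DMA API */

	addr = dma_map_single(dev, region, size, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, addr))
		return DMA_MAPPING_ERROR;

	/* Before reposting part of the region, sync it for the device. */
	dma_sync_single_for_device(dev, addr, size, DMA_FROM_DEVICE);

	return addr;
}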
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
include/linux/virtio.h | 2 ++
2 files changed, 19 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d471dee3f4f7..1fb2c6dca9ea 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2265,6 +2265,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
}
EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev. That can been used for dma api.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+
+ if (vq->use_dma_api)
+ return vring_dma_dev(vq);
+ else
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
/**
* virtqueue_kick_prepare - first half of split virtqueue_kick call.
* @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 2efd07b79ecf..35d175121cc6 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
void *data,
gfp_t gfp);
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
bool virtqueue_kick(struct virtqueue *vq);
bool virtqueue_kick_prepare(struct virtqueue *vq);
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped
Now we add a case where we skip dma unmap: when vq->premapped is true.
We can't just rely on use_dma_api to determine whether to skip the dma
operation. For convenience, introduce "do_unmap". By default, it is the
same as use_dma_api. If the driver has configured premapped mode, then
do_unmap is false.
So as long as do_unmap is false, we skip the dma unmap operation for the
addr of a desc.
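In other words, the new flag maintains the invariant below (a paraphrase of
the assignments in this patch, not literal new code):

/* Unmap in the core only when the core mapped the buffer itself
 * (use_dma_api) and the driver has not taken over mapping (premapped).
 */
vq->do_unmap = vq->use_dma_api && !vq->premapped;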
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
drivers/virtio/virtio_ring.c | 42 ++++++++++++++++++++++++------------
1 file changed, 28 insertions(+), 14 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1fb2c6dca9ea..10ee3b7ce571 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -175,6 +175,11 @@ struct vring_virtqueue {
/* Do DMA mapping by driver */
bool premapped;
+ /* Do unmap or not for desc. Just when premapped is False and
+ * use_dma_api is true, this is true.
+ */
+ bool do_unmap;
+
/* Head of free buffer list. */
unsigned int free_head;
/* Number we've added since last sync. */
@@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct
vring_virtqueue *vq,
{
u16 flags;
- if (!vq->use_dma_api)
+ if (!vq->do_unmap)
return;
flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
@@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct
vring_virtqueue *vq,
struct vring_desc_extra *extra = vq->split.desc_extra;
u16 flags;
- if (!vq->use_dma_api)
- goto out;
-
flags = extra[i].flags;
if (flags & VRING_DESC_F_INDIRECT) {
+ if (!vq->use_dma_api)
+ goto out;
+
dma_unmap_single(vring_dma_dev(vq),
extra[i].addr,
extra[i].len,
(flags & VRING_DESC_F_WRITE) ?
DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+ if (!vq->do_unmap)
+ goto out;
+
dma_unmap_page(vring_dma_dev(vq),
extra[i].addr,
extra[i].len,
@@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
}
/* Last one doesn't continue. */
desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
- if (!indirect && vq->use_dma_api)
+ if (!indirect && vq->do_unmap)
vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &
~VRING_DESC_F_NEXT;
@@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq,
unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
- if (vq->use_dma_api) {
+ if (vq->do_unmap) {
for (j = 0; j < len / sizeof(struct vring_desc); j++)
vring_unmap_one_split_indirect(vq, &indir_desc[j]);
}
@@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct
vring_virtqueue *vq,
{
u16 flags;
- if (!vq->use_dma_api)
- return;
-
flags = extra->flags;
if (flags & VRING_DESC_F_INDIRECT) {
+ if (!vq->use_dma_api)
+ return;
+
dma_unmap_single(vring_dma_dev(vq),
extra->addr, extra->len,
(flags & VRING_DESC_F_WRITE) ?
DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+ if (!vq->do_unmap)
+ return;
+
dma_unmap_page(vring_dma_dev(vq),
extra->addr, extra->len,
(flags & VRING_DESC_F_WRITE) ?
@@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct
vring_virtqueue *vq,
{
u16 flags;
- if (!vq->use_dma_api)
+ if (!vq->do_unmap)
return;
flags = le16_to_cpu(desc->flags);
@@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct
vring_virtqueue *vq,
sizeof(struct vring_packed_desc));
vq->packed.vring.desc[head].id = cpu_to_le16(id);
- if (vq->use_dma_api) {
+ if (vq->do_unmap) {
vq->packed.desc_extra[id].addr = addr;
vq->packed.desc_extra[id].len = total_sg *
sizeof(struct vring_packed_desc);
@@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue
*_vq,
desc[i].len = cpu_to_le32(sg->length);
desc[i].id = cpu_to_le16(id);
- if (unlikely(vq->use_dma_api)) {
+ if (unlikely(vq->do_unmap)) {
vq->packed.desc_extra[curr].addr = addr;
vq->packed.desc_extra[curr].len = sg->length;
vq->packed.desc_extra[curr].flags
@@ -1604,7 +1615,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
vq->free_head = id;
vq->vq.num_free += state->num;
- if (unlikely(vq->use_dma_api)) {
+ if (unlikely(vq->do_unmap)) {
curr = id;
for (i = 0; i < state->num; i++) {
vring_unmap_extra_packed(vq,
@@ -1621,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
if (!desc)
return;
- if (vq->use_dma_api) {
+ if (vq->do_unmap) {
len = vq->packed.desc_extra[id].len;
for (i = 0; i < len / sizeof(struct vring_packed_desc);
i++)
@@ -2080,6 +2091,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
vq->premapped = false;
+ vq->do_unmap = vq->use_dma_api;
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC)
&&
!context;
@@ -2587,6 +2599,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned
int index,
vq->dma_dev = dma_dev;
vq->use_dma_api = vring_use_dma_api(vdev);
vq->premapped = false;
+ vq->do_unmap = vq->use_dma_api;
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC)
&&
!context;
@@ -2765,6 +2778,7 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
return -EINVAL;
vq->premapped = true;
+ vq->do_unmap = false;
return 0;
}
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 07/10] virtio_ring: correct the expression of the description of virtqueue_resize()
Modify the "useless" to a more accurate "unused". Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com> Acked-by: Jason Wang <jasowang at redhat.com> --- drivers/virtio/virtio_ring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 10ee3b7ce571..dcbc8a5eaf16 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2678,7 +2678,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma); * virtqueue_resize - resize the vring of vq * @_vq: the struct virtqueue we're talking about. * @num: new ring num - * @recycle: callback for recycle the useless buffer + * @recycle: callback to recycle unused buffers * * When it is really necessary to create a new vring, it will set the current vq * into the reset state. Then call the passed callback to recycle the buffer -- 2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 08/10] virtio_ring: separate the logic of reset/enable from virtqueue_resize
The subsequent reset function will reuse this logic.
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
Acked-by: Jason Wang <jasowang at redhat.com>
---
drivers/virtio/virtio_ring.c | 58 ++++++++++++++++++++++++------------
1 file changed, 39 insertions(+), 19 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dcbc8a5eaf16..bed0237402fa 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2152,6 +2152,43 @@ static int virtqueue_resize_packed(struct virtqueue *_vq,
u32 num)
return -ENOMEM;
}
+static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
+ void (*recycle)(struct virtqueue *vq, void *buf))
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ struct virtio_device *vdev = vq->vq.vdev;
+ void *buf;
+ int err;
+
+ if (!vq->we_own_ring)
+ return -EPERM;
+
+ if (!vdev->config->disable_vq_and_reset)
+ return -ENOENT;
+
+ if (!vdev->config->enable_vq_after_reset)
+ return -ENOENT;
+
+ err = vdev->config->disable_vq_and_reset(_vq);
+ if (err)
+ return err;
+
+ while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+ recycle(_vq, buf);
+
+ return 0;
+}
+
+static int virtqueue_enable_after_reset(struct virtqueue *_vq)
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ struct virtio_device *vdev = vq->vq.vdev;
+
+ if (vdev->config->enable_vq_after_reset(_vq))
+ return -EBUSY;
+
+ return 0;
+}
/*
* Generic functions and exported symbols.
@@ -2702,13 +2739,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
void (*recycle)(struct virtqueue *vq, void *buf))
{
struct vring_virtqueue *vq = to_vvq(_vq);
- struct virtio_device *vdev = vq->vq.vdev;
- void *buf;
int err;
- if (!vq->we_own_ring)
- return -EPERM;
-
if (num > vq->vq.num_max)
return -E2BIG;
@@ -2718,28 +2750,16 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) ==
num)
return 0;
- if (!vdev->config->disable_vq_and_reset)
- return -ENOENT;
-
- if (!vdev->config->enable_vq_after_reset)
- return -ENOENT;
-
- err = vdev->config->disable_vq_and_reset(_vq);
+ err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
- while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
- recycle(_vq, buf);
-
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
else
err = virtqueue_resize_split(_vq, num);
- if (vdev->config->enable_vq_after_reset(_vq))
- return -EBUSY;
-
- return err;
+ return virtqueue_enable_after_reset(_vq);
}
EXPORT_SYMBOL_GPL(virtqueue_resize);
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 09/10] virtio_ring: introduce virtqueue_reset()
Introduce virtqueue_reset() to release all buffers inside the vq.
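A minimal sketch of how a driver might use it (hypothetical names; assumes
page-frag receive buffers, so the recycle callback only needs to drop the
page reference):

/* Sketch: recycle callback plus a queue reset. The callback receives every
 * buffer still sitting in the vq so the driver can release it.
 */
static void demo_recycle(struct virtqueue *vq, void *buf)
{
	put_page(virt_to_head_page(buf));
}

static int demo_reset_rx(struct virtqueue *vq)
{
	return virtqueue_reset(vq, demo_recycle);
}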
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
Acked-by: Jason Wang <jasowang at redhat.com>
---
drivers/virtio/virtio_ring.c | 33 +++++++++++++++++++++++++++++++++
include/linux/virtio.h | 2 ++
2 files changed, 35 insertions(+)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index bed0237402fa..1f4681102190 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2804,6 +2804,39 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
}
EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
+/**
+ * virtqueue_reset - detach and recycle all unused buffers
+ * @_vq: the struct virtqueue we're talking about.
+ * @recycle: callback to recycle unused buffers
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -EPERM: Operation not permitted
+ */
+int virtqueue_reset(struct virtqueue *_vq,
+ void (*recycle)(struct virtqueue *vq, void *buf))
+{
+ struct vring_virtqueue *vq = to_vvq(_vq);
+ int err;
+
+ err = virtqueue_disable_and_recycle(_vq, recycle);
+ if (err)
+ return err;
+
+ if (vq->packed_ring)
+ virtqueue_reinit_packed(vq);
+ else
+ virtqueue_reinit_split(vq);
+
+ return virtqueue_enable_after_reset(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_reset);
+
/* Only available for split ring */
struct virtqueue *vring_new_virtqueue(unsigned int index,
unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 35d175121cc6..465e8e0e215a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -99,6 +99,8 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue
*vq);
int virtqueue_resize(struct virtqueue *vq, u32 num,
void (*recycle)(struct virtqueue *vq, void *buf));
+int virtqueue_reset(struct virtqueue *vq,
+ void (*recycle)(struct virtqueue *vq, void *buf));
/**
* struct virtio_device - representation of a device using virtio
--
2.32.0.3.g01195cf9f
Xuan Zhuo
2023-Jul-10 03:42 UTC
[PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
Currently, the virtio core performs a dma operation for each buffer,
even though the same page may be operated on multiple times.
With the premapped feature of the virtio core, the driver does the dma
operations and manages the dma addresses itself.
This way, we can perform only one dma operation for the same page. In
the case of mtu 1500, this saves a lot of dma operations.
Tested on an Aliyun g7.4large machine: with one cpu at 100%, pps
increased from 1893766 to 1901105, an increase of 0.4%.
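The idea in a simplified sketch (field names mirror the patch, but the
snippet is illustrative rather than the literal code): each page-sized
mapping is reference counted, buffers carved from the same page share it,
and the page is unmapped only when its last buffer is returned.

/* One mapping shared by all buffers carved from the same page region. */
struct demo_dma {
	dma_addr_t addr;
	void *buf;
	u32 len;
	u32 ref;
};

/* Refill path: reuse the existing mapping when buf falls inside it. */
static dma_addr_t demo_get_addr(struct demo_dma *dma, void *buf)
{
	++dma->ref;
	return dma->addr + (buf - dma->buf);
}

/* Completion path: only the last user unmaps the page. */
static void demo_put(struct device *dev, struct demo_dma *dma)
{
	if (--dma->ref == 0)
		dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
}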
Signed-off-by: Xuan Zhuo <xuanzhuo at linux.alibaba.com>
---
drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
1 file changed, 267 insertions(+), 16 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 486b5849033d..4de845d35bed 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -126,6 +126,27 @@ static const struct virtnet_stat_desc
virtnet_rq_stats_desc[] = {
#define VIRTNET_SQ_STATS_LEN ARRAY_SIZE(virtnet_sq_stats_desc)
#define VIRTNET_RQ_STATS_LEN ARRAY_SIZE(virtnet_rq_stats_desc)
+/* The bufs on the same page may share this struct. */
+struct virtnet_rq_dma {
+ struct virtnet_rq_dma *next;
+
+ dma_addr_t addr;
+
+ void *buf;
+ u32 len;
+
+ u32 ref;
+};
+
+/* Record the dma and buf. */
+struct virtnet_rq_data {
+ struct virtnet_rq_data *next;
+
+ void *buf;
+
+ struct virtnet_rq_dma *dma;
+};
+
/* Internal representation of a send virtqueue */
struct send_queue {
/* Virtqueue associated with this send _queue */
@@ -175,6 +196,13 @@ struct receive_queue {
char name[16];
struct xdp_rxq_info xdp_rxq;
+
+ struct virtnet_rq_data *data_array;
+ struct virtnet_rq_data *data_free;
+
+ struct virtnet_rq_dma *dma_array;
+ struct virtnet_rq_dma *dma_free;
+ struct virtnet_rq_dma *last_dma;
};
/* This structure can contain rss message with maximum settings for indirection
table and keysize
@@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info
*vi,
return skb;
}
+static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma
*dma)
+{
+ struct device *dev;
+
+ --dma->ref;
+
+ if (dma->ref)
+ return;
+
+ dev = virtqueue_dma_dev(rq->vq);
+
+ dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
+
+ dma->next = rq->dma_free;
+ rq->dma_free = dma;
+}
+
+static void *virtnet_rq_recycle_data(struct receive_queue *rq,
+ struct virtnet_rq_data *data)
+{
+ void *buf;
+
+ buf = data->buf;
+
+ data->next = rq->data_free;
+ rq->data_free = data;
+
+ return buf;
+}
+
+static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
+ void *buf,
+ struct virtnet_rq_dma *dma)
+{
+ struct virtnet_rq_data *data;
+
+ data = rq->data_free;
+ rq->data_free = data->next;
+
+ data->buf = buf;
+ data->dma = dma;
+
+ return data;
+}
+
+static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
+{
+ struct virtnet_rq_data *data;
+ void *buf;
+
+ buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
+ if (!buf || !rq->data_array)
+ return buf;
+
+ data = buf;
+
+ virtnet_rq_unmap(rq, data->dma);
+
+ return virtnet_rq_recycle_data(rq, data);
+}
+
+static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
+{
+ struct virtnet_rq_data *data;
+ void *buf;
+
+ buf = virtqueue_detach_unused_buf(rq->vq);
+ if (!buf || !rq->data_array)
+ return buf;
+
+ data = buf;
+
+ virtnet_rq_unmap(rq, data->dma);
+
+ return virtnet_rq_recycle_data(rq, data);
+}
+
+static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
+{
+ struct virtnet_rq_dma *dma = rq->last_dma;
+ struct device *dev;
+ u32 off, map_len;
+ dma_addr_t addr;
+ void *end;
+
+ if (likely(dma) && buf >= dma->buf && (buf + len <=
dma->buf + dma->len)) {
+ ++dma->ref;
+ addr = dma->addr + (buf - dma->buf);
+ goto ok;
+ }
+
+ end = buf + len - 1;
+ off = offset_in_page(end);
+ map_len = len + PAGE_SIZE - off;
+
+ dev = virtqueue_dma_dev(rq->vq);
+
+ addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
+ map_len, DMA_FROM_DEVICE, 0);
+ if (addr == DMA_MAPPING_ERROR)
+ return -ENOMEM;
+
+ dma = rq->dma_free;
+ rq->dma_free = dma->next;
+
+ dma->ref = 1;
+ dma->buf = buf;
+ dma->addr = addr;
+ dma->len = map_len;
+
+ rq->last_dma = dma;
+
+ok:
+ sg_init_table(rq->sg, 1);
+ rq->sg[0].dma_address = addr;
+ rq->sg[0].length = len;
+
+ return 0;
+}
+
+static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
+{
+ struct receive_queue *rq;
+ int i, err, j, num;
+
+ /* disable for big mode */
+ if (!vi->mergeable_rx_bufs && vi->big_packets)
+ return 0;
+
+ for (i = 0; i < vi->max_queue_pairs; i++) {
+ err = virtqueue_set_premapped(vi->rq[i].vq);
+ if (err)
+ continue;
+
+ rq = &vi->rq[i];
+
+ num = virtqueue_get_vring_size(rq->vq);
+
+ rq->data_array = kmalloc_array(num, sizeof(*rq->data_array),
GFP_KERNEL);
+ if (!rq->data_array)
+ goto err;
+
+ rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
+ if (!rq->dma_array)
+ goto err;
+
+ for (j = 0; j < num; ++j) {
+ rq->data_array[j].next = rq->data_free;
+ rq->data_free = &rq->data_array[j];
+
+ rq->dma_array[j].next = rq->dma_free;
+ rq->dma_free = &rq->dma_array[j];
+ }
+ }
+
+ return 0;
+
+err:
+ for (i = 0; i < vi->max_queue_pairs; i++) {
+ struct receive_queue *rq;
+
+ rq = &vi->rq[i];
+
+ kfree(rq->dma_array);
+ kfree(rq->data_array);
+ }
+
+ return -ENOMEM;
+}
+
static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
{
unsigned int len;
@@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue
*rq,
void *buf;
int off;
- buf = virtqueue_get_buf(rq->vq, &buflen);
+ buf = virtnet_rq_get_buf(rq, &buflen, NULL);
if (unlikely(!buf))
goto err_buf;
@@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device
*dev,
return -EINVAL;
while (--*num_buf > 0) {
- buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+ buf = virtnet_rq_get_buf(rq, &len, &ctx);
if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
dev->name, *num_buf,
@@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device
*dev,
while (--num_buf) {
int num_skb_frags;
- buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+ buf = virtnet_rq_get_buf(rq, &len, &ctx);
if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
dev->name, num_buf,
@@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device
*dev,
err_skb:
put_page(page);
while (num_buf-- > 1) {
- buf = virtqueue_get_buf(rq->vq, &len);
+ buf = virtnet_rq_get_buf(rq, &len, NULL);
if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers missing\n",
dev->name, num_buf);
@@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi,
struct receive_queue *rq,
unsigned int xdp_headroom = virtnet_get_headroom(vi);
void *ctx = (void *)(unsigned long)xdp_headroom;
int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
+ struct virtnet_rq_data *data;
int err;
len = SKB_DATA_ALIGN(len) +
@@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi,
struct receive_queue *rq,
buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
get_page(alloc_frag->page);
alloc_frag->offset += len;
- sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
- vi->hdr_len + GOOD_PACKET_LEN);
- err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+
+ if (rq->data_array) {
+ err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
+ vi->hdr_len + GOOD_PACKET_LEN);
+ if (err)
+ goto map_err;
+
+ data = virtnet_rq_get_data(rq, buf, rq->last_dma);
+ } else {
+ sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
+ vi->hdr_len + GOOD_PACKET_LEN);
+ data = (void *)buf;
+ }
+
+ err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
if (err < 0)
- put_page(virt_to_head_page(buf));
+ goto add_err;
+
+ return err;
+
+add_err:
+ if (rq->data_array) {
+ virtnet_rq_unmap(rq, data->dma);
+ virtnet_rq_recycle_data(rq, data);
+ }
+
+map_err:
+ put_page(virt_to_head_page(buf));
return err;
}
@@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
unsigned int headroom = virtnet_get_headroom(vi);
unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
+ struct virtnet_rq_data *data;
char *buf;
void *ctx;
int err;
@@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info
*vi,
alloc_frag->offset += hole;
}
- sg_init_one(rq->sg, buf, len);
+ if (rq->data_array) {
+ err = virtnet_rq_map_sg(rq, buf, len);
+ if (err)
+ goto map_err;
+
+ data = virtnet_rq_get_data(rq, buf, rq->last_dma);
+ } else {
+ sg_init_one(rq->sg, buf, len);
+ data = (void *)buf;
+ }
+
ctx = mergeable_len_to_ctx(len + room, headroom);
- err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+ err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
if (err < 0)
- put_page(virt_to_head_page(buf));
+ goto add_err;
+
+ return 0;
+
+add_err:
+ if (rq->data_array) {
+ virtnet_rq_unmap(rq, data->dma);
+ virtnet_rq_recycle_data(rq, data);
+ }
+map_err:
+ put_page(virt_to_head_page(buf));
return err;
}
@@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int
budget,
void *ctx;
while (stats.packets < budget &&
- (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
+ (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
stats.packets++;
}
} else {
while (stats.packets < budget &&
- (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+ (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
stats.packets++;
}
@@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
for (i = 0; i < vi->max_queue_pairs; i++) {
__netif_napi_del(&vi->rq[i].napi);
__netif_napi_del(&vi->sq[i].napi);
+
+ kfree(vi->rq[i].data_array);
+ kfree(vi->rq[i].dma_array);
}
/* We called __netif_napi_del(),
@@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
}
for (i = 0; i < vi->max_queue_pairs; i++) {
- struct virtqueue *vq = vi->rq[i].vq;
- while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
- virtnet_rq_free_unused_buf(vq, buf);
+ struct receive_queue *rq = &vi->rq[i];
+
+ while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
+ virtnet_rq_free_unused_buf(rq->vq, buf);
cond_resched();
}
}
@@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
if (ret)
goto err_free;
+ ret = virtnet_rq_merge_map_init(vi);
+ if (ret)
+ goto err_free;
+
cpus_read_lock();
virtnet_set_affinity(vi);
cpus_read_unlock();
--
2.32.0.3.g01195cf9f