Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 00/10] Support kernel buffers in vhost
vhost currently expects that the virtqueues and the queued buffers are
accessible from a userspace process' address space. However, when using
vhost to communicate between two Linux systems running on two physical
CPUs in an AMP configuration (on a single SoC or via something like
PCIe), it is undesirable from a security perspective to make the entire
kernel memory of the other Linux system accessible from userspace.

To remedy this, this series adds support to vhost for placing the
virtqueues and queued buffers in kernel memory. Since userspace should
not be allowed to control the placement and attributes of these
virtqueues, a mechanism to do this from kernel space is added.

A vDPA-based test driver is added which uses this support to allow
virtio-net and vhost-net to communicate with each other on the same
system without exposing kernel memory to userspace via /dev/mem or
similar.

This vDPA-based test driver is intended to be used as the basis for the
implementation of a driver which will allow Linux-Linux communication
between physical CPUs on SoCs using virtio and vhost, for instance by
using information from the device tree to indicate the location of
shared memory, and the mailbox API to trigger interrupts between the
CPUs.
This patchset is also available at:

  https://github.com/vwax/linux/tree/vhost/rfc

Vincent Whitchurch (10):
  vhost: scsi: use copy_to_iter()
  vhost: push virtqueue area pointers into a user struct
  vhost: add iov wrapper
  vhost: add support for kernel buffers
  vhost: extract common code for file_operations handling
  vhost: extract ioctl locking to common code
  vhost: add support for kernel control
  vhost: net: add support for kernel control
  vdpa: add test driver for kernel buffers in vhost
  selftests: add vhost_kernel tests

 drivers/vdpa/Kconfig                              |   8 +
 drivers/vdpa/Makefile                             |   1 +
 drivers/vdpa/vhost_kernel_test/Makefile           |   2 +
 .../vhost_kernel_test/vhost_kernel_test.c         | 575 ++++++++++++++++++
 drivers/vhost/Kconfig                             |   6 +
 drivers/vhost/Makefile                            |   3 +
 drivers/vhost/common.c                            | 340 +++++++++++
 drivers/vhost/net.c                               | 212 ++++---
 drivers/vhost/scsi.c                              |  50 +-
 drivers/vhost/test.c                              |   2 +-
 drivers/vhost/vdpa.c                              |   6 +-
 drivers/vhost/vhost.c                             | 437 ++++++++++---
 drivers/vhost/vhost.h                             | 109 +++-
 drivers/vhost/vsock.c                             |  95 +--
 include/linux/vhost.h                             |  23 +
 tools/testing/selftests/Makefile                  |   1 +
 .../vhost_kernel/vhost_kernel_test.c              | 287 +++++++++
 .../vhost_kernel/vhost_kernel_test.sh             | 125 ++++
 18 files changed, 2020 insertions(+), 262 deletions(-)
 create mode 100644 drivers/vdpa/vhost_kernel_test/Makefile
 create mode 100644 drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c
 create mode 100644 drivers/vhost/common.c
 create mode 100644 include/linux/vhost.h
 create mode 100644 tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
 create mode 100755 tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh

-- 
2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 01/10] vhost: scsi: use copy_to_iter()
Use copy_to_iter() instead of __copy_to_user() when accessing the virtio
buffers, as a preparation for supporting kernel-space buffers in vhost.
It also makes the code consistent, since the driver is already using
copy_to_iter() in the other places it accesses the queued buffers.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com>
---
 drivers/vhost/scsi.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 532e204f2b1b..bcf53685439d 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -462,7 +462,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 {
 	struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
 	struct virtio_scsi_event *event = &evt->event;
-	struct virtio_scsi_event __user *eventp;
+	struct iov_iter iov_iter;
 	unsigned out, in;
 	int head, ret;
@@ -499,9 +499,10 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 		vs->vs_events_missed = false;
 	}
 
-	eventp = vq->iov[out].iov_base;
-	ret = __copy_to_user(eventp, event, sizeof(*event));
-	if (!ret)
+	iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(*event));
+
+	ret = copy_to_iter(event, sizeof(*event), &iov_iter);
+	if (ret == sizeof(*event))
 		vhost_add_used_and_signal(&vs->dev, vq, head, 0);
 	else
 		vq_err(vq, "Faulted on vhost_scsi_send_event\n");
@@ -802,17 +803,18 @@ static void vhost_scsi_target_queue_cmd(struct vhost_scsi_cmd *cmd)
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 			   struct vhost_virtqueue *vq,
-			   int head, unsigned out)
+			   int head, unsigned out, unsigned in)
 {
-	struct virtio_scsi_cmd_resp __user *resp;
 	struct virtio_scsi_cmd_resp rsp;
+	struct iov_iter iov_iter;
 	int ret;
 
+	iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(rsp));
+
 	memset(&rsp, 0, sizeof(rsp));
 	rsp.response = VIRTIO_SCSI_S_BAD_TARGET;
-	resp = vq->iov[out].iov_base;
-	ret = __copy_to_user(resp, &rsp, sizeof(rsp));
-	if (!ret)
+	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
+	if (ret == sizeof(rsp))
 		vhost_add_used_and_signal(&vs->dev, vq, head, 0);
 	else
 		pr_err("Faulted on virtio_scsi_cmd_resp\n");
@@ -1124,7 +1126,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 			if (ret == -ENXIO)
 				break;
 			else if (ret == -EIO)
-				vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+				vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out, vc.in);
 	} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
 out:
 	mutex_unlock(&vq->mutex);
@@ -1347,7 +1349,7 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 			if (ret == -ENXIO)
 				break;
 			else if (ret == -EIO)
-				vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+				vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out, vc.in);
 	} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
 out:
 	mutex_unlock(&vq->mutex);
-- 
2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 02/10] vhost: push virtqueue area pointers into a user struct
In order to prepare for allowing vhost to operate on kernel buffers,
push the virtqueue desc/avail/used area pointers down to a new "user"
struct. No functional change intended.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com>
---
 drivers/vhost/vdpa.c  |  6 +--
 drivers/vhost/vhost.c | 90 +++++++++++++++++++++----------------------
 drivers/vhost/vhost.h |  8 ++--
 3 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index f41d081777f5..6f05388f5a21 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -400,9 +400,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 	switch (cmd) {
 	case VHOST_SET_VRING_ADDR:
 		if (ops->set_vq_address(vdpa, idx,
-					(u64)(uintptr_t)vq->desc,
-					(u64)(uintptr_t)vq->avail,
-					(u64)(uintptr_t)vq->used))
+					(u64)(uintptr_t)vq->user.desc,
+					(u64)(uintptr_t)vq->user.avail,
+					(u64)(uintptr_t)vq->user.used))
 			r = -EINVAL;
 		break;

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe2..108994f386f7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -46,8 +46,8 @@ enum {
 	VHOST_MEMORY_F_LOG = 0x1,
 };
 
-#define vhost_used_event(vq) ((__virtio16 __user *)&vq->avail->ring[vq->num])
-#define vhost_avail_event(vq) ((__virtio16 __user *)&vq->used->ring[vq->num])
+#define vhost_used_event(vq) ((__virtio16 __user *)&vq->user.avail->ring[vq->num])
+#define vhost_avail_event(vq) ((__virtio16 __user *)&vq->user.used->ring[vq->num])
 
 #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
 static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
@@ -306,7 +306,7 @@ static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx)
 bool vhost_vq_is_setup(struct vhost_virtqueue *vq)
 {
-	return vq->avail && vq->desc && vq->used && vhost_vq_access_ok(vq);
+	return vq->user.avail && vq->user.desc && vq->user.used && vhost_vq_access_ok(vq);
 }
 EXPORT_SYMBOL_GPL(vhost_vq_is_setup);
@@ -314,9 +314,9 @@ static void vhost_vq_reset(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	vq->num = 1;
-	vq->desc = NULL;
-	vq->avail = NULL;
-	vq->used = NULL;
+	vq->user.desc = NULL;
+	vq->user.avail = NULL;
+	vq->user.used = NULL;
 	vq->last_avail_idx = 0;
 	vq->avail_idx = 0;
 	vq->last_used_idx = 0;
@@ -444,8 +444,8 @@ static size_t vhost_get_avail_size(struct vhost_virtqueue *vq,
 	size_t event __maybe_unused =
 	       vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 
-	return sizeof(*vq->avail) +
-	       sizeof(*vq->avail->ring) * num + event;
+	return sizeof(*vq->user.avail) +
+	       sizeof(*vq->user.avail->ring) * num + event;
 }
 
 static size_t vhost_get_used_size(struct vhost_virtqueue *vq,
@@ -454,14 +454,14 @@ static size_t vhost_get_used_size(struct vhost_virtqueue *vq,
 	size_t event __maybe_unused =
 	       vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
 
-	return sizeof(*vq->used) +
-	       sizeof(*vq->used->ring) * num + event;
+	return sizeof(*vq->user.used) +
+	       sizeof(*vq->user.used->ring) * num + event;
 }
 
 static size_t vhost_get_desc_size(struct vhost_virtqueue *vq,
 				  unsigned int num)
 {
-	return sizeof(*vq->desc) * num;
+	return sizeof(*vq->user.desc) * num;
 }
 
 void vhost_dev_init(struct vhost_dev *dev,
@@ -959,7 +959,7 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq,
 				 struct vring_used_elem *head, int idx,
 				 int count)
 {
-	return vhost_copy_to_user(vq, vq->used->ring + idx, head,
+	return vhost_copy_to_user(vq, vq->user.used->ring + idx, head,
 				  count * sizeof(*head));
 }
@@ -967,14 +967,14 @@
 static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
 {
 	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
-			      &vq->used->flags);
+			      &vq->user.used->flags);
 }
 
 static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 {
 	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
-			      &vq->used->idx);
+			      &vq->user.used->idx);
 }
 
 #define vhost_get_user(vq, x, ptr, type)		\
@@ -1018,20 +1018,20 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)
 static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
 				      __virtio16 *idx)
 {
-	return vhost_get_avail(vq, *idx, &vq->avail->idx);
+	return vhost_get_avail(vq, *idx, &vq->user.avail->idx);
 }
 
 static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
 				       __virtio16 *head, int idx)
 {
 	return vhost_get_avail(vq, *head,
-			       &vq->avail->ring[idx & (vq->num - 1)]);
+			       &vq->user.avail->ring[idx & (vq->num - 1)]);
 }
 
 static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq,
 					__virtio16 *flags)
 {
-	return vhost_get_avail(vq, *flags, &vq->avail->flags);
+	return vhost_get_avail(vq, *flags, &vq->user.avail->flags);
 }
 
 static inline int vhost_get_used_event(struct vhost_virtqueue *vq,
@@ -1043,13 +1043,13 @@ static inline int vhost_get_used_event(struct vhost_virtqueue *vq,
 static inline int vhost_get_used_idx(struct vhost_virtqueue *vq,
 				     __virtio16 *idx)
 {
-	return vhost_get_used(vq, *idx, &vq->used->idx);
+	return vhost_get_used(vq, *idx, &vq->user.used->idx);
 }
 
 static inline int vhost_get_desc(struct vhost_virtqueue *vq,
 				 struct vring_desc *desc, int idx)
 {
-	return vhost_copy_from_user(vq, desc, vq->desc + idx, sizeof(*desc));
+	return vhost_copy_from_user(vq, desc, vq->user.desc + idx, sizeof(*desc));
 }
 
 static void vhost_iotlb_notify_vq(struct vhost_dev *d,
@@ -1363,12 +1363,12 @@ int vq_meta_prefetch(struct vhost_virtqueue *vq)
 	if (!vq->iotlb)
 		return 1;
 
-	return iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->desc,
+	return iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->user.desc,
 			       vhost_get_desc_size(vq, num), VHOST_ADDR_DESC) &&
-	       iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->avail,
+	       iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->user.avail,
 			       vhost_get_avail_size(vq, num), VHOST_ADDR_AVAIL) &&
-	       iotlb_access_ok(vq, VHOST_MAP_WO, (u64)(uintptr_t)vq->used,
+	       iotlb_access_ok(vq, VHOST_MAP_WO, (u64)(uintptr_t)vq->user.used,
 			       vhost_get_used_size(vq, num), VHOST_ADDR_USED);
 }
 EXPORT_SYMBOL_GPL(vq_meta_prefetch);
@@ -1412,7 +1412,7 @@ bool vhost_vq_access_ok(struct vhost_virtqueue *vq)
 	if (!vq_log_access_ok(vq, vq->log_base))
 		return false;
 
-	return vq_access_ok(vq, vq->num, vq->desc, vq->avail, vq->used);
+	return vq_access_ok(vq, vq->num, vq->user.desc, vq->user.avail, vq->user.used);
 }
 EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
@@ -1523,8 +1523,8 @@ static long vhost_vring_set_addr(struct vhost_dev *d,
 		return -EFAULT;
 
 	/* Make sure it's safe to cast pointers to vring types. */
-	BUILD_BUG_ON(__alignof__ *vq->avail > VRING_AVAIL_ALIGN_SIZE);
-	BUILD_BUG_ON(__alignof__ *vq->used > VRING_USED_ALIGN_SIZE);
+	BUILD_BUG_ON(__alignof__ *vq->user.avail > VRING_AVAIL_ALIGN_SIZE);
+	BUILD_BUG_ON(__alignof__ *vq->user.used > VRING_USED_ALIGN_SIZE);
 
 	if ((a.avail_user_addr & (VRING_AVAIL_ALIGN_SIZE - 1)) ||
 	    (a.used_user_addr & (VRING_USED_ALIGN_SIZE - 1)) ||
 	    (a.log_guest_addr & (VRING_USED_ALIGN_SIZE - 1)))
@@ -1548,10 +1548,10 @@ static long vhost_vring_set_addr(struct vhost_dev *d,
 	}
 
 	vq->log_used = !!(a.flags & (0x1 << VHOST_VRING_F_LOG));
-	vq->desc = (void __user *)(unsigned long)a.desc_user_addr;
-	vq->avail = (void __user *)(unsigned long)a.avail_user_addr;
+	vq->user.desc = (void __user *)(unsigned long)a.desc_user_addr;
+	vq->user.avail = (void __user *)(unsigned long)a.avail_user_addr;
 	vq->log_addr = a.log_guest_addr;
-	vq->used = (void __user *)(unsigned long)a.used_user_addr;
+	vq->user.used = (void __user *)(unsigned long)a.used_user_addr;
 
 	return 0;
 }
@@ -1912,8 +1912,8 @@ static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
 	if (!vq->iotlb)
 		return log_write(vq->log_base, vq->log_addr + used_offset, len);
 
-	ret = translate_desc(vq, (uintptr_t)vq->used + used_offset,
-			     len, iov, 64, VHOST_ACCESS_WO);
+	ret = translate_desc(vq, (uintptr_t)vq->user.used + used_offset,
+			     len, vq->log_iov, 64, VHOST_ACCESS_WO);
 	if (ret < 0)
 		return ret;
@@ -1972,9 +1972,9 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
 		/* Make sure the flag is seen before log. */
 		smp_wmb();
 		/* Log used flag write. */
-		used = &vq->used->flags;
-		log_used(vq, (used - (void __user *)vq->used),
-			 sizeof vq->used->flags);
+		used = &vq->user.used->flags;
+		log_used(vq, (used - (void __user *)vq->user.used),
+			 sizeof vq->user.used->flags);
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
@@ -1991,7 +1991,7 @@ static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
 		smp_wmb();
 		/* Log avail event write */
 		used = vhost_avail_event(vq);
-		log_used(vq, (used - (void __user *)vq->used),
+		log_used(vq, (used - (void __user *)vq->user.used),
 			 sizeof *vhost_avail_event(vq));
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
@@ -2015,14 +2015,14 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 		goto err;
 	vq->signalled_used_valid = false;
 	if (!vq->iotlb &&
-	    !access_ok(&vq->used->idx, sizeof vq->used->idx)) {
+	    !access_ok(&vq->user.used->idx, sizeof vq->user.used->idx)) {
 		r = -EFAULT;
 		goto err;
 	}
 	r = vhost_get_used_idx(vq, &last_used_idx);
 	if (r) {
 		vq_err(vq, "Can't access used idx at %p\n",
-		       &vq->used->idx);
+		       &vq->user.used->idx);
 		goto err;
 	}
 	vq->last_used_idx = vhost16_to_cpu(vq, last_used_idx);
@@ -2214,7 +2214,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	if (vq->avail_idx == vq->last_avail_idx) {
 		if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
 			vq_err(vq, "Failed to access avail idx at %p\n",
-				&vq->avail->idx);
+				&vq->user.avail->idx);
 			return -EFAULT;
 		}
 		vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
@@ -2242,7 +2242,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	if (unlikely(vhost_get_avail_head(vq, &ring_head, last_avail_idx))) {
 		vq_err(vq, "Failed to read head: idx %d address %p\n",
 		       last_avail_idx,
-		       &vq->avail->ring[last_avail_idx % vq->num]);
+		       &vq->user.avail->ring[last_avail_idx % vq->num]);
 		return -EFAULT;
 	}
@@ -2277,7 +2277,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 		ret = vhost_get_desc(vq, &desc, i);
 		if (unlikely(ret)) {
 			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
-			       i, vq->desc + i);
+			       i, vq->user.desc + i);
 			return -EFAULT;
 		}
 		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) {
@@ -2366,7 +2366,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 	int start;
 
 	start = vq->last_used_idx & (vq->num - 1);
-	used = vq->used->ring + start;
+	used = vq->user.used->ring + start;
 	if (vhost_put_used(vq, heads, start, count)) {
 		vq_err(vq, "Failed to write used");
 		return -EFAULT;
@@ -2375,7 +2375,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 		/* Make sure data is seen before log. */
 		smp_wmb();
 		/* Log used ring entry write. */
-		log_used(vq, ((void __user *)used - (void __user *)vq->used),
+		log_used(vq, ((void __user *)used - (void __user *)vq->user.used),
 			 count * sizeof *used);
 	}
 	old = vq->last_used_idx;
@@ -2418,7 +2418,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 		smp_wmb();
 		/* Log used index update. */
 		log_used(vq, offsetof(struct vring_used, idx),
-			 sizeof vq->used->idx);
+			 sizeof vq->user.used->idx);
 		if (vq->log_ctx)
 			eventfd_signal(vq->log_ctx, 1);
 	}
@@ -2523,7 +2523,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 		r = vhost_update_used_flags(vq);
 		if (r) {
 			vq_err(vq, "Failed to enable notification at %p: %d\n",
-			       &vq->used->flags, r);
+			       &vq->user.used->flags, r);
 			return false;
 		}
 	} else {
@@ -2540,7 +2540,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	r = vhost_get_avail_idx(vq, &avail_idx);
 	if (r) {
 		vq_err(vq, "Failed to check avail idx at %p: %d\n",
-		       &vq->avail->idx, r);
+		       &vq->user.avail->idx, r);
 		return false;
 	}
@@ -2560,7 +2560,7 @@ void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 		r = vhost_update_used_flags(vq);
 		if (r)
 			vq_err(vq, "Failed to disable notification at %p: %d\n",
-			       &vq->used->flags, r);
+			       &vq->user.used->flags, r);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_disable_notify);

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 638bb640d6b4..b1db4ffe75f0 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -72,9 +72,11 @@ struct vhost_virtqueue {
 	/* The actual ring of buffers. */
 	struct mutex mutex;
 	unsigned int num;
-	vring_desc_t __user *desc;
-	vring_avail_t __user *avail;
-	vring_used_t __user *used;
+	struct {
+		vring_desc_t __user *desc;
+		vring_avail_t __user *avail;
+		vring_used_t __user *used;
+	} user;
 	const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
 	struct file *kick;
 	struct vhost_vring_call call_ctx;
-- 
2.28.0
In order to prepare for supporting buffers in kernel space, add a
vhost_iov struct to wrap the userspace iovec, add helper functions for
accessing this struct, and use these helpers from all vhost drivers.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com>
---
 drivers/vhost/net.c   | 13 ++++++------
 drivers/vhost/scsi.c  | 30 +++++++++++++--------------
 drivers/vhost/test.c  |  2 +-
 drivers/vhost/vhost.c | 25 +++++++++++-----------
 drivers/vhost/vhost.h | 48 +++++++++++++++++++++++++++++++++++++------
 drivers/vhost/vsock.c |  8 ++++----
 6 files changed, 81 insertions(+), 45 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 28ef323882fb..8f82b646d4af 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -607,9 +607,9 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
 			    size_t hdr_size, int out)
 {
 	/* Skip header. TODO: support TSO. */
-	size_t len = iov_length(vq->iov, out);
+	size_t len = vhost_iov_length(vq, vq->iov, out);
 
-	iov_iter_init(iter, WRITE, vq->iov, out, len);
+	vhost_iov_iter_init(vq, iter, WRITE, vq->iov, out, len);
 	iov_iter_advance(iter, hdr_size);
 
 	return iov_iter_count(iter);
@@ -1080,7 +1080,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 			log += *log_num;
 		}
 		heads[headcount].id = cpu_to_vhost32(vq, d);
-		len = iov_length(vq->iov + seg, in);
+		len = vhost_iov_length(vq, vq->iov + seg, in);
 		heads[headcount].len = cpu_to_vhost32(vq, len);
 		datalen -= len;
 		++headcount;
@@ -1182,14 +1182,14 @@ static void handle_rx(struct vhost_net *net)
 			msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
 		/* On overrun, truncate and discard */
 		if (unlikely(headcount > UIO_MAXIOV)) {
-			iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
+			vhost_iov_iter_init(vq, &msg.msg_iter, READ, vq->iov, 1, 1);
 			err = sock->ops->recvmsg(sock, &msg,
 						 1, MSG_DONTWAIT | MSG_TRUNC);
 			pr_debug("Discarded rx packet: len %zd\n", sock_len);
 			continue;
 		}
 		/* We don't need to be notified again. */
-		iov_iter_init(&msg.msg_iter, READ, vq->iov, in, vhost_len);
+		vhost_iov_iter_init(vq, &msg.msg_iter, READ, vq->iov, in, vhost_len);
 		fixup = msg.msg_iter;
 		if (unlikely((vhost_hlen))) {
 			/* We will supply the header ourselves
@@ -1212,8 +1212,7 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(vhost_hlen)) {
 			if (copy_to_iter(&hdr, sizeof(hdr),
 					 &fixup) != sizeof(hdr)) {
-				vq_err(vq, "Unable to write vnet_hdr "
-				       "at addr %p\n", vq->iov->iov_base);
+				vq_err(vq, "Unable to write vnet_hdr");
 				goto out;
 			}
 		} else {

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index bcf53685439d..22a372b52165 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -80,7 +80,7 @@ struct vhost_scsi_cmd {
 	struct scatterlist *tvc_prot_sgl;
 	struct page **tvc_upages;
 	/* Pointer to response header iovec */
-	struct iovec tvc_resp_iov;
+	struct vhost_iov tvc_resp_iov;
 	/* Pointer to vhost_scsi for our device */
 	struct vhost_scsi *tvc_vhost;
 	/* Pointer to vhost_virtqueue for the cmd */
@@ -208,7 +208,7 @@ struct vhost_scsi_tmf {
 	struct se_cmd se_cmd;
 	u8 scsi_resp;
 	struct vhost_scsi_inflight *inflight;
-	struct iovec resp_iov;
+	struct vhost_iov resp_iov;
 	int in_iovs;
 	int vq_desc;
 };
@@ -487,9 +487,9 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 		return;
 	}
 
-	if ((vq->iov[out].iov_len != sizeof(struct virtio_scsi_event))) {
+	if (vhost_iov_len(vq, &vq->iov[out]) != sizeof(struct virtio_scsi_event)) {
 		vq_err(vq, "Expecting virtio_scsi_event, got %zu bytes\n",
-				vq->iov[out].iov_len);
+				vhost_iov_len(vq, &vq->iov[out]));
 		vs->vs_events_missed = true;
 		return;
 	}
@@ -499,7 +499,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 		vs->vs_events_missed = false;
 	}
 
-	iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(*event));
+	vhost_iov_iter_init(vq, &iov_iter, READ, &vq->iov[out], in, sizeof(*event));
 
 	ret = copy_to_iter(event, sizeof(*event), &iov_iter);
 	if (ret == sizeof(*event))
@@ -559,8 +559,8 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 			memcpy(v_rsp.sense, cmd->tvc_sense_buf,
 			       se_cmd->scsi_sense_length);
 
-			iov_iter_init(&iov_iter, READ, &cmd->tvc_resp_iov,
-				      cmd->tvc_in_iovs, sizeof(v_rsp));
+			vhost_iov_iter_init(&vs->vqs[0].vq, &iov_iter, READ, &cmd->tvc_resp_iov,
+					    cmd->tvc_in_iovs, sizeof(v_rsp));
 			ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
 			if (likely(ret == sizeof(v_rsp))) {
 				struct vhost_scsi_virtqueue *q;
@@ -809,7 +809,7 @@ vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 	struct iov_iter iov_iter;
 	int ret;
 
-	iov_iter_init(&iov_iter, READ, &vq->iov[out], in, sizeof(rsp));
+	vhost_iov_iter_init(vq, &iov_iter, READ, &vq->iov[out], in, sizeof(rsp));
 
 	memset(&rsp, 0, sizeof(rsp));
 	rsp.response = VIRTIO_SCSI_S_BAD_TARGET;
@@ -850,8 +850,8 @@ vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 	 * Get the size of request and response buffers.
 	 * FIXME: Not correct for BIDI operation
 	 */
-	vc->out_size = iov_length(vq->iov, vc->out);
-	vc->in_size = iov_length(&vq->iov[vc->out], vc->in);
+	vc->out_size = vhost_iov_length(vq, vq->iov, vc->out);
+	vc->in_size = vhost_iov_length(vq, &vq->iov[vc->out], vc->in);
 
 	/*
 	 * Copy over the virtio-scsi request header, which for a
@@ -863,7 +863,7 @@ vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 	 * point at the start of the outgoing WRITE payload, if
 	 * DMA_TO_DEVICE is set.
 	 */
-	iov_iter_init(&vc->out_iter, WRITE, vq->iov, vc->out, vc->out_size);
+	vhost_iov_iter_init(vq, &vc->out_iter, WRITE, vq->iov, vc->out, vc->out_size);
 	ret = 0;
 done:
@@ -1015,7 +1015,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 			data_direction = DMA_FROM_DEVICE;
 			exp_data_len = vc.in_size - vc.rsp_size;
 
-			iov_iter_init(&in_iter, READ, &vq->iov[vc.out], vc.in,
+			vhost_iov_iter_init(vq, &in_iter, READ, &vq->iov[vc.out], vc.in,
 				      vc.rsp_size + exp_data_len);
 			iov_iter_advance(&in_iter, vc.rsp_size);
 			data_iter = in_iter;
@@ -1134,7 +1134,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 static void
 vhost_scsi_send_tmf_resp(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
-			 int in_iovs, int vq_desc, struct iovec *resp_iov,
+			 int in_iovs, int vq_desc, struct vhost_iov *resp_iov,
 			 int tmf_resp_code)
 {
 	struct virtio_scsi_ctrl_tmf_resp rsp;
@@ -1145,7 +1145,7 @@ vhost_scsi_send_tmf_resp(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 	memset(&rsp, 0, sizeof(rsp));
 	rsp.response = tmf_resp_code;
 
-	iov_iter_init(&iov_iter, READ, resp_iov, in_iovs, sizeof(rsp));
+	vhost_iov_iter_init(vq, &iov_iter, READ, resp_iov, in_iovs, sizeof(rsp));
 
 	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
 	if (likely(ret == sizeof(rsp)))
@@ -1237,7 +1237,7 @@ vhost_scsi_send_an_resp(struct vhost_scsi *vs,
 	memset(&rsp, 0, sizeof(rsp));	/* event_actual = 0 */
 	rsp.response = VIRTIO_SCSI_S_OK;
 
-	iov_iter_init(&iov_iter, READ, &vq->iov[vc->out], vc->in, sizeof(rsp));
+	vhost_iov_iter_init(vq, &iov_iter, READ, &vq->iov[vc->out], vc->in, sizeof(rsp));
 
 	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
 	if (likely(ret == sizeof(rsp)))

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index a09dedc79f68..95794b0ea4ad 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -78,7 +78,7 @@ static void handle_vq(struct vhost_test *n)
 			       "out %d, int %d\n", out, in);
 			break;
 		}
-		len = iov_length(vq->iov, out);
+		len = vhost_iov_length(vq, vq->iov, out);
 		/* Sanity check */
 		if (!len) {
 			vq_err(vq, "Unexpected 0 len for TX\n");

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 108994f386f7..ce81eee2a3fa 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -812,7 +812,7 @@ static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem,
 }
 
 static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
-			  struct iovec iov[], int iov_size, int access);
+			  struct vhost_iov iov[], int iov_size, int access);
 
 static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
 			      const void *from, unsigned size)
@@ -840,7 +840,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
 				     VHOST_ACCESS_WO);
 		if (ret < 0)
 			goto out;
-		iov_iter_init(&t, WRITE, vq->iotlb_iov, ret, size);
+		iov_iter_init(&t, WRITE, &vq->iotlb_iov->iovec, ret, size);
 		ret = copy_to_iter(from, size, &t);
 		if (ret == size)
 			ret = 0;
@@ -879,7 +879,7 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to,
 			       (unsigned long long) size);
 			goto out;
 		}
-		iov_iter_init(&f, READ, vq->iotlb_iov, ret, size);
+		iov_iter_init(&f, READ, &vq->iotlb_iov->iovec, ret, size);
 		ret = copy_from_iter(to, size, &f);
 		if (ret == size)
 			ret = 0;
@@ -905,14 +905,14 @@ static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq,
 		return NULL;
 	}
 
-	if (ret != 1 || vq->iotlb_iov[0].iov_len != size) {
+	if (ret != 1 || vq->iotlb_iov->iovec.iov_len != size) {
 		vq_err(vq, "Non atomic userspace memory access: uaddr "
 			"%p size 0x%llx\n", addr,
 			(unsigned long long) size);
 		return NULL;
 	}
 
-	return vq->iotlb_iov[0].iov_base;
+	return vq->iotlb_iov->iovec.iov_base;
 }
 
 /* This function should be called after iotlb
@@ -1906,7 +1906,7 @@ static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
 
 static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
 {
-	struct iovec *iov = vq->log_iov;
+	struct iovec *iov = &vq->log_iov->iovec;
 	int i, ret;
 
 	if (!vq->iotlb)
@@ -1928,8 +1928,9 @@ static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
 }
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-		    unsigned int log_num, u64 len, struct iovec *iov, int count)
+		    unsigned int log_num, u64 len, struct vhost_iov *viov, int count)
 {
+	struct iovec *iov = &viov->iovec;
 	int i, r;
 
 	/* Make sure data written is seen before log. */
@@ -2035,7 +2036,7 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 EXPORT_SYMBOL_GPL(vhost_vq_init_access);
 
 static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
-			  struct iovec iov[], int iov_size, int access)
+			  struct vhost_iov iov[], int iov_size, int access)
 {
 	const struct vhost_iotlb_map *map;
 	struct vhost_dev *dev = vq->dev;
@@ -2064,7 +2065,7 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
 			break;
 		}
 
-		_iov = iov + ret;
+		_iov = &iov->iovec + ret;
 		size = map->size - addr + map->start;
 		_iov->iov_len = min((u64)len - s, size);
 		_iov->iov_base = (void __user *)(unsigned long)
@@ -2096,7 +2097,7 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
 }
 
 static int get_indirect(struct vhost_virtqueue *vq,
-			struct iovec iov[], unsigned int iov_size,
+			struct vhost_iov iov[], unsigned int iov_size,
 			unsigned int *out_num, unsigned int *in_num,
 			struct vhost_log *log, unsigned int *log_num,
 			struct vring_desc *indirect)
@@ -2123,7 +2124,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
 		vq_err(vq, "Translation failure %d in indirect.\n", ret);
 		return ret;
 	}
-	iov_iter_init(&from, READ, vq->indirect, ret, len);
+	vhost_iov_iter_init(vq, &from, READ, vq->indirect, ret, len);
 	count = len / sizeof desc;
 	/* Buffers are chained via a 16 bit next field, so
	 * we can have at most 2^16 of these. */
@@ -2197,7 +2198,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
  * never a valid descriptor number) if none was found.  A negative code is
  * returned on error. */
 int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct iovec iov[], unsigned int iov_size,
+		      struct vhost_iov iov[], unsigned int iov_size,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num)
 {

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index b1db4ffe75f0..69aec724ef7f 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -65,6 +65,12 @@ struct vhost_vring_call {
 	struct irq_bypass_producer producer;
 };
 
+struct vhost_iov {
+	union {
+		struct iovec iovec;
+	};
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
 	struct vhost_dev *dev;
@@ -110,9 +116,9 @@ struct vhost_virtqueue {
 	bool log_used;
 	u64 log_addr;
 
-	struct iovec iov[UIO_MAXIOV];
-	struct iovec iotlb_iov[64];
-	struct iovec *indirect;
+	struct vhost_iov iov[UIO_MAXIOV];
+	struct vhost_iov iotlb_iov[64];
+	struct vhost_iov *indirect;
 	struct vring_used_elem *heads;
 	/* Protected by virtqueue mutex. */
 	struct vhost_iotlb *umem;
@@ -123,7 +129,7 @@ struct vhost_virtqueue {
 	/* Log write descriptors */
 	void __user *log_base;
 	struct vhost_log *log;
-	struct iovec log_iov[64];
+	struct vhost_iov log_iov[64];
 
 	/* Ring endianness.  Defaults to legacy native endianness.
	 * Set to true when starting a modern virtio device. */
@@ -167,6 +173,26 @@ struct vhost_dev {
 			     struct vhost_iotlb_msg *msg);
 };
 
+static inline size_t vhost_iov_length(const struct vhost_virtqueue *vq, struct vhost_iov *iov,
+				      unsigned long nr_segs)
+{
+	return iov_length(&iov->iovec, nr_segs);
+}
+
+static inline size_t vhost_iov_len(const struct vhost_virtqueue *vq, struct vhost_iov *iov)
+{
+	return iov->iovec.iov_len;
+}
+
+static inline void vhost_iov_iter_init(const struct vhost_virtqueue *vq,
+				       struct iov_iter *i, unsigned int direction,
+				       struct vhost_iov *iov,
+				       unsigned long nr_segs,
+				       size_t count)
+{
+	iov_iter_init(i, direction, &iov->iovec, nr_segs, count);
+}
+
 bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
 void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
 		    int nvqs, int iov_limit, int weight, int byte_weight,
@@ -186,9 +212,19 @@ bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
 bool vhost_log_access_ok(struct vhost_dev *);
 
 int vhost_get_vq_desc(struct vhost_virtqueue *,
-		      struct iovec iov[], unsigned int iov_count,
+		      struct vhost_iov iov[], unsigned int iov_count,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num);
+
+int vhost_get_vq_desc_viov(struct vhost_virtqueue *vq,
+			   struct vhost_iov *viov,
+			   unsigned int *out_num, unsigned int *in_num,
+			   struct vhost_log *log, unsigned int *log_num);
+int vhost_get_vq_desc_viov_offset(struct vhost_virtqueue *vq,
+				  struct vhost_iov *viov,
+				  int offset,
+				  unsigned int *out_num, unsigned int *in_num,
+				  struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
 bool vhost_vq_is_setup(struct vhost_virtqueue *vq);
@@ -207,7 +243,7 @@ bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
 		    unsigned int log_num, u64 len,
-		    struct iovec *iov, int count);
+		    struct vhost_iov *viov, int count);
 int vq_meta_prefetch(struct vhost_virtqueue *vq);
 
 struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type);

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 938aefbc75ec..190e5a6ea045 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -158,14 +158,14 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		iov_len = iov_length(&vq->iov[out], in);
+		iov_len = vhost_iov_length(vq, &vq->iov[out], in);
 		if (iov_len < sizeof(pkt->hdr)) {
 			virtio_transport_free_pkt(pkt);
 			vq_err(vq, "Buffer len [%zu] too small\n", iov_len);
 			break;
 		}
 
-		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, iov_len);
+		vhost_iov_iter_init(vq, &iov_iter, READ, &vq->iov[out], in, iov_len);
 		payload_len = pkt->len - pkt->off;
 
 		/* If the packet is greater than the space available in the
@@ -370,8 +370,8 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
 	if (!pkt)
 		return NULL;
 
-	len = iov_length(vq->iov, out);
-	iov_iter_init(&iov_iter, WRITE, vq->iov, out, len);
+	len = vhost_iov_length(vq, vq->iov, out);
+	vhost_iov_iter_init(vq, &iov_iter, WRITE, vq->iov, out, len);
 
 	nbytes = copy_from_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
 	if (nbytes != sizeof(pkt->hdr)) {
-- 
2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 04/10] vhost: add support for kernel buffers
Handle the virtio rings and buffers being placed in kernel memory instead of userspace memory. The software IOTLB support is used to ensure that only permitted regions are accessed. Note that this patch does not provide a way to actually request that kernel memory be used; an API for that is added in a later patch. Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com> --- drivers/vhost/Kconfig | 6 ++ drivers/vhost/vhost.c | 222 +++++++++++++++++++++++++++++++++++++++++- drivers/vhost/vhost.h | 34 +++++++ 3 files changed, 257 insertions(+), 5 deletions(-) diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 587fbae06182..9e76ed485fe1 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -20,6 +20,12 @@ config VHOST This option is selected by any driver which needs to access the core of vhost. +config VHOST_KERNEL + tristate + help + This option is selected by any driver which needs to access the + support for kernel buffers in vhost. + menuconfig VHOST_MENU bool "VHOST drivers" default y diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index ce81eee2a3fa..9354061ce75e 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -49,6 +49,9 @@ enum { #define vhost_used_event(vq) ((__virtio16 __user *)&vq->user.avail->ring[vq->num]) #define vhost_avail_event(vq) ((__virtio16 __user *)&vq->user.used->ring[vq->num]) +#define vhost_used_event_kern(vq) ((__virtio16 *)&vq->kern.avail->ring[vq->num]) +#define vhost_avail_event_kern(vq) ((__virtio16 *)&vq->kern.used->ring[vq->num]) + #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY static void vhost_disable_cross_endian(struct vhost_virtqueue *vq) { @@ -482,6 +485,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->iotlb = NULL; dev->mm = NULL; dev->worker = NULL; + dev->kernel = false; dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; @@ -785,6 +789,18 @@ static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq, return (void __user
*)(uintptr_t)(map->addr + addr - map->start); } +static inline void *vhost_vq_meta_fetch_kern(struct vhost_virtqueue *vq, + u64 addr, unsigned int size, + int type) +{ + const struct vhost_iotlb_map *map = vq->meta_iotlb[type]; + + if (!map) + return NULL; + + return (void *)(uintptr_t)(map->addr + addr - map->start); +} + /* Can we switch to this memory table? */ /* Caller should have device mutex but not vq mutex */ static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem, @@ -849,6 +865,40 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to, return ret; } +static int vhost_copy_to_kern(struct vhost_virtqueue *vq, void *to, + const void *from, unsigned int size) +{ + int ret; + + /* This function should be called after iotlb + * prefetch, which means we're sure that all vq + * could be access through iotlb. So -EAGAIN should + * not happen in this case. + */ + struct iov_iter t; + void *kaddr = vhost_vq_meta_fetch_kern(vq, + (u64)(uintptr_t)to, size, + VHOST_ADDR_USED); + + if (kaddr) { + memcpy(kaddr, from, size); + return 0; + } + + ret = translate_desc(vq, (u64)(uintptr_t)to, size, vq->iotlb_iov, + ARRAY_SIZE(vq->iotlb_iov), + VHOST_ACCESS_WO); + if (ret < 0) + goto out; + iov_iter_kvec(&t, WRITE, &vq->iotlb_iov->kvec, ret, size); + ret = copy_to_iter(from, size, &t); + if (ret == size) + ret = 0; + +out: + return ret; +} + static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to, void __user *from, unsigned size) { @@ -889,6 +939,43 @@ static int vhost_copy_from_user(struct vhost_virtqueue *vq, void *to, return ret; } +static int vhost_copy_from_kern(struct vhost_virtqueue *vq, void *to, + void *from, unsigned int size) +{ + int ret; + + /* This function should be called after iotlb + * prefetch, which means we're sure that vq + * could be access through iotlb. So -EAGAIN should + * not happen in this case. 
+ */ + void *kaddr = vhost_vq_meta_fetch_kern(vq, + (u64)(uintptr_t)from, size, + VHOST_ADDR_DESC); + struct iov_iter f; + + if (kaddr) { + memcpy(to, kaddr, size); + return 0; + } + + ret = translate_desc(vq, (u64)(uintptr_t)from, size, vq->iotlb_iov, + ARRAY_SIZE(vq->iotlb_iov), + VHOST_ACCESS_RO); + if (ret < 0) { + vq_err(vq, "IOTLB translation failure: kaddr %p size 0x%llx\n", + from, (unsigned long long) size); + goto out; + } + iov_iter_kvec(&f, READ, &vq->iotlb_iov->kvec, ret, size); + ret = copy_from_iter(to, size, &f); + if (ret == size) + ret = 0; + +out: + return ret; +} + static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq, void __user *addr, unsigned int size, int type) @@ -915,6 +1002,33 @@ static void __user *__vhost_get_user_slow(struct vhost_virtqueue *vq, return vq->iotlb_iov->iovec.iov_base; } +static void *__vhost_get_kern_slow(struct vhost_virtqueue *vq, + void *addr, unsigned int size, + int type) +{ + void *out; + int ret; + + ret = translate_desc(vq, (u64)(uintptr_t)addr, size, vq->iotlb_iov, + ARRAY_SIZE(vq->iotlb_iov), + VHOST_ACCESS_RO); + if (ret < 0) { + vq_err(vq, "IOTLB translation failure: addr %p size 0x%llx\n", + addr, (unsigned long long) size); + return NULL; + } + + if (ret != 1 || vq->iotlb_iov->kvec.iov_len != size) { + vq_err(vq, "Non atomic memory access: addr %p size 0x%llx\n", + addr, (unsigned long long) size); + return NULL; + } + + out = vq->iotlb_iov->kvec.iov_base; + + return out; +} + /* This function should be called after iotlb * prefetch, which means we're sure that vq * could be access through iotlb. 
So -EAGAIN should @@ -932,6 +1046,18 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq, return __vhost_get_user_slow(vq, addr, size, type); } +static inline void *__vhost_get_kern(struct vhost_virtqueue *vq, + void *addr, unsigned int size, + int type) +{ + void *uaddr = vhost_vq_meta_fetch_kern(vq, (u64)(uintptr_t)addr, size, type); + + if (uaddr) + return uaddr; + + return __vhost_get_kern_slow(vq, addr, size, type); +} + #define vhost_put_user(vq, x, ptr) \ ({ \ int ret; \ @@ -949,8 +1075,25 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq, ret; \ }) +#define vhost_put_kern(vq, x, ptr) \ +({ \ + int ret = 0; \ + __typeof__(ptr) to = \ + (__typeof__(ptr)) __vhost_get_kern(vq, ptr, \ + sizeof(*ptr), VHOST_ADDR_USED); \ + if (to != NULL) \ + *to = x; \ + else \ + ret = -EFAULT; \ + ret; \ +}) + static inline int vhost_put_avail_event(struct vhost_virtqueue *vq) { + if (vhost_kernel(vq)) + return vhost_put_kern(vq, cpu_to_vhost16(vq, vq->avail_idx), + vhost_avail_event_kern(vq)); + return vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx), vhost_avail_event(vq)); } @@ -959,6 +1102,10 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq, struct vring_used_elem *head, int idx, int count) { + if (vhost_kernel(vq)) + return vhost_copy_to_kern(vq, vq->kern.used->ring + idx, head, + count * sizeof(*head)); + return vhost_copy_to_user(vq, vq->user.used->ring + idx, head, count * sizeof(*head)); } @@ -966,6 +1113,10 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq, static inline int vhost_put_used_flags(struct vhost_virtqueue *vq) { + if (vhost_kernel(vq)) + return vhost_put_kern(vq, cpu_to_vhost16(vq, vq->used_flags), + &vq->kern.used->flags); + return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags), &vq->user.used->flags); } @@ -973,6 +1124,10 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq) static inline int vhost_put_used_idx(struct vhost_virtqueue *vq) { + if 
(vhost_kernel(vq)) + return vhost_put_kern(vq, cpu_to_vhost16(vq, vq->last_used_idx), + &vq->kern.used->idx); + return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx), &vq->user.used->idx); } @@ -995,12 +1150,32 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq) ret; \ }) +#define vhost_get_kern(vq, x, ptr, type) \ +({ \ + int ret = 0; \ + __typeof__(ptr) from = \ + (__typeof__(ptr)) __vhost_get_kern(vq, ptr, \ + sizeof(*ptr), \ + type); \ + if (from != NULL) \ + x = *from; \ + else \ + ret = -EFAULT; \ + ret; \ +}) + #define vhost_get_avail(vq, x, ptr) \ vhost_get_user(vq, x, ptr, VHOST_ADDR_AVAIL) #define vhost_get_used(vq, x, ptr) \ vhost_get_user(vq, x, ptr, VHOST_ADDR_USED) +#define vhost_get_avail_kern(vq, x, ptr) \ + vhost_get_kern(vq, x, ptr, VHOST_ADDR_AVAIL) + +#define vhost_get_used_kern(vq, x, ptr) \ + vhost_get_kern(vq, x, ptr, VHOST_ADDR_USED) + static void vhost_dev_lock_vqs(struct vhost_dev *d) { int i = 0; @@ -1018,12 +1193,19 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d) static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq, __virtio16 *idx) { + if (vhost_kernel(vq)) + return vhost_get_avail_kern(vq, *idx, &vq->kern.avail->idx); + return vhost_get_avail(vq, *idx, &vq->user.avail->idx); } static inline int vhost_get_avail_head(struct vhost_virtqueue *vq, __virtio16 *head, int idx) { + if (vhost_kernel(vq)) + return vhost_get_avail_kern(vq, *head, + &vq->kern.avail->ring[idx & (vq->num - 1)]); + return vhost_get_avail(vq, *head, &vq->user.avail->ring[idx & (vq->num - 1)]); } @@ -1031,24 +1213,36 @@ static inline int vhost_get_avail_head(struct vhost_virtqueue *vq, static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq, __virtio16 *flags) { + if (vhost_kernel(vq)) + return vhost_get_avail_kern(vq, *flags, &vq->kern.avail->flags); + return vhost_get_avail(vq, *flags, &vq->user.avail->flags); } static inline int vhost_get_used_event(struct vhost_virtqueue *vq, __virtio16 *event) { + if 
(vhost_kernel(vq)) + return vhost_get_avail_kern(vq, *event, vhost_used_event_kern(vq)); + return vhost_get_avail(vq, *event, vhost_used_event(vq)); } static inline int vhost_get_used_idx(struct vhost_virtqueue *vq, __virtio16 *idx) { + if (vhost_kernel(vq)) + return vhost_get_used_kern(vq, *idx, &vq->kern.used->idx); + return vhost_get_used(vq, *idx, &vq->user.used->idx); } static inline int vhost_get_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, int idx) { + if (vhost_kernel(vq)) + return vhost_copy_from_kern(vq, desc, vq->kern.desc + idx, sizeof(*desc)); + return vhost_copy_from_user(vq, desc, vq->user.desc + idx, sizeof(*desc)); } @@ -1909,6 +2103,9 @@ static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len) struct iovec *iov = &vq->log_iov->iovec; int i, ret; + if (vhost_kernel(vq)) + return 0; + if (!vq->iotlb) return log_write(vq->log_base, vq->log_addr + used_offset, len); @@ -1933,6 +2130,9 @@ int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log, struct iovec *iov = &viov->iovec; int i, r; + if (vhost_kernel(vq)) + return 0; + /* Make sure data written is seen before log. */ smp_wmb(); @@ -2041,11 +2241,11 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, const struct vhost_iotlb_map *map; struct vhost_dev *dev = vq->dev; struct vhost_iotlb *umem = dev->iotlb ? 
dev->iotlb : dev->umem; - struct iovec *_iov; u64 s = 0; int ret = 0; while ((u64)len > s) { + u64 mappedaddr; u64 size; if (unlikely(ret >= iov_size)) { ret = -ENOBUFS; @@ -2065,11 +2265,23 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, break; } - _iov = &iov->iovec + ret; size = map->size - addr + map->start; - _iov->iov_len = min((u64)len - s, size); - _iov->iov_base = (void __user *)(unsigned long) - (map->addr + addr - map->start); + mappedaddr = map->addr + addr - map->start; + + if (vhost_kernel(vq)) { + struct kvec *_kvec; + + _kvec = &iov->kvec + ret; + _kvec->iov_len = min((u64)len - s, size); + _kvec->iov_base = (void *)(unsigned long)mappedaddr; + } else { + struct iovec *_iov; + + _iov = &iov->iovec + ret; + _iov->iov_len = min((u64)len - s, size); + _iov->iov_base = (void __user *)(unsigned long)mappedaddr; + } + s += size; addr += size; ++ret; diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 69aec724ef7f..ded1b39d7852 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -67,6 +67,7 @@ struct vhost_vring_call { struct vhost_iov { union { + struct kvec kvec; struct iovec iovec; }; }; @@ -83,6 +84,11 @@ struct vhost_virtqueue { vring_avail_t __user *avail; vring_used_t __user *used; } user; + struct { + vring_desc_t *desc; + vring_avail_t *avail; + vring_used_t *used; + } kern; const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS]; struct file *kick; struct vhost_vring_call call_ctx; @@ -169,18 +175,41 @@ struct vhost_dev { int byte_weight; u64 kcov_handle; bool use_worker; + bool kernel; int (*msg_handler)(struct vhost_dev *dev, struct vhost_iotlb_msg *msg); }; +static inline bool vhost_kernel(const struct vhost_virtqueue *vq) +{ + if (!IS_ENABLED(CONFIG_VHOST_KERNEL)) + return false; + + return vq->dev->kernel; +} + static inline size_t vhost_iov_length(const struct vhost_virtqueue *vq, struct vhost_iov *iov, unsigned long nr_segs) { + if (vhost_kernel(vq)) { + size_t ret = 0; + const 
struct kvec *kvec = &iov->kvec; + unsigned int seg; + + for (seg = 0; seg < nr_segs; seg++) + ret += kvec[seg].iov_len; + + return ret; + }; + return iov_length(&iov->iovec, nr_segs); } static inline size_t vhost_iov_len(const struct vhost_virtqueue *vq, struct vhost_iov *iov) { + if (vhost_kernel(vq)) + return iov->kvec.iov_len; + return iov->iovec.iov_len; } @@ -190,6 +219,11 @@ static inline void vhost_iov_iter_init(const struct vhost_virtqueue *vq, unsigned long nr_segs, size_t count) { + if (vhost_kernel(vq)) { + iov_iter_kvec(i, direction, &iov->kvec, nr_segs, count); + return; + } + iov_iter_init(i, direction, &iov->iovec, nr_segs, count); } -- 2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 05/10] vhost: extract common code for file_operations handling
There is some duplicated code for handling of file_operations among vhost drivers. Move this to a common file. Having the file_operations in a common place also simplifies adding functions for obtaining a handle to a vhost device from a file descriptor. Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com> --- drivers/vhost/Makefile | 3 + drivers/vhost/common.c | 134 +++++++++++++++++++++++++++++++++++++++++ drivers/vhost/net.c | 79 +++++++----------------- drivers/vhost/vhost.h | 15 +++++ drivers/vhost/vsock.c | 75 +++++++---------------- 5 files changed, 197 insertions(+), 109 deletions(-) create mode 100644 drivers/vhost/common.c diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile index f3e1897cce85..b1ddc976aede 100644 --- a/drivers/vhost/Makefile +++ b/drivers/vhost/Makefile @@ -15,5 +15,8 @@ vhost_vdpa-y := vdpa.o obj-$(CONFIG_VHOST) += vhost.o +obj-$(CONFIG_VHOST) += vhost_common.o +vhost_common-y := common.o + obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o vhost_iotlb-y := iotlb.o diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c new file mode 100644 index 000000000000..27d4672b15d3 --- /dev/null +++ b/drivers/vhost/common.c @@ -0,0 +1,134 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include <linux/eventfd.h> +#include <linux/vhost.h> +#include <linux/uio.h> +#include <linux/mm.h> +#include <linux/miscdevice.h> +#include <linux/mutex.h> +#include <linux/poll.h> +#include <linux/file.h> +#include <linux/highmem.h> +#include <linux/slab.h> +#include <linux/vmalloc.h> +#include <linux/kthread.h> +#include <linux/cgroup.h> +#include <linux/module.h> +#include <linux/sort.h> +#include <linux/sched/mm.h> +#include <linux/sched/signal.h> +#include <linux/interval_tree_generic.h> +#include <linux/nospec.h> +#include <linux/kcov.h> + +#include "vhost.h" + +struct vhost_ops; + +struct vhost { + struct miscdevice misc; + const struct vhost_ops *ops; +}; + +static int vhost_open(struct inode *inode, struct file *file) +{ + struct miscdevice
*misc = file->private_data; + struct vhost *vhost = container_of(misc, struct vhost, misc); + struct vhost_dev *dev; + + dev = vhost->ops->open(vhost); + if (IS_ERR(dev)) + return PTR_ERR(dev); + + dev->vhost = vhost; + dev->file = file; + file->private_data = dev; + + return 0; +} + +static int vhost_release(struct inode *inode, struct file *file) +{ + struct vhost_dev *dev = file->private_data; + struct vhost *vhost = dev->vhost; + + vhost->ops->release(dev); + + return 0; +} + +static long vhost_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) +{ + struct vhost_dev *dev = file->private_data; + struct vhost *vhost = dev->vhost; + + return vhost->ops->ioctl(dev, ioctl, arg); +} + +static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct file *file = iocb->ki_filp; + struct vhost_dev *dev = file->private_data; + int noblock = file->f_flags & O_NONBLOCK; + + return vhost_chr_read_iter(dev, to, noblock); +} + +static ssize_t vhost_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct vhost_dev *dev = file->private_data; + + return vhost_chr_write_iter(dev, from); +} + +static __poll_t vhost_poll(struct file *file, poll_table *wait) +{ + struct vhost_dev *dev = file->private_data; + + return vhost_chr_poll(file, dev, wait); +} + +static const struct file_operations vhost_fops = { + .owner = THIS_MODULE, + .open = vhost_open, + .release = vhost_release, + .llseek = noop_llseek, + .unlocked_ioctl = vhost_ioctl, + .compat_ioctl = compat_ptr_ioctl, + .read_iter = vhost_read_iter, + .write_iter = vhost_write_iter, + .poll = vhost_poll, +}; + +struct vhost *vhost_register(const struct vhost_ops *ops) +{ + struct vhost *vhost; + int ret; + + vhost = kzalloc(sizeof(*vhost), GFP_KERNEL); + if (!vhost) + return ERR_PTR(-ENOMEM); + + vhost->misc.minor = ops->minor; + vhost->misc.name = ops->name; + vhost->misc.fops = &vhost_fops; + vhost->ops = ops; + + ret = misc_register(&vhost->misc); + 
if (ret) { + kfree(vhost); + return ERR_PTR(ret); + } + + return vhost; +} +EXPORT_SYMBOL_GPL(vhost_register); + +void vhost_unregister(struct vhost *vhost) +{ + misc_deregister(&vhost->misc); + kfree(vhost); +} +EXPORT_SYMBOL_GPL(vhost_unregister); + +MODULE_LICENSE("GPL v2"); diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8f82b646d4af..8910d9e2a74e 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1281,7 +1281,7 @@ static void handle_rx_net(struct vhost_work *work) handle_rx(net); } -static int vhost_net_open(struct inode *inode, struct file *f) +static struct vhost_dev *vhost_net_open(struct vhost *vhost) { struct vhost_net *n; struct vhost_dev *dev; @@ -1292,11 +1292,11 @@ static int vhost_net_open(struct inode *inode, struct file *f) n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL); if (!n) - return -ENOMEM; + return ERR_PTR(-ENOMEM); vqs = kmalloc_array(VHOST_NET_VQ_MAX, sizeof(*vqs), GFP_KERNEL); if (!vqs) { kvfree(n); - return -ENOMEM; + return ERR_PTR(-ENOMEM); } queue = kmalloc_array(VHOST_NET_BATCH, sizeof(void *), @@ -1304,7 +1304,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) if (!queue) { kfree(vqs); kvfree(n); - return -ENOMEM; + return ERR_PTR(-ENOMEM); } n->vqs[VHOST_NET_VQ_RX].rxq.queue = queue; @@ -1313,7 +1313,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) kfree(vqs); kvfree(n); kfree(queue); - return -ENOMEM; + return ERR_PTR(-ENOMEM); } n->vqs[VHOST_NET_VQ_TX].xdp = xdp; @@ -1341,11 +1341,10 @@ static int vhost_net_open(struct inode *inode, struct file *f) vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev); vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev); - f->private_data = n; n->page_frag.page = NULL; n->refcnt_bias = 0; - return 0; + return dev; } static struct socket *vhost_net_stop_vq(struct vhost_net *n, @@ -1395,9 +1394,9 @@ static void vhost_net_flush(struct vhost_net *n) } } -static int vhost_net_release(struct 
inode *inode, struct file *f) +static void vhost_net_release(struct vhost_dev *dev) { - struct vhost_net *n = f->private_data; + struct vhost_net *n = container_of(dev, struct vhost_net, dev); struct socket *tx_sock; struct socket *rx_sock; @@ -1421,7 +1420,6 @@ static int vhost_net_release(struct inode *inode, struct file *f) if (n->page_frag.page) __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias); kvfree(n); - return 0; } static struct socket *get_raw_socket(int fd) @@ -1687,10 +1685,10 @@ static long vhost_net_set_owner(struct vhost_net *n) return r; } -static long vhost_net_ioctl(struct file *f, unsigned int ioctl, +static long vhost_net_ioctl(struct vhost_dev *dev, unsigned int ioctl, unsigned long arg) { - struct vhost_net *n = f->private_data; + struct vhost_net *n = container_of(dev, struct vhost_net, dev); void __user *argp = (void __user *)arg; u64 __user *featurep = argp; struct vhost_vring_file backend; @@ -1741,63 +1739,32 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl, } } -static ssize_t vhost_net_chr_read_iter(struct kiocb *iocb, struct iov_iter *to) -{ - struct file *file = iocb->ki_filp; - struct vhost_net *n = file->private_data; - struct vhost_dev *dev = &n->dev; - int noblock = file->f_flags & O_NONBLOCK; - - return vhost_chr_read_iter(dev, to, noblock); -} - -static ssize_t vhost_net_chr_write_iter(struct kiocb *iocb, - struct iov_iter *from) -{ - struct file *file = iocb->ki_filp; - struct vhost_net *n = file->private_data; - struct vhost_dev *dev = &n->dev; - - return vhost_chr_write_iter(dev, from); -} - -static __poll_t vhost_net_chr_poll(struct file *file, poll_table *wait) -{ - struct vhost_net *n = file->private_data; - struct vhost_dev *dev = &n->dev; - - return vhost_chr_poll(file, dev, wait); -} - -static const struct file_operations vhost_net_fops = { - .owner = THIS_MODULE, - .release = vhost_net_release, - .read_iter = vhost_net_chr_read_iter, - .write_iter = vhost_net_chr_write_iter, - .poll = 
vhost_net_chr_poll, - .unlocked_ioctl = vhost_net_ioctl, - .compat_ioctl = compat_ptr_ioctl, +static const struct vhost_ops vhost_net_ops = { + .minor = VHOST_NET_MINOR, + .name = "vhost-net", .open = vhost_net_open, - .llseek = noop_llseek, + .release = vhost_net_release, + .ioctl = vhost_net_ioctl, }; -static struct miscdevice vhost_net_misc = { - .minor = VHOST_NET_MINOR, - .name = "vhost-net", - .fops = &vhost_net_fops, -}; +static struct vhost *vhost_net; static int vhost_net_init(void) { if (experimental_zcopytx) vhost_net_enable_zcopy(VHOST_NET_VQ_TX); - return misc_register(&vhost_net_misc); + + vhost_net = vhost_register(&vhost_net_ops); + if (IS_ERR(vhost_net)) + return PTR_ERR(vhost_net); + + return 0; } module_init(vhost_net_init); static void vhost_net_exit(void) { - misc_deregister(&vhost_net_misc); + vhost_unregister(vhost_net); } module_exit(vhost_net_exit); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index ded1b39d7852..562387b92730 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -15,6 +15,19 @@ #include <linux/vhost_iotlb.h> #include <linux/irqbypass.h> +struct vhost; + +struct vhost_ops { + int minor; + const char *name; + struct vhost_dev * (*open)(struct vhost *vhost); + long (*ioctl)(struct vhost_dev *dev, unsigned int ioctl, unsigned long arg); + void (*release)(struct vhost_dev *dev); +}; + +struct vhost *vhost_register(const struct vhost_ops *ops); +void vhost_unregister(struct vhost *vhost); + struct vhost_work; typedef void (*vhost_work_fn_t)(struct vhost_work *work); @@ -160,6 +173,8 @@ struct vhost_dev { struct mm_struct *mm; struct mutex mutex; struct vhost_virtqueue **vqs; + struct vhost *vhost; + struct file *file; int nvqs; struct eventfd_ctx *log_ctx; struct llist_head work_list; diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 190e5a6ea045..93f74a0010d5 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -662,7 +662,7 @@ static void vhost_vsock_free(struct vhost_vsock 
*vsock) kvfree(vsock); } -static int vhost_vsock_dev_open(struct inode *inode, struct file *file) +static struct vhost_dev *vhost_vsock_dev_open(struct vhost *vhost) { struct vhost_virtqueue **vqs; struct vhost_vsock *vsock; @@ -673,7 +673,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) */ vsock = kvmalloc(sizeof(*vsock), GFP_KERNEL | __GFP_RETRY_MAYFAIL); if (!vsock) - return -ENOMEM; + return ERR_PTR(-ENOMEM); vqs = kmalloc_array(ARRAY_SIZE(vsock->vqs), sizeof(*vqs), GFP_KERNEL); if (!vqs) { @@ -694,15 +694,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT, VHOST_VSOCK_WEIGHT, true, NULL); - file->private_data = vsock; spin_lock_init(&vsock->send_pkt_list_lock); INIT_LIST_HEAD(&vsock->send_pkt_list); vhost_work_init(&vsock->send_pkt_work, vhost_transport_send_pkt_work); - return 0; + return &vsock->dev; out: vhost_vsock_free(vsock); - return ret; + return ERR_PTR(ret); } static void vhost_vsock_flush(struct vhost_vsock *vsock) @@ -741,9 +740,9 @@ static void vhost_vsock_reset_orphans(struct sock *sk) sk_error_report(sk); } -static int vhost_vsock_dev_release(struct inode *inode, struct file *file) +static void vhost_vsock_dev_release(struct vhost_dev *dev) { - struct vhost_vsock *vsock = file->private_data; + struct vhost_vsock *vsock = container_of(dev, struct vhost_vsock, dev); mutex_lock(&vhost_vsock_mutex); if (vsock->guest_cid) @@ -775,7 +774,6 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file) vhost_dev_cleanup(&vsock->dev); kfree(vsock->dev.vqs); vhost_vsock_free(vsock); - return 0; } static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid) @@ -851,10 +849,10 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features) return -EFAULT; } -static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl, +static long vhost_vsock_dev_ioctl(struct vhost_dev *dev, unsigned int ioctl, unsigned long arg) { - 
struct vhost_vsock *vsock = f->private_data; + struct vhost_vsock *vsock = container_of(dev, struct vhost_vsock, dev); void __user *argp = (void __user *)arg; u64 guest_cid; u64 features; @@ -906,51 +904,15 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl, } } -static ssize_t vhost_vsock_chr_read_iter(struct kiocb *iocb, struct iov_iter *to) -{ - struct file *file = iocb->ki_filp; - struct vhost_vsock *vsock = file->private_data; - struct vhost_dev *dev = &vsock->dev; - int noblock = file->f_flags & O_NONBLOCK; - - return vhost_chr_read_iter(dev, to, noblock); -} - -static ssize_t vhost_vsock_chr_write_iter(struct kiocb *iocb, - struct iov_iter *from) -{ - struct file *file = iocb->ki_filp; - struct vhost_vsock *vsock = file->private_data; - struct vhost_dev *dev = &vsock->dev; - - return vhost_chr_write_iter(dev, from); -} - -static __poll_t vhost_vsock_chr_poll(struct file *file, poll_table *wait) -{ - struct vhost_vsock *vsock = file->private_data; - struct vhost_dev *dev = &vsock->dev; - - return vhost_chr_poll(file, dev, wait); -} - -static const struct file_operations vhost_vsock_fops = { - .owner = THIS_MODULE, +static const struct vhost_ops vhost_vsock_ops = { + .minor = VHOST_VSOCK_MINOR, + .name = "vhost-vsock", .open = vhost_vsock_dev_open, .release = vhost_vsock_dev_release, - .llseek = noop_llseek, - .unlocked_ioctl = vhost_vsock_dev_ioctl, - .compat_ioctl = compat_ptr_ioctl, - .read_iter = vhost_vsock_chr_read_iter, - .write_iter = vhost_vsock_chr_write_iter, - .poll = vhost_vsock_chr_poll, + .ioctl = vhost_vsock_dev_ioctl, }; -static struct miscdevice vhost_vsock_misc = { - .minor = VHOST_VSOCK_MINOR, - .name = "vhost-vsock", - .fops = &vhost_vsock_fops, -}; +static struct vhost *vhost_vsock; static int __init vhost_vsock_init(void) { @@ -960,12 +922,19 @@ static int __init vhost_vsock_init(void) VSOCK_TRANSPORT_F_H2G); if (ret < 0) return ret; - return misc_register(&vhost_vsock_misc); + + vhost_vsock = 
vhost_register(&vhost_vsock_ops); + if (IS_ERR(vhost_vsock)) { + vsock_core_unregister(&vhost_transport.transport); + return PTR_ERR(vhost_vsock); + } + + return 0; }; static void __exit vhost_vsock_exit(void) { - misc_deregister(&vhost_vsock_misc); + vhost_unregister(vhost_vsock); vsock_core_unregister(&vhost_transport.transport); }; -- 2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 06/10] vhost: extract ioctl locking to common code
Extract the mutex locking for the vhost ioctl into common code. This will allow the common code to easily add some extra handling required for adding a kernel API to control vhost. Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com> --- drivers/vhost/common.c | 7 ++++++- drivers/vhost/net.c | 14 +------------- drivers/vhost/vhost.c | 10 ++++++++-- drivers/vhost/vhost.h | 1 + drivers/vhost/vsock.c | 12 ------------ 5 files changed, 16 insertions(+), 28 deletions(-) diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c index 27d4672b15d3..a5722ad65e24 100644 --- a/drivers/vhost/common.c +++ b/drivers/vhost/common.c @@ -60,8 +60,13 @@ static long vhost_ioctl(struct file *file, unsigned int ioctl, unsigned long arg { struct vhost_dev *dev = file->private_data; struct vhost *vhost = dev->vhost; + long ret; - return vhost->ops->ioctl(dev, ioctl, arg); + mutex_lock(&dev->mutex); + ret = vhost->ops->ioctl(dev, ioctl, arg); + mutex_unlock(&dev->mutex); + + return ret; } static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 8910d9e2a74e..b5590b7862a9 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1505,7 +1505,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) struct vhost_net_ubuf_ref *ubufs, *oldubufs = NULL; int r; - mutex_lock(&n->dev.mutex); r = vhost_dev_check_owner(&n->dev); if (r) goto err; @@ -1573,7 +1572,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) sockfd_put(oldsock); } - mutex_unlock(&n->dev.mutex); return 0; err_used: @@ -1587,7 +1585,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) err_vq: mutex_unlock(&vq->mutex); err: - mutex_unlock(&n->dev.mutex); return r; } @@ -1598,7 +1595,6 @@ static long vhost_net_reset_owner(struct vhost_net *n) long err; struct vhost_iotlb *umem; - mutex_lock(&n->dev.mutex); err = vhost_dev_check_owner(&n->dev); if 
(err) goto done; @@ -1613,7 +1609,6 @@ static long vhost_net_reset_owner(struct vhost_net *n) vhost_dev_reset_owner(&n->dev, umem); vhost_net_vq_reset(n); done: - mutex_unlock(&n->dev.mutex); if (tx_sock) sockfd_put(tx_sock); if (rx_sock) @@ -1639,7 +1634,6 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features) vhost_hlen = 0; sock_hlen = hdr_len; } - mutex_lock(&n->dev.mutex); if ((features & (1 << VHOST_F_LOG_ALL)) && !vhost_log_access_ok(&n->dev)) goto out_unlock; @@ -1656,11 +1650,9 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features) n->vqs[i].sock_hlen = sock_hlen; mutex_unlock(&n->vqs[i].vq.mutex); } - mutex_unlock(&n->dev.mutex); return 0; out_unlock: - mutex_unlock(&n->dev.mutex); return -EFAULT; } @@ -1668,7 +1660,6 @@ static long vhost_net_set_owner(struct vhost_net *n) { int r; - mutex_lock(&n->dev.mutex); if (vhost_dev_has_owner(&n->dev)) { r = -EBUSY; goto out; @@ -1681,7 +1672,6 @@ static long vhost_net_set_owner(struct vhost_net *n) vhost_net_clear_ubuf_info(n); vhost_net_flush(n); out: - mutex_unlock(&n->dev.mutex); return r; } @@ -1721,20 +1711,18 @@ static long vhost_net_ioctl(struct vhost_dev *dev, unsigned int ioctl, return -EFAULT; if (features & ~VHOST_NET_BACKEND_FEATURES) return -EOPNOTSUPP; - vhost_set_backend_features(&n->dev, features); + __vhost_set_backend_features(&n->dev, features); return 0; case VHOST_RESET_OWNER: return vhost_net_reset_owner(n); case VHOST_SET_OWNER: return vhost_net_set_owner(n); default: - mutex_lock(&n->dev.mutex); r = vhost_dev_ioctl(&n->dev, ioctl, argp); if (r == -ENOIOCTLCMD) r = vhost_vring_ioctl(&n->dev, ioctl, argp); else vhost_net_flush(n); - mutex_unlock(&n->dev.mutex); return r; } } diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 9354061ce75e..9d6496b7ad85 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2821,18 +2821,24 @@ struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev, } EXPORT_SYMBOL_GPL(vhost_dequeue_msg); -void 
vhost_set_backend_features(struct vhost_dev *dev, u64 features) +void __vhost_set_backend_features(struct vhost_dev *dev, u64 features) { struct vhost_virtqueue *vq; int i; - mutex_lock(&dev->mutex); for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; mutex_lock(&vq->mutex); vq->acked_backend_features = features; mutex_unlock(&vq->mutex); } +} +EXPORT_SYMBOL_GPL(__vhost_set_backend_features); + +void vhost_set_backend_features(struct vhost_dev *dev, u64 features) +{ + mutex_lock(&dev->mutex); + __vhost_set_backend_features(dev, features); mutex_unlock(&dev->mutex); } EXPORT_SYMBOL_GPL(vhost_set_backend_features); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 562387b92730..408ff243ed31 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -302,6 +302,7 @@ void vhost_enqueue_msg(struct vhost_dev *dev, struct vhost_msg_node *vhost_dequeue_msg(struct vhost_dev *dev, struct list_head *head); void vhost_set_backend_features(struct vhost_dev *dev, u64 features); +void __vhost_set_backend_features(struct vhost_dev *dev, u64 features); __poll_t vhost_chr_poll(struct file *file, struct vhost_dev *dev, poll_table *wait); diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 93f74a0010d5..062767636226 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -583,8 +583,6 @@ static int vhost_vsock_start(struct vhost_vsock *vsock) size_t i; int ret; - mutex_lock(&vsock->dev.mutex); - ret = vhost_dev_check_owner(&vsock->dev); if (ret) goto err; @@ -614,7 +612,6 @@ static int vhost_vsock_start(struct vhost_vsock *vsock) */ vhost_work_queue(&vsock->dev, &vsock->send_pkt_work); - mutex_unlock(&vsock->dev.mutex); return 0; err_vq: @@ -629,7 +626,6 @@ static int vhost_vsock_start(struct vhost_vsock *vsock) mutex_unlock(&vq->mutex); } err: - mutex_unlock(&vsock->dev.mutex); return ret; } @@ -638,8 +634,6 @@ static int vhost_vsock_stop(struct vhost_vsock *vsock) size_t i; int ret; - mutex_lock(&vsock->dev.mutex); - ret = 
vhost_dev_check_owner(&vsock->dev); if (ret) goto err; @@ -653,7 +647,6 @@ static int vhost_vsock_stop(struct vhost_vsock *vsock) } err: - mutex_unlock(&vsock->dev.mutex); return ret; } @@ -821,7 +814,6 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features) if (features & ~VHOST_VSOCK_FEATURES) return -EOPNOTSUPP; - mutex_lock(&vsock->dev.mutex); if ((features & (1 << VHOST_F_LOG_ALL)) && !vhost_log_access_ok(&vsock->dev)) { goto err; @@ -841,11 +833,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features) vq->acked_features = features; mutex_unlock(&vq->mutex); } - mutex_unlock(&vsock->dev.mutex); return 0; err: - mutex_unlock(&vsock->dev.mutex); return -EFAULT; } @@ -893,13 +883,11 @@ static long vhost_vsock_dev_ioctl(struct vhost_dev *dev, unsigned int ioctl, vhost_set_backend_features(&vsock->dev, features); return 0; default: - mutex_lock(&vsock->dev.mutex); r = vhost_dev_ioctl(&vsock->dev, ioctl, argp); if (r == -ENOIOCTLCMD) r = vhost_vring_ioctl(&vsock->dev, ioctl, argp); else vhost_vsock_flush(vsock); - mutex_unlock(&vsock->dev.mutex); return r; } } -- 2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 07/10] vhost: add support for kernel control
Add functions to allow vhost buffers to be placed in kernel space and for the vhost driver to be controlled from a kernel driver after initial setup by userspace. The kernel control is only possible on new /dev/vhost-*-kernel devices, and on these devices userspace cannot write to the iotlb, nor can it control the placement and attributes of the virtqueues, nor start/stop the virtqueue handling after the kernel starts using it. Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com> --- drivers/vhost/common.c | 201 +++++++++++++++++++++++++++++++++++++++++ drivers/vhost/vhost.c | 92 +++++++++++++++++-- drivers/vhost/vhost.h | 3 + include/linux/vhost.h | 23 +++++ 4 files changed, 310 insertions(+), 9 deletions(-) create mode 100644 include/linux/vhost.h diff --git a/drivers/vhost/common.c b/drivers/vhost/common.c index a5722ad65e24..f9758920a33a 100644 --- a/drivers/vhost/common.c +++ b/drivers/vhost/common.c @@ -25,7 +25,9 @@ struct vhost_ops; struct vhost { + char kernelname[128]; struct miscdevice misc; + struct miscdevice kernelmisc; const struct vhost_ops *ops; }; @@ -46,6 +48,24 @@ static int vhost_open(struct inode *inode, struct file *file) return 0; } +static int vhost_kernel_open(struct inode *inode, struct file *file) +{ + struct miscdevice *misc = file->private_data; + struct vhost *vhost = container_of(misc, struct vhost, kernelmisc); + struct vhost_dev *dev; + + dev = vhost->ops->open(vhost); + if (IS_ERR(dev)) + return PTR_ERR(dev); + + dev->vhost = vhost; + dev->file = file; + dev->kernel = true; + file->private_data = dev; + + return 0; +} + static int vhost_release(struct inode *inode, struct file *file) { struct vhost_dev *dev = file->private_data; @@ -69,6 +89,46 @@ static long vhost_ioctl(struct file *file, unsigned int ioctl, unsigned long arg return ret; } +static long vhost_kernel_ioctl(struct file *file, unsigned int ioctl, unsigned long arg) +{ + struct vhost_dev *dev = file->private_data; + struct vhost *vhost = dev->vhost; + 
long ret; + + /* Only the kernel is allowed to control virtqueue attributes */ + switch (ioctl) { + case VHOST_SET_VRING_NUM: + case VHOST_SET_VRING_ADDR: + case VHOST_SET_VRING_BASE: + case VHOST_SET_VRING_ENDIAN: + case VHOST_SET_MEM_TABLE: + case VHOST_SET_LOG_BASE: + case VHOST_SET_LOG_FD: + return -EPERM; + } + + mutex_lock(&dev->mutex); + + /* + * Userspace should perform all required setup on the vhost device + * _before_ asking the kernel to start using it. + * + * Note that ->kernel_attached is never reset; if userspace wants to + * attach again it should open the device again. + */ + if (dev->kernel_attached) { + ret = -EPERM; + goto out_unlock; + } + + ret = vhost->ops->ioctl(dev, ioctl, arg); + +out_unlock: + mutex_unlock(&dev->mutex); + + return ret; +} + static ssize_t vhost_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct file *file = iocb->ki_filp; @@ -105,6 +165,129 @@ static const struct file_operations vhost_fops = { .poll = vhost_poll, }; +static const struct file_operations vhost_kernel_fops = { + .owner = THIS_MODULE, + .open = vhost_kernel_open, + .release = vhost_release, + .llseek = noop_llseek, + .unlocked_ioctl = vhost_kernel_ioctl, + .compat_ioctl = compat_ptr_ioctl, +}; + +static void vhost_dev_lock_vqs(struct vhost_dev *d) +{ + int i; + + for (i = 0; i < d->nvqs; ++i) + mutex_lock_nested(&d->vqs[i]->mutex, i); +} + +static void vhost_dev_unlock_vqs(struct vhost_dev *d) +{ + int i; + + for (i = 0; i < d->nvqs; ++i) + mutex_unlock(&d->vqs[i]->mutex); +} + +struct vhost_dev *vhost_dev_get(int fd) +{ + struct file *file; + struct vhost_dev *dev; + struct vhost_dev *ret; + int err; + int i; + + file = fget(fd); + if (!file) + return ERR_PTR(-EBADF); + + if (file->f_op != &vhost_kernel_fops) { + ret = ERR_PTR(-EINVAL); + goto err_fput; + } + + dev = file->private_data; + + mutex_lock(&dev->mutex); + vhost_dev_lock_vqs(dev); + + err = vhost_dev_check_owner(dev); + if (err) { + ret = ERR_PTR(err); + goto err_unlock; + } + + if
(dev->kernel_attached) { + ret = ERR_PTR(-EBUSY); + goto err_unlock; + } + + if (!dev->iotlb) { + ret = ERR_PTR(-EINVAL); + goto err_unlock; + } + + for (i = 0; i < dev->nvqs; i++) { + struct vhost_virtqueue *vq = dev->vqs[i]; + + if (vq->private_data) { + ret = ERR_PTR(-EBUSY); + goto err_unlock; + } + } + + dev->kernel_attached = true; + + vhost_dev_unlock_vqs(dev); + mutex_unlock(&dev->mutex); + + return dev; + +err_unlock: + vhost_dev_unlock_vqs(dev); + mutex_unlock(&dev->mutex); +err_fput: + fput(file); + return ret; +} +EXPORT_SYMBOL_GPL(vhost_dev_get); + +void vhost_dev_start_vq(struct vhost_dev *dev, u16 idx) +{ + struct vhost *vhost = dev->vhost; + + mutex_lock(&dev->mutex); + vhost->ops->start_vq(dev, idx); + mutex_unlock(&dev->mutex); +} +EXPORT_SYMBOL_GPL(vhost_dev_start_vq); + +void vhost_dev_stop_vq(struct vhost_dev *dev, u16 idx) +{ + struct vhost *vhost = dev->vhost; + + mutex_lock(&dev->mutex); + vhost->ops->stop_vq(dev, idx); + mutex_unlock(&dev->mutex); +} +EXPORT_SYMBOL_GPL(vhost_dev_stop_vq); + +void vhost_dev_put(struct vhost_dev *dev) +{ + /* The virtqueues should already be stopped. 
*/ + fput(dev->file); +} +EXPORT_SYMBOL_GPL(vhost_dev_put); + +static bool vhost_kernel_supported(const struct vhost_ops *ops) +{ + if (!IS_ENABLED(CONFIG_VHOST_KERNEL)) + return false; + + return ops->start_vq && ops->stop_vq; +} + struct vhost *vhost_register(const struct vhost_ops *ops) { struct vhost *vhost; @@ -125,12 +308,30 @@ struct vhost *vhost_register(const struct vhost_ops *ops) return ERR_PTR(ret); } + if (vhost_kernel_supported(ops)) { + snprintf(vhost->kernelname, sizeof(vhost->kernelname), + "%s-kernel", ops->name); + + vhost->kernelmisc.minor = MISC_DYNAMIC_MINOR; + vhost->kernelmisc.name = vhost->kernelname; + vhost->kernelmisc.fops = &vhost_kernel_fops; + + ret = misc_register(&vhost->kernelmisc); + if (ret) { + misc_deregister(&vhost->misc); + kfree(vhost); + return ERR_PTR(ret); + } + } + return vhost; } EXPORT_SYMBOL_GPL(vhost_register); void vhost_unregister(struct vhost *vhost) { + if (vhost_kernel_supported(vhost->ops)) + misc_deregister(&vhost->kernelmisc); misc_deregister(&vhost->misc); kfree(vhost); } diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 9d6496b7ad85..56a69ecfd910 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -486,6 +486,7 @@ void vhost_dev_init(struct vhost_dev *dev, dev->mm = NULL; dev->worker = NULL; dev->kernel = false; + dev->kernel_attached = false; dev->iov_limit = iov_limit; dev->weight = weight; dev->byte_weight = byte_weight; @@ -1329,6 +1330,30 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev, return ret; } + +int vhost_dev_iotlb_update(struct vhost_dev *dev, u64 iova, u64 size, u64 kaddr, unsigned int perm) +{ + int ret = 0; + + mutex_lock(&dev->mutex); + vhost_dev_lock_vqs(dev); + + if (!dev->iotlb) { + ret = -EINVAL; + goto out_unlock; + } + + if (vhost_iotlb_add_range(dev->iotlb, iova, iova + size - 1, kaddr, perm)) + ret = -ENOMEM; + +out_unlock: + vhost_dev_unlock_vqs(dev); + mutex_unlock(&dev->mutex); + + return ret; +} 
+EXPORT_SYMBOL_GPL(vhost_dev_iotlb_update); + ssize_t vhost_chr_write_iter(struct vhost_dev *dev, struct iov_iter *from) { @@ -1677,27 +1702,35 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m) return -EFAULT; } -static long vhost_vring_set_num(struct vhost_dev *d, +static int __vhost_vring_set_num(struct vhost_dev *d, struct vhost_virtqueue *vq, - void __user *argp) + unsigned int num) { - struct vhost_vring_state s; - /* Resizing ring with an active backend? * You don't want to do that. */ if (vq->private_data) return -EBUSY; - if (copy_from_user(&s, argp, sizeof s)) - return -EFAULT; - - if (!s.num || s.num > 0xffff || (s.num & (s.num - 1))) + if (!num || num > 0xffff || (num & (num - 1))) return -EINVAL; - vq->num = s.num; + + vq->num = num; return 0; } +static long vhost_vring_set_num(struct vhost_dev *d, + struct vhost_virtqueue *vq, + void __user *argp) +{ + struct vhost_vring_state s; + + if (copy_from_user(&s, argp, sizeof(s))) + return -EFAULT; + + return __vhost_vring_set_num(d, vq, s.num); +} + static long vhost_vring_set_addr(struct vhost_dev *d, struct vhost_virtqueue *vq, void __user *argp) @@ -1750,6 +1783,47 @@ static long vhost_vring_set_addr(struct vhost_dev *d, return 0; } +int vhost_dev_set_vring_num(struct vhost_dev *dev, unsigned int idx, unsigned int num) +{ + struct vhost_virtqueue *vq; + int ret; + + if (idx >= dev->nvqs) + return -ENOBUFS; + + vq = dev->vqs[idx]; + + mutex_lock(&vq->mutex); + ret = __vhost_vring_set_num(dev, vq, num); + mutex_unlock(&vq->mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(vhost_dev_set_vring_num); + +int vhost_dev_set_num_addr(struct vhost_dev *dev, unsigned int idx, void *desc, + void *avail, void *used) +{ + struct vhost_virtqueue *vq; + int ret = 0; + + if (idx >= dev->nvqs) + return -ENOBUFS; + + vq = dev->vqs[idx]; + + mutex_lock(&vq->mutex); + vq->kern.desc = desc; + vq->kern.avail = avail; + vq->kern.used = used; + vq->last_avail_idx = 0; + vq->avail_idx = 
vq->last_avail_idx; + mutex_unlock(&vq->mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(vhost_dev_set_num_addr); + static long vhost_vring_set_num_addr(struct vhost_dev *d, struct vhost_virtqueue *vq, unsigned int ioctl, diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 408ff243ed31..6cd5d6b0d644 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -23,6 +23,8 @@ struct vhost_ops { struct vhost_dev * (*open)(struct vhost *vhost); long (*ioctl)(struct vhost_dev *dev, unsigned int ioctl, unsigned long arg); void (*release)(struct vhost_dev *dev); + void (*start_vq)(struct vhost_dev *dev, u16 idx); + void (*stop_vq)(struct vhost_dev *dev, u16 idx); }; struct vhost *vhost_register(const struct vhost_ops *ops); @@ -191,6 +193,7 @@ struct vhost_dev { u64 kcov_handle; bool use_worker; bool kernel; + bool kernel_attached; int (*msg_handler)(struct vhost_dev *dev, struct vhost_iotlb_msg *msg); }; diff --git a/include/linux/vhost.h b/include/linux/vhost.h new file mode 100644 index 000000000000..cdfe244c776b --- /dev/null +++ b/include/linux/vhost.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef _INCLUDE_LINUX_VHOST_H +#define _INCLUDE_LINUX_VHOST_H + +#include <uapi/linux/vhost.h> + +struct vhost_dev; + +struct vhost_dev *vhost_dev_get(int fd); +void vhost_dev_put(struct vhost_dev *dev); + +int vhost_dev_set_vring_num(struct vhost_dev *dev, unsigned int idx, + unsigned int num); +int vhost_dev_set_num_addr(struct vhost_dev *dev, unsigned int idx, void *desc, + void *avail, void *used); + +void vhost_dev_start_vq(struct vhost_dev *dev, u16 idx); +void vhost_dev_stop_vq(struct vhost_dev *dev, u16 idx); + +int vhost_dev_iotlb_update(struct vhost_dev *dev, u64 iova, u64 size, + u64 kaddr, unsigned int perm); + +#endif -- 2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 08/10] vhost: net: add support for kernel control
Add support for kernel control to vhost-net. For the vhost-*-kernel devices, the ioctl to set the backend only provides the socket to vhost-net but does not start the handling of the virtqueues; that handling is instead started and stopped by the kernel. Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com> --- drivers/vhost/net.c | 106 ++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 98 insertions(+), 8 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index b5590b7862a9..977cfa89b216 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -144,6 +144,9 @@ struct vhost_net { struct page_frag page_frag; /* Refcount bias of page frag */ int refcnt_bias; + /* Used for storing backend sockets when stopped under kernel control */ + struct socket *tx_sock; + struct socket *rx_sock; }; static unsigned vhost_net_zcopy_mask __read_mostly; @@ -1293,6 +1296,8 @@ static struct vhost_dev *vhost_net_open(struct vhost *vhost) n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL); if (!n) return ERR_PTR(-ENOMEM); + n->tx_sock = NULL; + n->rx_sock = NULL; vqs = kmalloc_array(VHOST_NET_VQ_MAX, sizeof(*vqs), GFP_KERNEL); if (!vqs) { kvfree(n); @@ -1364,6 +1369,20 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n, return sock; } +/* Stops the virtqueue but doesn't unconsume the tap ring */ +static struct socket *__vhost_net_stop_vq(struct vhost_net *n, + struct vhost_virtqueue *vq) +{ + struct socket *sock; + + mutex_lock(&vq->mutex); + sock = vhost_vq_get_backend(vq); + vhost_net_disable_vq(n, vq); + vhost_vq_set_backend(vq, NULL); + mutex_unlock(&vq->mutex); + return sock; +} + static void vhost_net_stop(struct vhost_net *n, struct socket **tx_sock, struct socket **rx_sock) { @@ -1394,6 +1413,57 @@ static void vhost_net_flush(struct vhost_net *n) } } +static void vhost_net_start_vq(struct vhost_net *n, struct vhost_virtqueue *vq, + struct socket *sock) +{ + mutex_lock(&vq->mutex); + vhost_vq_set_backend(vq,
sock); + vhost_vq_init_access(vq); + vhost_net_enable_vq(n, vq); + mutex_unlock(&vq->mutex); +} + +static void vhost_net_dev_start_vq(struct vhost_dev *dev, u16 idx) +{ + struct vhost_net *n = container_of(dev, struct vhost_net, dev); + + if (WARN_ON(idx >= ARRAY_SIZE(n->vqs))) + return; + + if (idx == VHOST_NET_VQ_RX) { + vhost_net_start_vq(n, &n->vqs[idx].vq, n->rx_sock); + n->rx_sock = NULL; + } else if (idx == VHOST_NET_VQ_TX) { + vhost_net_start_vq(n, &n->vqs[idx].vq, n->tx_sock); + n->tx_sock = NULL; + } + + vhost_net_flush_vq(n, idx); +} + +static void vhost_net_dev_stop_vq(struct vhost_dev *dev, u16 idx) +{ + struct vhost_net *n = container_of(dev, struct vhost_net, dev); + struct socket *sock; + + if (WARN_ON(idx >= ARRAY_SIZE(n->vqs))) + return; + + if (!vhost_vq_get_backend(&n->vqs[idx].vq)) + return; + + sock = __vhost_net_stop_vq(n, &n->vqs[idx].vq); + + vhost_net_flush(n); + synchronize_rcu(); + vhost_net_flush(n); + + if (idx == VHOST_NET_VQ_RX) + n->rx_sock = sock; + else if (idx == VHOST_NET_VQ_TX) + n->tx_sock = sock; +} + static void vhost_net_release(struct vhost_dev *dev) { struct vhost_net *n = container_of(dev, struct vhost_net, dev); @@ -1405,6 +1475,14 @@ static void vhost_net_release(struct vhost_dev *dev) vhost_dev_stop(&n->dev); vhost_dev_cleanup(&n->dev); vhost_net_vq_reset(n); + if (n->tx_sock) { + WARN_ON(tx_sock); + tx_sock = n->tx_sock; + } + if (n->rx_sock) { + WARN_ON(rx_sock); + rx_sock = n->rx_sock; + } if (tx_sock) sockfd_put(tx_sock); if (rx_sock) @@ -1518,7 +1596,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) mutex_lock(&vq->mutex); /* Verify that ring has been setup correctly. 
*/ - if (!vhost_vq_access_ok(vq)) { + if (!vhost_kernel(vq) && !vhost_vq_access_ok(vq)) { r = -EFAULT; goto err_vq; } @@ -1539,14 +1617,17 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) } vhost_net_disable_vq(n, vq); - vhost_vq_set_backend(vq, sock); + if (!vhost_kernel(vq)) + vhost_vq_set_backend(vq, sock); vhost_net_buf_unproduce(nvq); - r = vhost_vq_init_access(vq); - if (r) - goto err_used; - r = vhost_net_enable_vq(n, vq); - if (r) - goto err_used; + if (!vhost_kernel(vq)) { + r = vhost_vq_init_access(vq); + if (r) + goto err_used; + r = vhost_net_enable_vq(n, vq); + if (r) + goto err_used; + } if (index == VHOST_NET_VQ_RX) nvq->rx_ring = get_tap_ptr_ring(fd); @@ -1572,6 +1653,13 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd) sockfd_put(oldsock); } + if (vhost_kernel(vq)) { + if (index == VHOST_NET_VQ_TX) + n->tx_sock = sock; + else if (index == VHOST_NET_VQ_RX) + n->rx_sock = sock; + } + return 0; err_used: @@ -1733,6 +1821,8 @@ static const struct vhost_ops vhost_net_ops = { .open = vhost_net_open, .release = vhost_net_release, .ioctl = vhost_net_ioctl, + .start_vq = vhost_net_dev_start_vq, + .stop_vq = vhost_net_dev_stop_vq, }; static struct vhost *vhost_net; -- 2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 09/10] vdpa: add test driver for kernel buffers in vhost
Add a driver which uses the kernel buffer support in vhost to allow virtio-net and vhost-net to be run in a loopback setup on the same system. While this feature could be useful on its own (for example for development of the vhost/virtio drivers), this driver is primarily intended to be used for testing the support for kernel buffers in vhost. A selftest which uses this driver will be added. Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com> --- drivers/vdpa/Kconfig | 8 + drivers/vdpa/Makefile | 1 + drivers/vdpa/vhost_kernel_test/Makefile | 2 + .../vhost_kernel_test/vhost_kernel_test.c | 575 ++++++++++++++++++ 4 files changed, 586 insertions(+) create mode 100644 drivers/vdpa/vhost_kernel_test/Makefile create mode 100644 drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index 3d91982d8371..308e5f11d2a9 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -43,6 +43,14 @@ config VDPA_USER With VDUSE it is possible to emulate a vDPA Device in a userspace program. +config VHOST_KERNEL_TEST + tristate "vhost kernel test driver" + depends on EVENTFD + select VHOST + select VHOST_KERNEL + help + Test driver for the vhost kernel-space buffer support.
+ config IFCVF tristate "Intel IFC VF vDPA driver" depends on PCI_MSI diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index f02ebed33f19..4ba8a4b350c4 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -2,6 +2,7 @@ obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ obj-$(CONFIG_VDPA_USER) += vdpa_user/ +obj-$(CONFIG_VHOST_KERNEL_TEST) += vhost_kernel_test/ obj-$(CONFIG_IFCVF) += ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ obj-$(CONFIG_VP_VDPA) += virtio_pci/ diff --git a/drivers/vdpa/vhost_kernel_test/Makefile b/drivers/vdpa/vhost_kernel_test/Makefile new file mode 100644 index 000000000000..7e0c7bdb3c0e --- /dev/null +++ b/drivers/vdpa/vhost_kernel_test/Makefile @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_VHOST_KERNEL_TEST) += vhost_kernel_test.o diff --git a/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c b/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c new file mode 100644 index 000000000000..82364cd02667 --- /dev/null +++ b/drivers/vdpa/vhost_kernel_test/vhost_kernel_test.c @@ -0,0 +1,575 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__ +#include <linux/interrupt.h> +#include <linux/module.h> +#include <linux/vdpa.h> +#include <linux/vhost.h> +#include <linux/virtio.h> +#include <linux/virtio_config.h> +#include <linux/virtio_ring.h> +#include <linux/eventfd.h> +#include <linux/dma-mapping.h> +#include <linux/dma-map-ops.h> +#include <linux/miscdevice.h> +#include <linux/slab.h> +#include <linux/wait.h> +#include <linux/poll.h> +#include <linux/file.h> +#include <linux/irq_work.h> +#include <uapi/linux/virtio_ids.h> +#include <uapi/linux/virtio_net.h> +#include <uapi/linux/vhost.h> + +struct vktest_vq { + struct vktest *vktest; + struct eventfd_ctx *kick; + struct eventfd_ctx *call; + u64 desc_addr; + u64 device_addr; + u64 driver_addr; + u32 num; + bool ready; + wait_queue_entry_t call_wait; + wait_queue_head_t *wqh; + poll_table call_pt; 
+ struct vdpa_callback cb; + struct irq_work irq_work; +}; + +struct vktest { + struct vdpa_device vdpa; + struct mutex mutex; + struct vhost_dev *vhost; + struct virtio_net_config config; + struct vktest_vq vqs[2]; + u8 status; +}; + +static struct vktest *vdpa_to_vktest(struct vdpa_device *vdpa) +{ + return container_of(vdpa, struct vktest, vdpa); +} + +static int vktest_set_vq_address(struct vdpa_device *vdpa, u16 idx, + u64 desc_area, u64 driver_area, + u64 device_area) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + struct vktest_vq *vq = &vktest->vqs[idx]; + + vq->desc_addr = desc_area; + vq->driver_addr = driver_area; + vq->device_addr = device_area; + + return 0; +} + +static void vktest_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + struct vktest_vq *vq = &vktest->vqs[idx]; + + vq->num = num; +} + +static void vktest_kick_vq(struct vdpa_device *vdpa, u16 idx) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + struct vktest_vq *vq = &vktest->vqs[idx]; + + if (vq->kick) + eventfd_signal(vq->kick, 1); +} + +static void vktest_set_vq_cb(struct vdpa_device *vdpa, u16 idx, + struct vdpa_callback *cb) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + struct vktest_vq *vq = &vktest->vqs[idx]; + + vq->cb = *cb; +} + +static void vktest_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + struct vktest_vq *vq = &vktest->vqs[idx]; + struct vhost_dev *vhost = vktest->vhost; + + if (!ready) { + vq->ready = false; + vhost_dev_stop_vq(vhost, idx); + return; + } + + vq->ready = true; + vhost_dev_set_num_addr(vhost, idx, (void *)vq->desc_addr, + (void *)vq->driver_addr, + (void *)vq->device_addr); + vhost_dev_set_vring_num(vhost, idx, vq->num); + vhost_dev_start_vq(vhost, idx); +} + +static bool vktest_get_vq_ready(struct vdpa_device *vdpa, u16 idx) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + struct vktest_vq *vq = &vktest->vqs[idx]; + 
+ return vq->ready; +} + +static int vktest_set_vq_state(struct vdpa_device *vdpa, u16 idx, + const struct vdpa_vq_state *state) +{ + return 0; +} + +static int vktest_get_vq_state(struct vdpa_device *vdpa, u16 idx, + struct vdpa_vq_state *state) +{ + return 0; +} + +static u32 vktest_get_vq_align(struct vdpa_device *vdpa) +{ + return PAGE_SIZE; +} + +static u64 vktest_get_features(struct vdpa_device *vdpa) +{ + return 1llu << VIRTIO_F_ACCESS_PLATFORM | 1llu << VIRTIO_F_VERSION_1; +} + +static int vktest_set_features(struct vdpa_device *vdpa, u64 features) +{ + return 0; +} + +static void vktest_set_config_cb(struct vdpa_device *vdpa, + struct vdpa_callback *cb) +{ +} + +static u16 vktest_get_vq_num_max(struct vdpa_device *vdpa) +{ + return 256; +} + +static u32 vktest_get_device_id(struct vdpa_device *vdpa) +{ + return VIRTIO_ID_NET; +} + +static u32 vktest_get_vendor_id(struct vdpa_device *vdpa) +{ + return 0; +} + +static u8 vktest_get_status(struct vdpa_device *vdpa) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + + return vktest->status; +} + +static int vktest_reset(struct vdpa_device *vdpa) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + struct vhost_dev *vhost = vktest->vhost; + + if (vhost) { + int i; + + for (i = 0; i < ARRAY_SIZE(vktest->vqs); i++) + vhost_dev_stop_vq(vhost, i); + } + + vktest->status = 0; + + return 0; +} + +static void vktest_set_status(struct vdpa_device *vdpa, u8 status) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + + vktest->status = status; +} + +static size_t vktest_get_config_size(struct vdpa_device *vdpa) +{ + return sizeof(vdpa->config); +} + +static void vktest_get_config(struct vdpa_device *vdpa, unsigned int offset, + void *buf, unsigned int len) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + + if (offset + len > sizeof(vktest->config)) + return; + + memcpy(buf, (void *)&vktest->config + offset, len); +} + +static void vktest_set_config(struct vdpa_device *vdpa, unsigned int offset, + const void *buf, 
unsigned int len) +{ +} + +static void vktest_free(struct vdpa_device *vdpa) +{ + struct vktest *vktest = vdpa_to_vktest(vdpa); + struct vhost_dev *vhost = vktest->vhost; + int i; + + for (i = 0; i < ARRAY_SIZE(vktest->vqs); i++) { + struct vktest_vq *vq = &vktest->vqs[i]; + + if (vq->wqh) { + remove_wait_queue(vq->wqh, &vq->call_wait); + vq->wqh = NULL; + } + + irq_work_sync(&vq->irq_work); + } + + if (vhost) + vhost_dev_put(vhost); + + for (i = 0; i < ARRAY_SIZE(vktest->vqs); i++) { + struct vktest_vq *vq = &vktest->vqs[i]; + + if (vq->kick) + eventfd_ctx_put(vq->kick); + if (vq->call) + eventfd_ctx_put(vq->call); + + vq->kick = NULL; + vq->call = NULL; + } +} + +/* + * By not implementing ->set_dma() and ->dma_map() and by using a dma_dev which is + * not tied to any hardware we ensure that vhost-vdpa cannot be opened if it + * binds to this vDPA driver (it will fail in vhost_vdpa_alloc_domain()). This + * ensures that only kernel code (virtio-vdpa) will be able to control VQ + * addresses, etc. 
+ */ +static const struct vdpa_config_ops vktest_config_ops = { + .set_vq_address = vktest_set_vq_address, + .set_vq_num = vktest_set_vq_num, + .kick_vq = vktest_kick_vq, + .set_vq_cb = vktest_set_vq_cb, + .set_vq_ready = vktest_set_vq_ready, + .get_vq_ready = vktest_get_vq_ready, + .set_vq_state = vktest_set_vq_state, + .get_vq_state = vktest_get_vq_state, + .get_vq_align = vktest_get_vq_align, + .get_features = vktest_get_features, + .set_features = vktest_set_features, + .set_config_cb = vktest_set_config_cb, + .get_vq_num_max = vktest_get_vq_num_max, + .get_device_id = vktest_get_device_id, + .get_vendor_id = vktest_get_vendor_id, + .get_status = vktest_get_status, + .set_status = vktest_set_status, + .reset = vktest_reset, + .get_config_size = vktest_get_config_size, + .get_config = vktest_get_config, + .set_config = vktest_set_config, + .free = vktest_free, +}; + +static dma_addr_t vktest_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + return (dma_addr_t)page_to_virt(page) + offset; +} + +static void vktest_unmap_page(struct device *dev, dma_addr_t dma_addr, + size_t size, enum dma_data_direction dir, + unsigned long attrs) +{ +} + +static void *vktest_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_addr, gfp_t flag, + unsigned long attrs) +{ + void *p; + + p = kvmalloc(size, flag); + if (!p) { + *dma_addr = DMA_MAPPING_ERROR; + return NULL; + } + + *dma_addr = (dma_addr_t)p; + + return p; +} + +static void vktest_free_coherent(struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_addr, unsigned long attrs) +{ + kvfree(vaddr); +} + +static const struct dma_map_ops vktest_dma_ops = { + .map_page = vktest_map_page, + .unmap_page = vktest_unmap_page, + .alloc = vktest_alloc_coherent, + .free = vktest_free_coherent, +}; + +static void vktest_call_notify(struct vktest_vq *vq) +{ + struct vdpa_callback *cb = &vq->cb; + + if (cb->callback) + 
cb->callback(cb->private); +} + +static void do_up_read(struct irq_work *entry) +{ + struct vktest_vq *vq = container_of(entry, struct vktest_vq, irq_work); + + vktest_call_notify(vq); +} + +static int vktest_open(struct inode *inode, struct file *file) +{ + struct vktest *vktest; + struct device *dev; + int ret = 0; + int i; + + vktest = vdpa_alloc_device(struct vktest, vdpa, NULL, + &vktest_config_ops, NULL, false); + if (IS_ERR(vktest)) + return PTR_ERR(vktest); + + for (i = 0; i < ARRAY_SIZE(vktest->vqs); i++) { + struct vktest_vq *vq = &vktest->vqs[i]; + + init_irq_work(&vq->irq_work, do_up_read); + } + + dev = &vktest->vdpa.dev; + dev->dma_mask = &dev->coherent_dma_mask; + ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)); + if (ret) + goto err_put_device; + + dev->dma_mask = &dev->coherent_dma_mask; + set_dma_ops(dev, &vktest_dma_ops); + + vktest->vdpa.dma_dev = dev; + + mutex_init(&vktest->mutex); + file->private_data = vktest; + + return ret; + +err_put_device: + put_device(dev); + return ret; +} + +static int vktest_release(struct inode *inode, struct file *file) +{ + struct vktest *vktest = file->private_data; + struct vhost_dev *vhost = vktest->vhost; + + /* The device is not registered until a vhost is attached. 
*/ + if (vhost) + vdpa_unregister_device(&vktest->vdpa); + else + put_device(&vktest->vdpa.dev); + + return 0; +} + +#define VKTEST_ATTACH_VHOST _IOW(0xbf, 0x31, int) + +static int vktest_attach_vhost(struct vktest *vktest, int fd) +{ + struct vhost_dev *vhost; + int ret; + int i; + + if (vktest->vhost) + return -EBUSY; + + for (i = 0; i < ARRAY_SIZE(vktest->vqs); i++) { + struct vktest_vq *vq = &vktest->vqs[i]; + + if (!vq->kick || !vq->call) + return -EINVAL; + } + + vhost = vhost_dev_get(fd); + if (IS_ERR(vhost)) + return PTR_ERR(vhost); + + vktest->vhost = vhost; + + /* 1:1 mapping */ + ret = vhost_dev_iotlb_update(vhost, 0, ULLONG_MAX, 0, VHOST_ACCESS_RW); + if (ret) + goto put_vhost; + + ret = vdpa_register_device(&vktest->vdpa, ARRAY_SIZE(vktest->vqs)); + if (ret) + goto put_vhost; + + return 0; + +put_vhost: + vhost_dev_put(vktest->vhost); + vktest->vhost = NULL; + return ret; +} + +static int vktest_set_vring_kick(struct vktest *vktest, + const struct vhost_vring_file *vringf) +{ + unsigned int idx = vringf->index; + struct eventfd_ctx *kick; + + if (idx >= ARRAY_SIZE(vktest->vqs)) + return -EINVAL; + + kick = eventfd_ctx_fdget(vringf->fd); + if (IS_ERR(kick)) + return PTR_ERR(kick); + + vktest->vqs[idx].kick = kick; + + return 0; +} + +static int vktest_call_wakeup(wait_queue_entry_t *wait, unsigned int mode, + int sync, void *key) +{ + struct vktest_vq *vq = container_of(wait, struct vktest_vq, call_wait); + unsigned long flags = (unsigned long)key; + + if (flags & POLLIN) + irq_work_queue(&vq->irq_work); + + return 0; +} + +static void vktest_call_queue_proc(struct file *file, wait_queue_head_t *wqh, + poll_table *pt) +{ + struct vktest_vq *vq = container_of(pt, struct vktest_vq, call_pt); + + vq->wqh = wqh; + add_wait_queue(wqh, &vq->call_wait); +} + +static int vktest_set_vring_call(struct vktest *vktest, + const struct vhost_vring_file *vringf) +{ + unsigned int idx = vringf->index; + struct fd eventfd; + struct eventfd_ctx *call; + struct vktest_vq *vq;
+ __poll_t events; + + if (idx >= sizeof(vktest->vqs)) + return -EINVAL; + + eventfd = fdget(vringf->fd); + if (!eventfd.file) + return -EBADF; + + call = eventfd_ctx_fileget(eventfd.file); + if (IS_ERR(call)) { + fdput(eventfd); + return PTR_ERR(call); + } + + vq = &vktest->vqs[idx]; + vq->call = call; + + init_waitqueue_func_entry(&vq->call_wait, vktest_call_wakeup); + init_poll_funcptr(&vq->call_pt, vktest_call_queue_proc); + + events = vfs_poll(eventfd.file, &vq->call_pt); + if (events & POLLIN) + vktest_call_notify(vq); + + return 0; +} + +static long vktest_ioctl(struct file *file, unsigned int ioctl, + unsigned long arg) +{ + struct vktest *vktest = file->private_data; + void __user *userp = (void __user *)arg; + struct vhost_vring_file vringf; + long ret = -ENOIOCTLCMD; + + mutex_lock(&vktest->mutex); + + switch (ioctl) { + case VKTEST_ATTACH_VHOST: + ret = vktest_attach_vhost(vktest, arg); + break; + case VHOST_SET_VRING_KICK: + if (copy_from_user(&vringf, userp, sizeof(vringf))) { + ret = -EFAULT; + break; + } + ret = vktest_set_vring_kick(vktest, &vringf); + break; + case VHOST_SET_VRING_CALL: + if (copy_from_user(&vringf, userp, sizeof(vringf))) { + ret = -EFAULT; + break; + } + ret = vktest_set_vring_call(vktest, &vringf); + break; + } + + mutex_unlock(&vktest->mutex); + + return ret; +} + +static const struct file_operations vktest_fops = { + .owner = THIS_MODULE, + .release = vktest_release, + .unlocked_ioctl = vktest_ioctl, + .open = vktest_open, + .llseek = noop_llseek, +}; + +static struct miscdevice vktest_misc = { + MISC_DYNAMIC_MINOR, + "vktest", + &vktest_fops, +}; + +static int __init vktest_init(void) +{ + return misc_register(&vktest_misc); +} + +static void __exit vktest_exit(void) +{ + misc_deregister(&vktest_misc); +} + +module_init(vktest_init); +module_exit(vktest_exit); + +MODULE_LICENSE("GPL v2"); -- 2.28.0
Vincent Whitchurch
2021-Sep-29 15:11 UTC
[RFC PATCH 10/10] selftests: add vhost_kernel tests
Add a test which uses the vhost_kernel_test driver to test the vhost
kernel buffers support.  The test uses virtio-net and vhost-net to set
up a loopback network and then verifies that ping works between the
interfaces.  It also checks that unbinding/rebinding of devices and
closing the involved file descriptors in different sequences during
active use works correctly.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch at axis.com>
---
 tools/testing/selftests/Makefile              |   1 +
 .../vhost_kernel/vhost_kernel_test.c          | 287 ++++++++++++++++++
 .../vhost_kernel/vhost_kernel_test.sh         | 125 ++++++++
 3 files changed, 413 insertions(+)
 create mode 100644 tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
 create mode 100755 tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index c852eb40c4f7..14a8349e3dc1 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -73,6 +73,7 @@ TARGETS += tmpfs
 TARGETS += tpm2
 TARGETS += user
 TARGETS += vDSO
+TARGETS += vhost_kernel
 TARGETS += vm
 TARGETS += x86
 TARGETS += zram
diff --git a/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c b/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
new file mode 100644
index 000000000000..b0f889bd2f72
--- /dev/null
+++ b/tools/testing/selftests/vhost_kernel/vhost_kernel_test.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+#include <err.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <linux/if_tun.h>
+#include <linux/virtio_net.h>
+#include <linux/vhost.h>
+#include <net/if.h>
+#include <netdb.h>
+#include <netinet/in.h>
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <sys/eventfd.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#ifndef VIRTIO_F_ACCESS_PLATFORM
+#define VIRTIO_F_ACCESS_PLATFORM 33
+#endif
+
+#ifndef VKTEST_ATTACH_VHOST
+#define VKTEST_ATTACH_VHOST _IOW(0xbf, 0x31, int)
+#endif
+
+static int vktest;
+static const int num_vqs = 2;
+
+static int tun_alloc(char *dev)
+{
+	int hdrsize = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	struct ifreq ifr = {
+		.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR,
+	};
+	int fd, ret;
+
+	fd = open("/dev/net/tun", O_RDWR);
+	if (fd < 0)
+		err(1, "open /dev/net/tun");
+
+	strncpy(ifr.ifr_name, dev, IFNAMSIZ);
+
+	ret = ioctl(fd, TUNSETIFF, &ifr);
+	if (ret < 0)
+		err(1, "TUNSETIFF");
+
+	ret = ioctl(fd, TUNSETOFFLOAD,
+		    TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_TSO_ECN);
+	if (ret < 0)
+		err(1, "TUNSETOFFLOAD");
+
+	ret = ioctl(fd, TUNSETVNETHDRSZ, &hdrsize);
+	if (ret < 0)
+		err(1, "TUNSETVNETHDRSZ");
+
+	strncpy(dev, ifr.ifr_name, IFNAMSIZ);
+
+	return fd;
+}
+
+static void handle_signal(int signum)
+{
+	if (signum == SIGUSR1)
+		close(vktest);
+}
+
+static void vhost_net_set_backend(int vhost)
+{
+	char if_name[IFNAMSIZ];
+	int tap_fd;
+
+	snprintf(if_name, IFNAMSIZ, "vhostkernel%d", 0);
+
+	tap_fd = tun_alloc(if_name);
+
+	for (int i = 0; i < num_vqs; i++) {
+		struct vhost_vring_file txbackend = {
+			.index = i,
+			.fd = tap_fd,
+		};
+		int ret;
+
+		ret = ioctl(vhost, VHOST_NET_SET_BACKEND, &txbackend);
+		if (ret < 0)
+			err(1, "VHOST_NET_SET_BACKEND");
+	}
+}
+
+static void prepare_vhost_vktest(int vhost, int vktest)
+{
+	uint64_t features = 1llu << VIRTIO_F_ACCESS_PLATFORM |
+			    1llu << VIRTIO_F_VERSION_1;
+	int ret;
+
+	for (int i = 0; i < num_vqs; i++) {
+		int kickfd = eventfd(0, EFD_CLOEXEC);
+
+		if (kickfd < 0)
+			err(1, "eventfd");
+
+		struct vhost_vring_file kick = {
+			.index = i,
+			.fd = kickfd,
+		};
+
+		ret = ioctl(vktest, VHOST_SET_VRING_KICK, &kick);
+		if (ret < 0)
+			err(1, "VHOST_SET_VRING_KICK");
+
+		ret = ioctl(vhost, VHOST_SET_VRING_KICK, &kick);
+		if (ret < 0)
+			err(1, "VHOST_SET_VRING_KICK");
+	}
+
+	for (int i = 0; i < num_vqs; i++) {
+		int callfd = eventfd(0, EFD_CLOEXEC);
+
+		if (callfd < 0)
+			err(1, "eventfd");
+
+		struct vhost_vring_file call = {
+			.index = i,
+			.fd = callfd,
+		};
+
+		ret = ioctl(vktest, VHOST_SET_VRING_CALL, &call);
+		if (ret < 0)
+			err(1, "VHOST_SET_VRING_CALL");
+
+		ret = ioctl(vhost, VHOST_SET_VRING_CALL, &call);
+		if (ret < 0)
+			err(1, "VHOST_SET_VRING_CALL");
+	}
+
+	ret = ioctl(vhost, VHOST_SET_FEATURES, &features);
+	if (ret < 0)
+		err(1, "VHOST_SET_FEATURES");
+}
+
+static void test_attach(void)
+{
+	int vktest, vktest2;
+	int vhost;
+	int ret;
+
+	vhost = open("/dev/vhost-net-kernel", O_RDONLY);
+	if (vhost < 0)
+		err(1, "vhost");
+
+	vktest = open("/dev/vktest", O_RDONLY);
+	if (vktest < 0)
+		err(1, "vktest");
+
+	ret = ioctl(vhost, VHOST_SET_OWNER);
+	if (ret < 0)
+		err(1, "VHOST_SET_OWNER");
+
+	prepare_vhost_vktest(vhost, vktest);
+
+	ret = ioctl(vktest, VKTEST_ATTACH_VHOST, vhost);
+	if (ret < 0)
+		err(1, "VKTEST_ATTACH_VHOST");
+
+	vktest2 = open("/dev/vktest", O_RDONLY);
+	if (vktest2 < 0)
+		err(1, "vktest");
+
+	ret = ioctl(vktest2, VKTEST_ATTACH_VHOST, vhost);
+	if (ret == 0)
+		errx(1, "Second attach did not fail");
+
+	close(vktest2);
+	close(vktest);
+	close(vhost);
+}
+
+int main(int argc, char *argv[])
+{
+	bool serve = false;
+	uint64_t features;
+	int vhost;
+	struct option options[] = {
+		{ "serve", no_argument, NULL, 's' },
+		{}
+	};
+
+	while (1) {
+		int c;
+
+		c = getopt_long_only(argc, argv, "", options, NULL);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 's':
+			serve = true;
+			break;
+		case '?':
+		default:
+			errx(1, "usage: %s [--serve]", argv[0]);
+		}
+	}
+
+	if (!serve) {
+		test_attach();
+		return 0;
+	}
+
+	vhost = open("/dev/vhost-net-kernel", O_RDONLY);
+	if (vhost < 0)
+		err(1, "vhost");
+
+	int ret;
+
+	ret = ioctl(vhost, VHOST_SET_OWNER);
+	if (ret < 0)
+		err(1, "VHOST_SET_OWNER");
+
+	vktest = open("/dev/vktest", O_RDONLY);
+	if (vktest < 0)
+		err(1, "vktest");
+
+	for (int i = 0; i < num_vqs; i++) {
+		int kickfd;
+
+		kickfd = eventfd(0, EFD_CLOEXEC);
+		if (kickfd < 0)
+			err(1, "eventfd");
+
+		struct vhost_vring_file kick = {
+			.index = i,
+			.fd = kickfd,
+		};
+
+		ret = ioctl(vktest, VHOST_SET_VRING_KICK, &kick);
+		if (ret < 0)
+			err(1, "VHOST_SET_VRING_KICK");
+
+		ret = ioctl(vhost, VHOST_SET_VRING_KICK, &kick);
+		if (ret < 0)
+			err(1, "VHOST_SET_VRING_KICK");
+	}
+
+	for (int i = 0; i < num_vqs; i++) {
+		int callfd;
+
+		callfd = eventfd(0, EFD_CLOEXEC);
+		if (callfd < 0)
+			err(1, "eventfd");
+
+		struct vhost_vring_file call = {
+			.index = i,
+			.fd = callfd,
+		};
+
+		ret = ioctl(vktest, VHOST_SET_VRING_CALL, &call);
+		if (ret < 0)
+			err(1, "VHOST_SET_VRING_CALL");
+
+		ret = ioctl(vhost, VHOST_SET_VRING_CALL, &call);
+		if (ret < 0)
+			err(1, "VHOST_SET_VRING_CALL");
+	}
+
+	features = 1llu << VIRTIO_F_ACCESS_PLATFORM |
+		   1llu << VIRTIO_F_VERSION_1;
+	ret = ioctl(vhost, VHOST_SET_FEATURES, &features);
+	if (ret < 0)
+		err(1, "VHOST_SET_FEATURES");
+
+	vhost_net_set_backend(vhost);
+
+	ret = ioctl(vktest, VKTEST_ATTACH_VHOST, vhost);
+	if (ret < 0)
+		err(1, "VKTEST_ATTACH_VHOST");
+
+	signal(SIGUSR1, handle_signal);
+
+	while (1)
+		pause();
+
+	return 0;
+}
diff --git a/tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh b/tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh
new file mode 100755
index 000000000000..82b7896cea68
--- /dev/null
+++ b/tools/testing/selftests/vhost_kernel/vhost_kernel_test.sh
@@ -0,0 +1,125 @@
+#!/bin/sh -ex
+# SPDX-License-Identifier: GPL-2.0-only
+
+cleanup() {
+	[ -z "$PID" ] || kill $PID 2>/dev/null || :
+	[ -z "$PINGPID0" ] || kill $PINGPID0 2>/dev/null || :
+	[ -z "$PINGPID1" ] || kill $PINGPID1 2>/dev/null || :
+	ip netns del g2h 2>/dev/null || :
+	ip netns del h2g 2>/dev/null || :
+}
+
+fail() {
+	echo "FAIL: $*"
+	exit 1
+}
+
+./vhost_kernel_test || fail "Sanity test failed"
+
+cleanup
+trap cleanup EXIT
+
+test_one() {
+	ls /sys/class/net/ > before
+	echo > new
+
+	./vhost_kernel_test --serve &
+	PID=$!
+
+	echo 'Waiting for interfaces'
+
+	timeout=5
+	while [ $timeout -gt 0 ]; do
+		timeout=$(($timeout - 1))
+		sleep 1
+		ls /sys/class/net/ > after
+		grep -F -x -v -f before after > new || continue
+		[ $(wc -l < new) -eq 2 ] || continue
+		break
+	done
+
+	g2h=
+	h2g=
+	while IFS= read -r iface; do
+		case $iface in
+		vhostkernel*)
+			h2g=$iface
+			;;
+		*)
+			# Assumed to be virtio-net
+			g2h=$iface
+			;;
+		esac
+	done < new
+
+	[ "$g2h" ] || fail "Did not find guest-to-host interface"
+	[ "$h2g" ] || fail "Did not find host-to-guest interface"
+
+	# IPv6 link-local addresses prevent short-circuit delivery.
+	hostip=fe80::0
+	guestip=fe80::1
+
+	# Move the interfaces out of the default namespace to prevent
+	# network manager daemons from messing with them.
+	ip netns add g2h
+	ip netns add h2g
+
+	ip link set dev $h2g netns h2g
+	ip netns exec h2g ip addr add dev $h2g scope link $hostip
+	ip netns exec h2g ip link set dev $h2g up
+
+	ip link set dev $g2h netns g2h
+	ip netns exec g2h ip addr add dev $g2h scope link $guestip
+	ip netns exec g2h ip link set dev $g2h up
+
+	# ip netns exec g2h tcpdump -i $g2h -w $g2h.pcap &
+	# ip netns exec h2g tcpdump -i $h2g -w $h2g.pcap &
+
+	ip netns exec h2g ping6 -c10 -A -s 20000 $guestip%$h2g
+	ip netns exec g2h ping6 -c10 -A -s 20000 $hostip%$g2h
+}
+
+start_background_flood() {
+	ip netns exec h2g ping6 -f $guestip%$h2g &
+	PINGPID0=$!
+	ip netns exec g2h ping6 -f $hostip%$g2h &
+	PINGPID1=$!
+	sleep 1
+}
+
+echo TEST: Basic test
+test_one
+# Trigger cleanup races
+start_background_flood
+cleanup
+
+echo TEST: Close vhost_test fd before vhost
+test_one
+start_background_flood
+kill -USR1 $PID
+PID=
+cleanup
+
+echo TEST: Unbind virtio_net and close
+test_one
+start_background_flood
+echo virtio0 > /sys/bus/virtio/drivers/virtio_net/unbind
+cleanup
+
+echo TEST: Unbind and rebind virtio_net
+test_one
+start_background_flood
+echo virtio0 > /sys/bus/virtio/drivers/virtio_net/unbind
+echo virtio0 > /sys/bus/virtio/drivers/virtio_net/bind
+# We assume that $g2h is the same after the new probe
+ip link set dev $g2h netns g2h
+ip netns exec g2h ip addr add dev $g2h scope link $guestip
+ip netns exec g2h ip link set dev $g2h up
+ip netns exec g2h ping6 -c10 -A -s 20000 $hostip%$g2h
+cleanup
+
+trap - EXIT
+
+echo OK
-- 
2.28.0