Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 00/13] vdpa: add vdpa simulator for block device
v3:
- added new patches:
  - 'vringh: explain more about cleaning riov and wiov'
  - 'vdpa: add return value to get_config/set_config callbacks'
  - 'vhost/vdpa: remove vhost_vdpa_config_validate()'
- split Xie's patch 'vhost/vdpa: Remove the restriction that only
  supports virtio-net devices'
- updated Mellanox copyright to NVIDIA [Max]
- explained in the 'vdpa: add vdpa simulator for block device' commit
  message that inputs are validated in subsequent patches [Stefan]

v2: https://lore.kernel.org/lkml/20210128144127.113245-1-sgarzare at redhat.com/
v1: https://lore.kernel.org/lkml/93f207c0-61e6-3696-f218-e7d7ea9a7c93 at redhat.com/

This series is the second part of the v1 linked above. The first part,
with the refactoring of vdpa_sim, has already been merged.

The patches are based on Max Gurtovoy's work and extend the block
simulator to have a ramdisk behaviour.

As mentioned in the v1, there were two issues, and I fixed them in this
series:

1. The identical mapping in the IOMMU used until now in vdpa_sim created
   issues when mapping different virtual pages with the same physical
   address. Fixed by patch "vdpa_sim: use iova module to allocate IOVA
   addresses".

2. There was a race accessing the IOMMU between vdpasim_blk_work() and
   the device driver that maps/unmaps DMA regions. Fixed by patch
   "vringh: add 'iotlb_lock' to synchronize iotlb accesses".

I used Xie's patch, coming from the VDUSE series, to allow vhost-vdpa to
use block devices. As Jason suggested, I split it into two patches and
added a return value to the get_config()/set_config() callbacks.

The series also includes small fixes for vringh, vdpa, and vdpa_sim that
I discovered while implementing and testing the block simulator.

Thanks for your feedback,
Stefano

Max Gurtovoy (1):
  vdpa: add vdpa simulator for block device

Stefano Garzarella (11):
  vdpa_sim: use iova module to allocate IOVA addresses
  vringh: add 'iotlb_lock' to synchronize iotlb accesses
  vringh: reset kiov 'consumed' field in __vringh_iov()
  vringh: explain more about cleaning riov and wiov
  vringh: implement vringh_kiov_advance()
  vringh: add vringh_kiov_length() helper
  vdpa_sim: cleanup kiovs in vdpasim_free()
  vdpa: add return value to get_config/set_config callbacks
  vhost/vdpa: remove vhost_vdpa_config_validate()
  vdpa_sim_blk: implement ramdisk behaviour
  vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID

Xie Yongji (1):
  vhost/vdpa: Remove the restriction that only supports virtio-net
    devices

 drivers/vdpa/vdpa_sim/vdpa_sim.h     |   2 +
 include/linux/vdpa.h                 |  18 +-
 include/linux/vringh.h               |  19 +-
 drivers/vdpa/ifcvf/ifcvf_main.c      |  24 ++-
 drivers/vdpa/mlx5/net/mlx5_vnet.c    |  17 +-
 drivers/vdpa/vdpa_sim/vdpa_sim.c     | 134 ++++++++-----
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 288 +++++++++++++++++++++++++++
 drivers/vhost/vdpa.c                 |  47 ++---
 drivers/vhost/vringh.c               |  69 +++++--
 drivers/vdpa/Kconfig                 |   8 +
 drivers/vdpa/vdpa_sim/Makefile       |   1 +
 11 files changed, 504 insertions(+), 123 deletions(-)
 create mode 100644 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c

--
2.29.2
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 01/13] vdpa_sim: use iova module to allocate IOVA addresses
The identical mapping used until now created issues when mapping
different virtual pages with the same physical address. To solve this
issue, we can use the iova module to handle the IOVA allocation. For
simplicity we use an IOVA allocator with byte granularity.

We add two new functions, vdpasim_map_range() and vdpasim_unmap_range(),
to handle the IOVA allocation and the registration into the IOMMU/IOTLB.
These functions are used by the dma_map_ops callbacks.

Acked-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
v2:
- used ULONG_MAX instead of ~0UL [Jason]
- fixed typos in comment and patch description [Jason]
---
 drivers/vdpa/vdpa_sim/vdpa_sim.h |   2 +
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 108 +++++++++++++++++++------------
 drivers/vdpa/Kconfig             |   1 +
 3 files changed, 69 insertions(+), 42 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 6d75444f9948..cd58e888bcf3 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -6,6 +6,7 @@
 #ifndef _VDPA_SIM_H
 #define _VDPA_SIM_H

+#include <linux/iova.h>
 #include <linux/vringh.h>
 #include <linux/vdpa.h>
 #include <linux/virtio_byteorder.h>
@@ -57,6 +58,7 @@ struct vdpasim {
     /* virtio config according to device type */
     void *config;
     struct vhost_iotlb *iommu;
+    struct iova_domain iova;
     void *buffer;
     u32 status;
     u32 generation;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index d5942842432d..2183a833fcf4 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -17,6 +17,7 @@
 #include <linux/vringh.h>
 #include <linux/vdpa.h>
 #include <linux/vhost_iotlb.h>
+#include <linux/iova.h>

 #include "vdpa_sim.h"

@@ -128,30 +129,57 @@ static int dir_to_perm(enum dma_data_direction dir)
     return perm;
 }

+static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr,
+                                    size_t size, unsigned int perm)
+{
+    struct iova *iova;
+    dma_addr_t dma_addr;
+    int ret;
+
+    /* We set the limit_pfn to the maximum (ULONG_MAX - 1) */
+    iova = alloc_iova(&vdpasim->iova, size, ULONG_MAX - 1, true);
+    if (!iova)
+        return DMA_MAPPING_ERROR;
+
+    dma_addr = iova_dma_addr(&vdpasim->iova, iova);
+
+    spin_lock(&vdpasim->iommu_lock);
+    ret = vhost_iotlb_add_range(vdpasim->iommu, (u64)dma_addr,
+                                (u64)dma_addr + size - 1, (u64)paddr, perm);
+    spin_unlock(&vdpasim->iommu_lock);
+
+    if (ret) {
+        __free_iova(&vdpasim->iova, iova);
+        return DMA_MAPPING_ERROR;
+    }
+
+    return dma_addr;
+}
+
+static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr,
+                                size_t size)
+{
+    spin_lock(&vdpasim->iommu_lock);
+    vhost_iotlb_del_range(vdpasim->iommu, (u64)dma_addr,
+                          (u64)dma_addr + size - 1);
+    spin_unlock(&vdpasim->iommu_lock);
+
+    free_iova(&vdpasim->iova, iova_pfn(&vdpasim->iova, dma_addr));
+}
+
 static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
                                    unsigned long offset, size_t size,
                                    enum dma_data_direction dir,
                                    unsigned long attrs)
 {
     struct vdpasim *vdpasim = dev_to_sim(dev);
-    struct vhost_iotlb *iommu = vdpasim->iommu;
-    u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;
-    int ret, perm = dir_to_perm(dir);
+    phys_addr_t paddr = page_to_phys(page) + offset;
+    int perm = dir_to_perm(dir);

     if (perm < 0)
         return DMA_MAPPING_ERROR;

-    /* For simplicity, use identical mapping to avoid e.g iova
-     * allocator.
-     */
-    spin_lock(&vdpasim->iommu_lock);
-    ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
-                                pa, dir_to_perm(dir));
-    spin_unlock(&vdpasim->iommu_lock);
-    if (ret)
-        return DMA_MAPPING_ERROR;
-
-    return (dma_addr_t)(pa);
+    return vdpasim_map_range(vdpasim, paddr, size, perm);
 }

 static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
@@ -159,12 +187,8 @@ static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
                                unsigned long attrs)
 {
     struct vdpasim *vdpasim = dev_to_sim(dev);
-    struct vhost_iotlb *iommu = vdpasim->iommu;

-    spin_lock(&vdpasim->iommu_lock);
-    vhost_iotlb_del_range(iommu, (u64)dma_addr,
-                          (u64)dma_addr + size - 1);
-    spin_unlock(&vdpasim->iommu_lock);
+    vdpasim_unmap_range(vdpasim, dma_addr, size);
 }

 static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
@@ -172,27 +196,22 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
                                     unsigned long attrs)
 {
     struct vdpasim *vdpasim = dev_to_sim(dev);
-    struct vhost_iotlb *iommu = vdpasim->iommu;
-    void *addr = kmalloc(size, flag);
-    int ret;
+    phys_addr_t paddr;
+    void *addr;

-    spin_lock(&vdpasim->iommu_lock);
+    addr = kmalloc(size, flag);
     if (!addr) {
         *dma_addr = DMA_MAPPING_ERROR;
-    } else {
-        u64 pa = virt_to_phys(addr);
-
-        ret = vhost_iotlb_add_range(iommu, (u64)pa,
-                                    (u64)pa + size - 1,
-                                    pa, VHOST_MAP_RW);
-        if (ret) {
-            *dma_addr = DMA_MAPPING_ERROR;
-            kfree(addr);
-            addr = NULL;
-        } else
-            *dma_addr = (dma_addr_t)pa;
+        return NULL;
+    }
+
+    paddr = virt_to_phys(addr);
+
+    *dma_addr = vdpasim_map_range(vdpasim, paddr, size, VHOST_MAP_RW);
+    if (*dma_addr == DMA_MAPPING_ERROR) {
+        kfree(addr);
+        return NULL;
     }
-    spin_unlock(&vdpasim->iommu_lock);

     return addr;
 }
@@ -202,14 +221,10 @@ static void vdpasim_free_coherent(struct device *dev, size_t size,
                                   unsigned long attrs)
 {
     struct vdpasim *vdpasim = dev_to_sim(dev);
-    struct vhost_iotlb *iommu = vdpasim->iommu;

-    spin_lock(&vdpasim->iommu_lock);
-    vhost_iotlb_del_range(iommu, (u64)dma_addr,
-                          (u64)dma_addr + size - 1);
-    spin_unlock(&vdpasim->iommu_lock);
+    vdpasim_unmap_range(vdpasim, dma_addr, size);

-    kfree(phys_to_virt((uintptr_t)dma_addr));
+    kfree(vaddr);
 }

 static const struct dma_map_ops vdpasim_dma_ops = {
@@ -271,6 +286,13 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr)
     for (i = 0; i < dev_attr->nvqs; i++)
         vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu);

+    ret = iova_cache_get();
+    if (ret)
+        goto err_iommu;
+
+    /* For simplicity we use an IOVA allocator with byte granularity */
+    init_iova_domain(&vdpasim->iova, 1, 0);
+
     vdpasim->vdpa.dma_dev = dev;

     return vdpasim;
@@ -541,6 +563,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
     struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

     cancel_work_sync(&vdpasim->work);
+    put_iova_domain(&vdpasim->iova);
+    iova_cache_put();
     kvfree(vdpasim->buffer);
     if (vdpasim->iommu)
         vhost_iotlb_free(vdpasim->iommu);
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index ffd1e098bfd2..21a23500f430 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -14,6 +14,7 @@ config VDPA_SIM
     depends on RUNTIME_TESTING_MENU && HAS_DMA
     select DMA_OPS
     select VHOST_RING
+    select IOMMU_IOVA
     help
       Enable this module to support vDPA device simulators. These devices
       are used for testing, prototyping and development of vDPA.

--
2.29.2
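To see why the identity mapping was a problem: the physical address
itself was used as the IOVA, so mapping the same physical page twice
made the second vhost_iotlb_add_range() collide with the first. A
minimal sketch of how the new helpers behave, using only the functions
introduced by this patch (the surrounding driver context is assumed):

    /* Sketch: mapping the same physical page twice now yields two
     * distinct IOVAs, so both IOTLB entries can coexist. */
    dma_addr_t a = vdpasim_map_range(vdpasim, paddr, PAGE_SIZE, VHOST_MAP_RW);
    dma_addr_t b = vdpasim_map_range(vdpasim, paddr, PAGE_SIZE, VHOST_MAP_RW);

    /* a != b: each call allocates a fresh range from vdpasim->iova. */

    vdpasim_unmap_range(vdpasim, b, PAGE_SIZE);
    vdpasim_unmap_range(vdpasim, a, PAGE_SIZE);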
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 02/13] vringh: add 'iotlb_lock' to synchronize iotlb accesses
Usually iotlb accesses are synchronized with a spinlock. Let's request
it as a new parameter in vringh_set_iotlb() and hold it while we
navigate the iotlb in iotlb_translate(), to avoid race conditions with
any additions/deletions of ranges from the iotlb.

Acked-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 include/linux/vringh.h           | 6 +++++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 ++-
 drivers/vhost/vringh.c           | 9 ++++++++-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 59bd50f99291..9c077863c8f6 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -46,6 +46,9 @@ struct vringh {
     /* IOTLB for this vring */
     struct vhost_iotlb *iotlb;

+    /* spinlock to synchronize IOTLB accesses */
+    spinlock_t *iotlb_lock;
+
     /* The function to call to notify the guest about added buffers */
     void (*notify)(struct vringh *);
 };
@@ -258,7 +261,8 @@ static inline __virtio64 cpu_to_vringh64(const struct vringh *vrh, u64 val)

 #if IS_REACHABLE(CONFIG_VHOST_IOTLB)

-void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb);
+void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
+                      spinlock_t *iotlb_lock);

 int vringh_init_iotlb(struct vringh *vrh, u64 features,
                       unsigned int num, bool weak_barriers,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 2183a833fcf4..53238989713d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -284,7 +284,8 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr)
         goto err_iommu;

     for (i = 0; i < dev_attr->nvqs; i++)
-        vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu);
+        vringh_set_iotlb(&vdpasim->vqs[i].vring, vdpasim->iommu,
+                         &vdpasim->iommu_lock);

     ret = iova_cache_get();
     if (ret)
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 85d85faba058..f68122705719 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1074,6 +1074,8 @@ static int iotlb_translate(const struct vringh *vrh,
     int ret = 0;
     u64 s = 0;

+    spin_lock(vrh->iotlb_lock);
+
     while (len > s) {
         u64 size, pa, pfn;

@@ -1103,6 +1105,8 @@ static int iotlb_translate(const struct vringh *vrh,
         ++ret;
     }

+    spin_unlock(vrh->iotlb_lock);
+
     return ret;
 }

@@ -1262,10 +1266,13 @@ EXPORT_SYMBOL(vringh_init_iotlb);
  * vringh_set_iotlb - initialize a vringh for a ring with IOTLB.
  * @vrh: the vring
  * @iotlb: iotlb associated with this vring
+ * @iotlb_lock: spinlock to synchronize the iotlb accesses
  */
-void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb)
+void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb,
+                      spinlock_t *iotlb_lock)
 {
     vrh->iotlb = iotlb;
+    vrh->iotlb_lock = iotlb_lock;
 }
 EXPORT_SYMBOL(vringh_set_iotlb);

--
2.29.2
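The contract, as a sketch (names taken from the diff above; the unmap
path is abbreviated): any path that mutates the iotlb must hold the same
spinlock that was handed to vringh_set_iotlb(), because
iotlb_translate() now takes it while walking the ranges.

    /* Device driver side: map/unmap under vdpasim->iommu_lock ... */
    spin_lock(&vdpasim->iommu_lock);
    vhost_iotlb_del_range(vdpasim->iommu, start, last);
    spin_unlock(&vdpasim->iommu_lock);

    /* ... while the datapath (vdpasim_blk_work() via iotlb_translate())
     * serializes against it through vrh->iotlb_lock, which points to
     * the very same spinlock. */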
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 03/13] vringh: reset kiov 'consumed' field in __vringh_iov()
__vringh_iov() overwrites the contents of riov and wiov: it resets the
'i' and 'used' fields, but the 'consumed' field should also be reset to
avoid leaving an inconsistent state.

Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 drivers/vhost/vringh.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index f68122705719..bee63d68201a 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -290,9 +290,9 @@ __vringh_iov(struct vringh *vrh, u16 i,
         return -EINVAL;

     if (riov)
-        riov->i = riov->used = 0;
+        riov->i = riov->used = riov->consumed = 0;
     if (wiov)
-        wiov->i = wiov->used = 0;
+        wiov->i = wiov->used = wiov->consumed = 0;

     for (;;) {
         void *addr;

--
2.29.2
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 04/13] vringh: explain more about cleaning riov and wiov
riov and wiov can be reused with subsequent calls of
vringh_getdesc_*(). Let's add a paragraph in the documentation of these
functions to better explain when riov and wiov need to be cleaned up.

Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 drivers/vhost/vringh.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index bee63d68201a..2a88e087afd8 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -662,7 +662,10 @@ EXPORT_SYMBOL(vringh_init_user);
  * *head will be vrh->vring.num. You may be able to ignore an invalid
  * descriptor, but there's not much you can do with an invalid ring.
  *
- * Note that you may need to clean up riov and wiov, even on error!
+ * Note that you can reuse riov and wiov with subsequent calls. Content is
+ * overwritten and memory reallocated if more space is needed.
+ * When you don't need riov and wiov anymore, clean them up by calling
+ * vringh_iov_cleanup() to release the memory, even on error!
  */
 int vringh_getdesc_user(struct vringh *vrh,
                         struct vringh_iov *riov,
@@ -932,7 +935,10 @@ EXPORT_SYMBOL(vringh_init_kern);
  * *head will be vrh->vring.num. You may be able to ignore an invalid
  * descriptor, but there's not much you can do with an invalid ring.
  *
- * Note that you may need to clean up riov and wiov, even on error!
+ * Note that you can reuse riov and wiov with subsequent calls. Content is
+ * overwritten and memory reallocated if more space is needed.
+ * When you don't need riov and wiov anymore, clean them up by calling
+ * vringh_kiov_cleanup() to release the memory, even on error!
  */
 int vringh_getdesc_kern(struct vringh *vrh,
                         struct vringh_kiov *riov,
@@ -1292,7 +1298,10 @@ EXPORT_SYMBOL(vringh_set_iotlb);
  * *head will be vrh->vring.num. You may be able to ignore an invalid
  * descriptor, but there's not much you can do with an invalid ring.
  *
- * Note that you may need to clean up riov and wiov, even on error!
+ * Note that you can reuse riov and wiov with subsequent calls. Content is
+ * overwritten and memory reallocated if more space is needed.
+ * When you don't need riov and wiov anymore, clean them up by calling
+ * vringh_kiov_cleanup() to release the memory, even on error!
  */
 int vringh_getdesc_iotlb(struct vringh *vrh,
                          struct vringh_kiov *riov,

--
2.29.2
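A caller-side sketch of the documented pattern (hypothetical device;
vringh_kiov_init() and the iotlb variants are assumed available, as in
the rest of this series): the kiovs persist across requests and are
cleaned up once.

    struct vringh_kiov riov, wiov;
    u16 head;

    vringh_kiov_init(&riov, NULL, 0);
    vringh_kiov_init(&wiov, NULL, 0);

    while (vringh_getdesc_iotlb(&vq->vring, &riov, &wiov, &head,
                                GFP_ATOMIC) > 0) {
        /* Handle the request: riov/wiov contents are overwritten on
         * the next call, and their storage is reused (reallocated
         * only if a request needs more entries). */
        vringh_complete_iotlb(&vq->vring, head, 0);
    }

    /* Release the storage once, e.g. when the device is freed. */
    vringh_kiov_cleanup(&riov);
    vringh_kiov_cleanup(&wiov);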
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 05/13] vringh: implement vringh_kiov_advance()
In some cases, it may be useful to provide a way to skip a number of
bytes in a vringh_kiov. Let's implement vringh_kiov_advance() for this
purpose, reusing the code from vringh_iov_xfer(), and replace that code
with a call to the new vringh_kiov_advance().

Acked-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 include/linux/vringh.h |  2 ++
 drivers/vhost/vringh.c | 41 +++++++++++++++++++++++++++++------------
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 9c077863c8f6..755211ebd195 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -199,6 +199,8 @@ static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
     kiov->iov = NULL;
 }

+void vringh_kiov_advance(struct vringh_kiov *kiov, size_t len);
+
 int vringh_getdesc_kern(struct vringh *vrh,
                         struct vringh_kiov *riov,
                         struct vringh_kiov *wiov,
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 2a88e087afd8..4af8fa259d65 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -75,6 +75,34 @@ static inline int __vringh_get_head(const struct vringh *vrh,
     return head;
 }

+/**
+ * vringh_kiov_advance - skip bytes from vring_kiov
+ * @iov: an iov passed to vringh_getdesc_*() (updated as we consume)
+ * @len: the maximum length to advance
+ */
+void vringh_kiov_advance(struct vringh_kiov *iov, size_t len)
+{
+    while (len && iov->i < iov->used) {
+        size_t partlen = min(iov->iov[iov->i].iov_len, len);
+
+        iov->consumed += partlen;
+        iov->iov[iov->i].iov_len -= partlen;
+        iov->iov[iov->i].iov_base += partlen;
+
+        if (!iov->iov[iov->i].iov_len) {
+            /* Fix up old iov element then increment. */
+            iov->iov[iov->i].iov_len = iov->consumed;
+            iov->iov[iov->i].iov_base -= iov->consumed;
+
+            iov->consumed = 0;
+            iov->i++;
+        }
+
+        len -= partlen;
+    }
+}
+EXPORT_SYMBOL(vringh_kiov_advance);
+
 /* Copy some bytes to/from the iovec.  Returns num copied. */
 static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
                                       struct vringh_kiov *iov,
@@ -95,19 +123,8 @@ static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
         done += partlen;
         len -= partlen;
         ptr += partlen;
-        iov->consumed += partlen;
-        iov->iov[iov->i].iov_len -= partlen;
-        iov->iov[iov->i].iov_base += partlen;

-        if (!iov->iov[iov->i].iov_len) {
-            /* Fix up old iov element then increment. */
-            iov->iov[iov->i].iov_len = iov->consumed;
-            iov->iov[iov->i].iov_base -= iov->consumed;
-
-            iov->consumed = 0;
-            iov->i++;
-        }
+        vringh_kiov_advance(iov, partlen);
     }
     return done;
 }

--
2.29.2
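Besides deduplicating vringh_iov_xfer(), the export gives device
implementations a way to skip bytes they do not want to copy. A minimal
sketch (the 16-byte header, buf and payload_len are hypothetical; riov
as returned by vringh_getdesc_iotlb()):

    /* Skip a 16-byte header without copying it; the next pull
     * continues right after it. */
    vringh_kiov_advance(&riov, 16);
    bytes = vringh_iov_pull_iotlb(&vq->vring, &riov, buf, payload_len);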
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 06/13] vringh: add vringh_kiov_length() helper
This new helper returns the total number of bytes covered by a
vringh_kiov.

Suggested-by: Jason Wang <jasowang at redhat.com>
Acked-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 include/linux/vringh.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 755211ebd195..84db7b8f912f 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -199,6 +199,17 @@ static inline void vringh_kiov_cleanup(struct vringh_kiov *kiov)
     kiov->iov = NULL;
 }

+static inline size_t vringh_kiov_length(struct vringh_kiov *kiov)
+{
+    size_t len = 0;
+    int i;
+
+    for (i = kiov->i; i < kiov->used; i++)
+        len += kiov->iov[i].iov_len;
+
+    return len;
+}
+
 void vringh_kiov_advance(struct vringh_kiov *kiov, size_t len);

 int vringh_getdesc_kern(struct vringh *vrh,

--
2.29.2
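Patch 12 of this series uses the helper exactly like this to size a
block request from its descriptor chains (vq names as in the
simulator):

    to_pull = vringh_kiov_length(&vq->out_iov);     /* header + data */
    to_push = vringh_kiov_length(&vq->in_iov) - 1;  /* minus status byte */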
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 07/13] vdpa_sim: cleanup kiovs in vdpasim_free()
vringh_getdesc_iotlb() allocates memory to store the kvec, which is
freed with vringh_kiov_cleanup().

vringh_getdesc_iotlb() can reuse a previously allocated kvec, so to
avoid allocating a kvec for each request we do not call
vringh_kiov_cleanup() when we finish handling a request; instead, we
must call it when we free the entire device.

Acked-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 53238989713d..a7aeb5d01c3e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -562,8 +562,15 @@ static int vdpasim_dma_unmap(struct vdpa_device *vdpa, u64 iova, u64 size)
 static void vdpasim_free(struct vdpa_device *vdpa)
 {
     struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+    int i;

     cancel_work_sync(&vdpasim->work);
+
+    for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
+        vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
+        vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
+    }
+
     put_iova_domain(&vdpasim->iova);
     iova_cache_put();
     kvfree(vdpasim->buffer);

--
2.29.2
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 08/13] vdpa: add return value to get_config/set_config callbacks
All implementations of these callbacks already validate inputs. Let's
return an error from these callbacks, so the caller doesn't need to
validate the input anymore. We update all implementations to return
-EINVAL in case of invalid input.

Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 include/linux/vdpa.h              | 18 ++++++++++--------
 drivers/vdpa/ifcvf/ifcvf_main.c   | 24 ++++++++++++++++--------
 drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 +++++++++++------
 drivers/vdpa/vdpa_sim/vdpa_sim.c  | 16 ++++++++++------
 4 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 4ab5494503a8..0e0cbd5fb41b 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -157,6 +157,7 @@ struct vdpa_iova_range {
  *                              @buf: buffer used to read to
  *                              @len: the length to read from
  *                              configuration space
+ *                              Returns integer: success (0) or error (< 0)
  * @set_config:                 Write to device specific configuration space
  *                              @vdev: vdpa device
  *                              @offset: offset from the beginning of
@@ -164,6 +165,7 @@ struct vdpa_iova_range {
  *                              @buf: buffer used to write from
  *                              @len: the length to write to
  *                              configuration space
+ *                              Returns integer: success (0) or error (< 0)
  * @get_generation:             Get device config generation (optional)
  *                              @vdev: vdpa device
  *                              Returns u32: device generation
@@ -231,10 +233,10 @@ struct vdpa_config_ops {
     u32 (*get_vendor_id)(struct vdpa_device *vdev);
     u8 (*get_status)(struct vdpa_device *vdev);
     void (*set_status)(struct vdpa_device *vdev, u8 status);
-    void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
-                       void *buf, unsigned int len);
-    void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
-                       const void *buf, unsigned int len);
+    int (*get_config)(struct vdpa_device *vdev, unsigned int offset,
+                      void *buf, unsigned int len);
+    int (*set_config)(struct vdpa_device *vdev, unsigned int offset,
+                      const void *buf, unsigned int len);
     u32 (*get_generation)(struct vdpa_device *vdev);
     struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);

@@ -329,8 +331,8 @@ static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features)
 }

-static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
-                                   void *buf, unsigned int len)
+static inline int vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
+                                  void *buf, unsigned int len)
 {
     const struct vdpa_config_ops *ops = vdev->config;

@@ -339,8 +341,8 @@ static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
      * If it does happen we assume a legacy guest.
      */
     if (!vdev->features_valid)
-        vdpa_set_features(vdev, 0);
-    ops->get_config(vdev, offset, buf, len);
+        return vdpa_set_features(vdev, 0);
+    return ops->get_config(vdev, offset, buf, len);
 }

 /**
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 7c8bbfcf6c3e..f5e6a90d8114 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -332,24 +332,32 @@ static u32 ifcvf_vdpa_get_vq_align(struct vdpa_device *vdpa_dev)
     return IFCVF_QUEUE_ALIGNMENT;
 }

-static void ifcvf_vdpa_get_config(struct vdpa_device *vdpa_dev,
-                                  unsigned int offset,
-                                  void *buf, unsigned int len)
+static int ifcvf_vdpa_get_config(struct vdpa_device *vdpa_dev,
+                                 unsigned int offset,
+                                 void *buf, unsigned int len)
 {
     struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);

-    WARN_ON(offset + len > sizeof(struct virtio_net_config));
+    if (offset + len > sizeof(struct virtio_net_config))
+        return -EINVAL;
+
     ifcvf_read_net_config(vf, offset, buf, len);
+
+    return 0;
 }

-static void ifcvf_vdpa_set_config(struct vdpa_device *vdpa_dev,
-                                  unsigned int offset, const void *buf,
-                                  unsigned int len)
+static int ifcvf_vdpa_set_config(struct vdpa_device *vdpa_dev,
+                                 unsigned int offset, const void *buf,
+                                 unsigned int len)
 {
     struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);

-    WARN_ON(offset + len > sizeof(struct virtio_net_config));
+    if (offset + len > sizeof(struct virtio_net_config))
+        return -EINVAL;
+
     ifcvf_write_net_config(vf, offset, buf, len);
+
+    return 0;
 }

 static void ifcvf_vdpa_set_config_cb(struct vdpa_device *vdpa_dev,
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 029822060017..9323b5ff7988 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1796,20 +1796,25 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
     ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
 }

-static void mlx5_vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf,
-                                 unsigned int len)
+static int mlx5_vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf,
+                                unsigned int len)
 {
     struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
     struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);

-    if (offset + len < sizeof(struct virtio_net_config))
-        memcpy(buf, (u8 *)&ndev->config + offset, len);
+    if (offset + len > sizeof(struct virtio_net_config))
+        return -EINVAL;
+
+    memcpy(buf, (u8 *)&ndev->config + offset, len);
+
+    return 0;
 }

-static void mlx5_vdpa_set_config(struct vdpa_device *vdev, unsigned int offset, const void *buf,
-                                 unsigned int len)
+static int mlx5_vdpa_set_config(struct vdpa_device *vdev, unsigned int offset, const void *buf,
+                                unsigned int len)
 {
     /* not supported */
+    return 0;
 }

 static u32 mlx5_vdpa_get_generation(struct vdpa_device *vdev)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index a7aeb5d01c3e..3808b01ac703 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -462,32 +462,36 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
     spin_unlock(&vdpasim->lock);
 }

-static void vdpasim_get_config(struct vdpa_device *vdpa, unsigned int offset,
-                               void *buf, unsigned int len)
+static int vdpasim_get_config(struct vdpa_device *vdpa, unsigned int offset,
+                              void *buf, unsigned int len)
 {
     struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

     if (offset + len > vdpasim->dev_attr.config_size)
-        return;
+        return -EINVAL;

     if (vdpasim->dev_attr.get_config)
         vdpasim->dev_attr.get_config(vdpasim, vdpasim->config);

     memcpy(buf, vdpasim->config + offset, len);
+
+    return 0;
 }

-static void vdpasim_set_config(struct vdpa_device *vdpa, unsigned int offset,
-                               const void *buf, unsigned int len)
+static int vdpasim_set_config(struct vdpa_device *vdpa, unsigned int offset,
+                              const void *buf, unsigned int len)
 {
     struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

     if (offset + len > vdpasim->dev_attr.config_size)
-        return;
+        return -EINVAL;

     memcpy(vdpasim->config + offset, buf, len);

     if (vdpasim->dev_attr.set_config)
         vdpasim->dev_attr.set_config(vdpasim, vdpasim->config);
+
+    return 0;
 }

 static u32 vdpasim_get_generation(struct vdpa_device *vdpa)

--
2.29.2
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 09/13] vhost/vdpa: remove vhost_vdpa_config_validate()
The get_config() and set_config() callbacks in 'struct vdpa_config_ops'
usually already validate the inputs, and now they can return an error,
so we don't need to validate them here anymore.

Let's use the return value of these callbacks and return it in case of
error in vhost_vdpa_get_config() and vhost_vdpa_set_config().

Originally-by: Xie Yongji <xieyongji at bytedance.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 drivers/vhost/vdpa.c | 41 +++++++++++++----------------------------
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ef688c8c0e0e..d61e779000a8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -185,51 +185,35 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
     return 0;
 }

-static int vhost_vdpa_config_validate(struct vhost_vdpa *v,
-                                      struct vhost_vdpa_config *c)
-{
-    long size = 0;
-
-    switch (v->virtio_id) {
-    case VIRTIO_ID_NET:
-        size = sizeof(struct virtio_net_config);
-        break;
-    }
-
-    if (c->len == 0)
-        return -EINVAL;
-
-    if (c->len > size - c->off)
-        return -E2BIG;
-
-    return 0;
-}
-
 static long vhost_vdpa_get_config(struct vhost_vdpa *v,
                                   struct vhost_vdpa_config __user *c)
 {
     struct vdpa_device *vdpa = v->vdpa;
     struct vhost_vdpa_config config;
     unsigned long size = offsetof(struct vhost_vdpa_config, buf);
+    long ret;
     u8 *buf;

     if (copy_from_user(&config, c, size))
         return -EFAULT;
-    if (vhost_vdpa_config_validate(v, &config))
+    if (config.len == 0)
         return -EINVAL;
     buf = kvzalloc(config.len, GFP_KERNEL);
     if (!buf)
         return -ENOMEM;

-    vdpa_get_config(vdpa, config.off, buf, config.len);
+    ret = vdpa_get_config(vdpa, config.off, buf, config.len);
+    if (ret)
+        goto out;

     if (copy_to_user(c->buf, buf, config.len)) {
-        kvfree(buf);
-        return -EFAULT;
+        ret = -EFAULT;
+        goto out;
     }

+out:
     kvfree(buf);
-    return 0;
+    return ret;
 }

 static long vhost_vdpa_set_config(struct vhost_vdpa *v,
@@ -239,21 +223,22 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v,
     const struct vdpa_config_ops *ops = vdpa->config;
     struct vhost_vdpa_config config;
     unsigned long size = offsetof(struct vhost_vdpa_config, buf);
+    long ret;
     u8 *buf;

     if (copy_from_user(&config, c, size))
         return -EFAULT;
-    if (vhost_vdpa_config_validate(v, &config))
+    if (config.len == 0)
         return -EINVAL;

     buf = vmemdup_user(c->buf, config.len);
     if (IS_ERR(buf))
         return PTR_ERR(buf);

-    ops->set_config(vdpa, config.off, buf, config.len);
+    ret = ops->set_config(vdpa, config.off, buf, config.len);

     kvfree(buf);
-    return 0;
+    return ret;
 }

 static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)

--
2.29.2
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 10/13] vhost/vdpa: Remove the restriction that only supports virtio-net devices
From: Xie Yongji <xieyongji at bytedance.com>

Since the config checks are done by the vDPA drivers, we can remove the
virtio-net restriction and we should be able to support all kinds of
virtio devices.

<linux/virtio_net.h> is not needed anymore, but we need to include
<linux/slab.h> to avoid compilation failures.

Signed-off-by: Xie Yongji <xieyongji at bytedance.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
 drivers/vhost/vdpa.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index d61e779000a8..3879b1ad12dd 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -16,12 +16,12 @@
 #include <linux/cdev.h>
 #include <linux/device.h>
 #include <linux/mm.h>
+#include <linux/slab.h>
 #include <linux/iommu.h>
 #include <linux/uuid.h>
 #include <linux/vdpa.h>
 #include <linux/nospec.h>
 #include <linux/vhost.h>
-#include <linux/virtio_net.h>

 #include "vhost.h"

@@ -1006,10 +1006,6 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
     int minor;
     int r;

-    /* Currently, we only accept the network devices. */
-    if (ops->get_device_id(vdpa) != VIRTIO_ID_NET)
-        return -ENOTSUPP;
-
     v = kzalloc(sizeof(*v), GFP_KERNEL | __GFP_RETRY_MAYFAIL);
     if (!v)
         return -ENOMEM;

--
2.29.2
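With the restriction removed (and validation delegated to the parent
drivers by the two previous patches), the generic vhost-vdpa char
device can drive the block simulator too. A hedged userspace sketch,
assuming the device is bound to vhost_vdpa and shows up as
/dev/vhost-vdpa-0 (example path; error handling trimmed):

    /* Sketch: read the virtio-blk capacity through vhost-vdpa. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vhost.h>
    #include <linux/virtio_blk.h>

    int main(void)
    {
        struct vhost_vdpa_config *cfg;
        struct virtio_blk_config blk;
        int fd = open("/dev/vhost-vdpa-0", O_RDWR);  /* example path */

        if (fd < 0)
            return 1;

        cfg = calloc(1, sizeof(*cfg) + sizeof(blk));
        cfg->off = 0;
        cfg->len = sizeof(blk);

        /* With the previous patch, an out-of-range off/len now fails
         * with -EINVAL from the parent driver instead of being
         * silently ignored. */
        if (ioctl(fd, VHOST_VDPA_GET_CONFIG, cfg) < 0)
            return 1;

        memcpy(&blk, cfg->buf, sizeof(blk));
        printf("capacity: %llu sectors\n",
               (unsigned long long)blk.capacity);
        return 0;
    }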
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 11/13] vdpa: add vdpa simulator for block device
From: Max Gurtovoy <mgurtovoy at nvidia.com>

This will allow running vDPA for the virtio block protocol.

It's a preliminary implementation with a simple request handling: for
each request, only the status (the last byte) is set, and it is always
set to VIRTIO_BLK_S_OK. Input validation is also missing; it will be
added in the next commits.

Signed-off-by: Max Gurtovoy <mgurtovoy at nvidia.com>
[sgarzare: various cleanups/fixes]
Acked-by: Jason Wang <jasowang at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
v3:
- updated Mellanox copyright to NVIDIA [Max]
- explained in the commit message that inputs are validated in
  subsequent patches [Stefan]
v2:
- rebased on top of other changes (dev_attr, get_config(), notify(), etc.)
- memset to 0 the config structure in vdpasim_blk_get_config()
- used vdpasim pointer in vdpasim_blk_get_config()
v1:
- Removed unused headers
- Used cpu_to_vdpasim*() to store config fields
- Replaced 'select VDPA_SIM' with 'depends on VDPA_SIM' since selected
  option can not depend on other [Jason]
- Start with a single queue for now [Jason]
- Add comments to memory barriers
---
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 145 +++++++++++++++++++++++++++
 drivers/vdpa/Kconfig                 |   7 ++
 drivers/vdpa/vdpa_sim/Makefile       |   1 +
 3 files changed, 153 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
new file mode 100644
index 000000000000..9822f9edc511
--- /dev/null
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDPA simulator for block device.
+ *
+ * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/vringh.h>
+#include <linux/vdpa.h>
+#include <uapi/linux/virtio_blk.h>
+
+#include "vdpa_sim.h"
+
+#define DRV_VERSION  "0.1"
+#define DRV_AUTHOR   "Max Gurtovoy <mgurtovoy at nvidia.com>"
+#define DRV_DESC     "vDPA Device Simulator for block device"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDPASIM_BLK_FEATURES    (VDPASIM_FEATURES | \
+                                 (1ULL << VIRTIO_BLK_F_SIZE_MAX) | \
+                                 (1ULL << VIRTIO_BLK_F_SEG_MAX)  | \
+                                 (1ULL << VIRTIO_BLK_F_BLK_SIZE) | \
+                                 (1ULL << VIRTIO_BLK_F_TOPOLOGY) | \
+                                 (1ULL << VIRTIO_BLK_F_MQ))
+
+#define VDPASIM_BLK_CAPACITY    0x40000
+#define VDPASIM_BLK_SIZE_MAX    0x1000
+#define VDPASIM_BLK_SEG_MAX     32
+#define VDPASIM_BLK_VQ_NUM      1
+
+static struct vdpasim *vdpasim_blk_dev;
+
+static void vdpasim_blk_work(struct work_struct *work)
+{
+    struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+    u8 status = VIRTIO_BLK_S_OK;
+    int i;
+
+    spin_lock(&vdpasim->lock);
+
+    if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
+        goto out;
+
+    for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
+        struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
+
+        if (!vq->ready)
+            continue;
+
+        while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
+                                    &vq->in_iov, &vq->head,
+                                    GFP_ATOMIC) > 0) {
+            int write;
+
+            vq->in_iov.i = vq->in_iov.used - 1;
+            write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
+                                          &status, 1);
+            if (write <= 0)
+                break;
+
+            /* Make sure data is written before advancing index */
+            smp_wmb();
+
+            vringh_complete_iotlb(&vq->vring, vq->head, write);
+
+            /* Make sure used is visible before raising the interrupt. */
+            smp_wmb();
+
+            local_bh_disable();
+            if (vringh_need_notify_iotlb(&vq->vring) > 0)
+                vringh_notify(&vq->vring);
+            local_bh_enable();
+        }
+    }
+out:
+    spin_unlock(&vdpasim->lock);
+}
+
+static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
+{
+    struct virtio_blk_config *blk_config =
+        (struct virtio_blk_config *)config;
+
+    memset(config, 0, sizeof(struct virtio_blk_config));
+
+    blk_config->capacity = cpu_to_vdpasim64(vdpasim, VDPASIM_BLK_CAPACITY);
+    blk_config->size_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SIZE_MAX);
+    blk_config->seg_max = cpu_to_vdpasim32(vdpasim, VDPASIM_BLK_SEG_MAX);
+    blk_config->num_queues = cpu_to_vdpasim16(vdpasim, VDPASIM_BLK_VQ_NUM);
+    blk_config->min_io_size = cpu_to_vdpasim16(vdpasim, 1);
+    blk_config->opt_io_size = cpu_to_vdpasim32(vdpasim, 1);
+    blk_config->blk_size = cpu_to_vdpasim32(vdpasim, SECTOR_SIZE);
+}
+
+static int __init vdpasim_blk_init(void)
+{
+    struct vdpasim_dev_attr dev_attr = {};
+    int ret;
+
+    dev_attr.id = VIRTIO_ID_BLOCK;
+    dev_attr.supported_features = VDPASIM_BLK_FEATURES;
+    dev_attr.nvqs = VDPASIM_BLK_VQ_NUM;
+    dev_attr.config_size = sizeof(struct virtio_blk_config);
+    dev_attr.get_config = vdpasim_blk_get_config;
+    dev_attr.work_fn = vdpasim_blk_work;
+    dev_attr.buffer_size = PAGE_SIZE;
+
+    vdpasim_blk_dev = vdpasim_create(&dev_attr);
+    if (IS_ERR(vdpasim_blk_dev)) {
+        ret = PTR_ERR(vdpasim_blk_dev);
+        goto out;
+    }
+
+    ret = vdpa_register_device(&vdpasim_blk_dev->vdpa);
+    if (ret)
+        goto put_dev;
+
+    return 0;
+
+put_dev:
+    put_device(&vdpasim_blk_dev->vdpa.dev);
+out:
+    return ret;
+}
+
+static void __exit vdpasim_blk_exit(void)
+{
+    struct vdpa_device *vdpa = &vdpasim_blk_dev->vdpa;
+
+    vdpa_unregister_device(vdpa);
+}
+
+module_init(vdpasim_blk_init)
+module_exit(vdpasim_blk_exit)
+
+MODULE_VERSION(DRV_VERSION);
+MODULE_LICENSE(DRV_LICENSE);
+MODULE_AUTHOR(DRV_AUTHOR);
+MODULE_DESCRIPTION(DRV_DESC);
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 21a23500f430..b8bd92cf04f9 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -26,6 +26,13 @@ config VDPA_SIM_NET
     help
       vDPA networking device simulator which loops TX traffic back to RX.

+config VDPA_SIM_BLOCK
+    tristate "vDPA simulator for block device"
+    depends on VDPA_SIM
+    help
+      vDPA block device simulator which terminates IO request in a
+      memory buffer.
+
 config IFCVF
     tristate "Intel IFC VF vDPA driver"
     depends on PCI_MSI
diff --git a/drivers/vdpa/vdpa_sim/Makefile b/drivers/vdpa/vdpa_sim/Makefile
index 79d4536d347e..d458103302f2 100644
--- a/drivers/vdpa/vdpa_sim/Makefile
+++ b/drivers/vdpa/vdpa_sim/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim.o
 obj-$(CONFIG_VDPA_SIM_NET) += vdpa_sim_net.o
+obj-$(CONFIG_VDPA_SIM_BLOCK) += vdpa_sim_blk.o

--
2.29.2
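For reference, the request layout this (and the next two) patches work
against, as defined by the virtio spec; struct virtio_blk_outhdr comes
from <uapi/linux/virtio_blk.h> and is reproduced here only for
illustration:

    /*
     * Driver-readable chain:  struct virtio_blk_outhdr
     *                         [ data to write, VIRTIO_BLK_T_OUT ]
     * Device-writable chain:  [ data read,     VIRTIO_BLK_T_IN  ]
     *                         u8 status   <- the byte set above
     */
    struct virtio_blk_outhdr {
        __virtio32 type;    /* VIRTIO_BLK_T_* request type */
        __virtio32 ioprio;
        __virtio64 sector;  /* offset in 512-byte sectors */
    };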
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 12/13] vdpa_sim_blk: implement ramdisk behaviour
The previous implementation wrote only the status of each request. This
patch implements a more accurate block device simulator, providing a
ramdisk-like behaviour and adding input validation.

Acked-by: Jason Wang <jasowang at redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
v2:
- used %zd %zx to print size_t and ssize_t variables in dev_err()
- removed unnecessary new line [Jason]
- moved VIRTIO_BLK_T_GET_ID in another patch [Jason]
- used push/pull instead of write/read terminology
- added vdpasim_blk_check_range() to avoid overflows [Stefan]
- use vdpasim*_to_cpu instead of le*_to_cpu
- used vringh_kiov_length() helper [Jason]
---
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 164 ++++++++++++++++++++++++---
 1 file changed, 146 insertions(+), 18 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 9822f9edc511..2652a499fb34 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -3,6 +3,7 @@
  * VDPA simulator for block device.
  *
  * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2021, Red Hat Inc. All rights reserved.
  *
  */

@@ -13,6 +14,7 @@
 #include <linux/sched.h>
 #include <linux/vringh.h>
 #include <linux/vdpa.h>
+#include <linux/blkdev.h>
 #include <uapi/linux/virtio_blk.h>

 #include "vdpa_sim.h"
@@ -36,10 +38,151 @@

 static struct vdpasim *vdpasim_blk_dev;

+static bool vdpasim_blk_check_range(u64 start_sector, size_t range_size)
+{
+    u64 range_sectors = range_size >> SECTOR_SHIFT;
+
+    if (range_size > VDPASIM_BLK_SIZE_MAX * VDPASIM_BLK_SEG_MAX)
+        return false;
+
+    if (start_sector > VDPASIM_BLK_CAPACITY)
+        return false;
+
+    if (range_sectors > VDPASIM_BLK_CAPACITY - start_sector)
+        return false;
+
+    return true;
+}
+
+/* Returns 'true' if the request is handled (with or without an I/O error)
+ * and the status is correctly written in the last byte of the 'in iov',
+ * 'false' otherwise.
+ */
+static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
+                                   struct vdpasim_virtqueue *vq)
+{
+    size_t pushed = 0, to_pull, to_push;
+    struct virtio_blk_outhdr hdr;
+    ssize_t bytes;
+    loff_t offset;
+    u64 sector;
+    u8 status;
+    u32 type;
+    int ret;
+
+    ret = vringh_getdesc_iotlb(&vq->vring, &vq->out_iov, &vq->in_iov,
+                               &vq->head, GFP_ATOMIC);
+    if (ret != 1)
+        return false;
+
+    if (vq->out_iov.used < 1 || vq->in_iov.used < 1) {
+        dev_err(&vdpasim->vdpa.dev, "missing headers - out_iov: %u in_iov %u\n",
+                vq->out_iov.used, vq->in_iov.used);
+        return false;
+    }
+
+    if (vq->in_iov.iov[vq->in_iov.used - 1].iov_len < 1) {
+        dev_err(&vdpasim->vdpa.dev, "request in header too short\n");
+        return false;
+    }
+
+    /* The last byte is the status and we checked if the last iov has
+     * enough room for it.
+     */
+    to_push = vringh_kiov_length(&vq->in_iov) - 1;
+
+    to_pull = vringh_kiov_length(&vq->out_iov);
+
+    bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov, &hdr,
+                                  sizeof(hdr));
+    if (bytes != sizeof(hdr)) {
+        dev_err(&vdpasim->vdpa.dev, "request out header too short\n");
+        return false;
+    }
+
+    to_pull -= bytes;
+
+    type = vdpasim32_to_cpu(vdpasim, hdr.type);
+    sector = vdpasim64_to_cpu(vdpasim, hdr.sector);
+    offset = sector << SECTOR_SHIFT;
+    status = VIRTIO_BLK_S_OK;
+
+    switch (type) {
+    case VIRTIO_BLK_T_IN:
+        if (!vdpasim_blk_check_range(sector, to_push)) {
+            dev_err(&vdpasim->vdpa.dev,
+                    "reading over the capacity - offset: 0x%llx len: 0x%zx\n",
+                    offset, to_push);
+            status = VIRTIO_BLK_S_IOERR;
+            break;
+        }
+
+        bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
+                                      vdpasim->buffer + offset,
+                                      to_push);
+        if (bytes < 0) {
+            dev_err(&vdpasim->vdpa.dev,
+                    "vringh_iov_push_iotlb() error: %zd offset: 0x%llx len: 0x%zx\n",
+                    bytes, offset, to_push);
+            status = VIRTIO_BLK_S_IOERR;
+            break;
+        }
+
+        pushed += bytes;
+        break;
+
+    case VIRTIO_BLK_T_OUT:
+        if (!vdpasim_blk_check_range(sector, to_pull)) {
+            dev_err(&vdpasim->vdpa.dev,
+                    "writing over the capacity - offset: 0x%llx len: 0x%zx\n",
+                    offset, to_pull);
+            status = VIRTIO_BLK_S_IOERR;
+            break;
+        }
+
+        bytes = vringh_iov_pull_iotlb(&vq->vring, &vq->out_iov,
+                                      vdpasim->buffer + offset,
+                                      to_pull);
+        if (bytes < 0) {
+            dev_err(&vdpasim->vdpa.dev,
+                    "vringh_iov_pull_iotlb() error: %zd offset: 0x%llx len: 0x%zx\n",
+                    bytes, offset, to_pull);
+            status = VIRTIO_BLK_S_IOERR;
+            break;
+        }
+        break;
+
+    default:
+        dev_warn(&vdpasim->vdpa.dev,
+                 "Unsupported request type %d\n", type);
+        status = VIRTIO_BLK_S_IOERR;
+        break;
+    }
+
+    /* If some operations fail, we need to skip the remaining bytes
+     * to put the status in the last byte
+     */
+    if (to_push - pushed > 0)
+        vringh_kiov_advance(&vq->in_iov, to_push - pushed);
+
+    /* Last byte is the status */
+    bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov, &status, 1);
+    if (bytes != 1)
+        return false;
+
+    pushed += bytes;
+
+    /* Make sure data is written before advancing index */
+    smp_wmb();
+
+    vringh_complete_iotlb(&vq->vring, vq->head, pushed);
+
+    return true;
+}
+
 static void vdpasim_blk_work(struct work_struct *work)
 {
     struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
-    u8 status = VIRTIO_BLK_S_OK;
     int i;

     spin_lock(&vdpasim->lock);
@@ -53,22 +196,7 @@ static void vdpasim_blk_work(struct work_struct *work)
         if (!vq->ready)
             continue;

-        while (vringh_getdesc_iotlb(&vq->vring, &vq->out_iov,
-                                    &vq->in_iov, &vq->head,
-                                    GFP_ATOMIC) > 0) {
-            int write;
-
-            vq->in_iov.i = vq->in_iov.used - 1;
-            write = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
-                                          &status, 1);
-            if (write <= 0)
-                break;
-
-            /* Make sure data is written before advancing index */
-            smp_wmb();
-
-            vringh_complete_iotlb(&vq->vring, vq->head, write);
-
+        while (vdpasim_blk_handle_req(vdpasim, vq)) {
             /* Make sure used is visible before raising the interrupt. */
             smp_wmb();

@@ -109,7 +237,7 @@ static int __init vdpasim_blk_init(void)
     dev_attr.config_size = sizeof(struct virtio_blk_config);
     dev_attr.get_config = vdpasim_blk_get_config;
     dev_attr.work_fn = vdpasim_blk_work;
-    dev_attr.buffer_size = PAGE_SIZE;
+    dev_attr.buffer_size = VDPASIM_BLK_CAPACITY << SECTOR_SHIFT;

     vdpasim_blk_dev = vdpasim_create(&dev_attr);
     if (IS_ERR(vdpasim_blk_dev)) {

--
2.29.2
Stefano Garzarella
2021-Feb-04 17:22 UTC
[PATCH v3 13/13] vdpa_sim_blk: handle VIRTIO_BLK_T_GET_ID
Handle the VIRTIO_BLK_T_GET_ID request, always answering with the
"vdpa_blk_sim" string.

Acked-by: Jason Wang <jasowang at redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha at redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare at redhat.com>
---
v2:
- made 'vdpasim_blk_id' static [Jason]
---
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 2652a499fb34..4e4112dda616 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -37,6 +37,7 @@
 #define VDPASIM_BLK_VQ_NUM      1

 static struct vdpasim *vdpasim_blk_dev;
+static char vdpasim_blk_id[VIRTIO_BLK_ID_BYTES] = "vdpa_blk_sim";

 static bool vdpasim_blk_check_range(u64 start_sector, size_t range_size)
 {
@@ -152,6 +153,20 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
         }
         break;

+    case VIRTIO_BLK_T_GET_ID:
+        bytes = vringh_iov_push_iotlb(&vq->vring, &vq->in_iov,
+                                      vdpasim_blk_id,
+                                      VIRTIO_BLK_ID_BYTES);
+        if (bytes < 0) {
+            dev_err(&vdpasim->vdpa.dev,
+                    "vringh_iov_push_iotlb() error: %zd\n", bytes);
+            status = VIRTIO_BLK_S_IOERR;
+            break;
+        }
+
+        pushed += bytes;
+        break;
+
     default:
         dev_warn(&vdpasim->vdpa.dev,
                  "Unsupported request type %d\n", type);

--
2.29.2