Michael S. Tsirkin
2021-Apr-14 07:34 UTC
[PATCH v6 00/10] Introduce VDUSE - vDPA Device in Userspace
On Wed, Mar 31, 2021 at 04:05:09PM +0800, Xie Yongji wrote:
> This series introduces a framework, which can be used to implement
> vDPA Devices in a userspace program. The work consists of two parts:
> control path forwarding and data path offloading.
>
> In the control path, the VDUSE driver will make use of a message
> mechanism to forward the config operations from the vdpa bus driver
> to userspace. Userspace can use read()/write() to receive/reply to
> those control messages.
>
> In the data path, the core idea is mapping the dma buffer into the
> VDUSE daemon's address space, which can be implemented in different
> ways depending on the vdpa bus to which the vDPA device is attached.
>
> In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
> with a bounce-buffering mechanism to achieve that. And in the vhost-vdpa
> case, the dma buffer resides in a userspace memory region which can be
> shared with the VDUSE userspace process via transferring the shmfd.
>
> The details and our use case are shown below:
>
> ------------------------     -------------------------     ------------------------------------------------
> |       Container      |     |        QEMU(VM)       |     |                 VDUSE daemon                 |
> |       ---------      |     |  -------------------  |     |  -------------------------  ---------------- |
> |       |dev/vdx|      |     |  |/dev/vhost-vdpa-x|  |     |  | vDPA device emulation |  | block driver | |
> ------------+-----------     ------------+------------     ---------------+----------------------+--------
>             |                            |                                |                      |
>             |                            |                                |                      |
> ------------+----------------------------+--------------------------------+----------------------+--------
> |           | block device               | vhost device                   | vduse driver         | TCP/IP|
> |    -------+--------            --------+--------                --------+--------          ----+----   |
> |           |                            |                                |                      |       |
> | ----------+----------        ----------+----------              -------+-------               |       |
> | | virtio-blk driver |        | vhost-vdpa driver |              | vdpa device |               |       |
> | ----------+----------        ----------+----------              -------+-------               |       |
> |           |        virtio bus         |                                |                      |       |
> |   --------+----+-----------           |                                |                      |       |
> |                |                      |                                |                      |       |
> |      ----------+----------            |                                |                      |       |
> |      | virtio-blk device |            |                                |                      |       |
> |      ----------+----------            |                                |                      |       |
> |                |                      |                                |                      |       |
> |     -----------+----------            |                                |                      |       |
> |     | virtio-vdpa driver |            |                                |                      |       |
> |     -----------+----------            |                                |                      |       |
> |                |                      |              vdpa bus          |                      |       |
> |     -----------+----------------------+--------------------------------+------------         |       |
> |                                                                                           ---+---    |
> --------------------------------------------------------------------------------------------| NIC |----
>                                                                                             ---+---
>                                                                                                |
>                                                                                       ---------+---------
>                                                                                       | Remote Storages |
>                                                                                       -------------------

This all looks quite similar to vhost-user-block, except that the latter
does not need any kernel support at all.

So I am still scratching my head about its advantages over
vhost-user-block.

> We make use of it to implement a block device connecting to
> our distributed storage, which can be used both in containers and
> VMs. Thus, we can have a unified technology stack in both cases.

Maybe the container part is the answer. How does that stack look?

> To test it with null-blk:
>
> $ qemu-storage-daemon \
>     --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>     --monitor chardev=charmonitor \
>     --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
>     --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
>
> The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
>
> Future work:
>   - Improve performance
>   - Userspace library (find a way to reuse device emulation code in qemu/rust-vmm)
>
> V5 to V6:
> - Export receive_fd() instead of __receive_fd()
> - Factor out the unmapping logic for pa and va separately
> - Remove the logic of bounce page allocation in the page fault handler
> - Use PAGE_SIZE as the IOVA allocation granule
> - Add EPOLLOUT support
> - Enable setting the API version from userspace
> - Fix some bugs
>
> V4 to V5:
> - Remove the patch for irq binding
> - Use a single IOTLB for all types of mapping
> - Factor out vhost_vdpa_pa_map()
> - Add some sample code to the document
> - Use receive_fd_user() to pass the file descriptor
> - Fix some bugs
>
> V3 to V4:
> - Rebase to vhost.git
> - Split some patches
> - Add some documents
> - Use ioctl to inject interrupts rather than eventfd
> - Enable config interrupt support
> - Support binding irq to a specified cpu
> - Add two module parameters to limit bounce/iova size
> - Create a char device rather than an anon inode per vduse device
> - Reuse vhost IOTLB for the iova domain
> - Rework the message mechanism in the control path
>
> V2 to V3:
> - Rework the MMU-based IOMMU driver
> - Use the iova domain as the iova allocator instead of genpool
> - Support transferring vma->vm_file in vhost-vdpa
> - Add SVA support in vhost-vdpa
> - Remove the patches on bounce page reclaim
>
> V1 to V2:
> - Add vhost-vdpa support
> - Add some documents
> - Based on the vdpa management tool
> - Introduce a workqueue for irq injection
> - Replace the interval tree with an array map to store the iova_map
>
> Xie Yongji (10):
>   file: Export receive_fd() to modules
>   eventfd: Increase the recursion depth of eventfd_signal()
>   vhost-vdpa: protect concurrent access to vhost device iotlb
>   vhost-iotlb: Add an opaque pointer for vhost IOTLB
>   vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
>   vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
>   vdpa: Support transferring virtual addressing during DMA mapping
>   vduse: Implement an MMU-based IOMMU driver
>   vduse: Introduce VDUSE - vDPA Device in Userspace
>   Documentation: Add documentation for VDUSE
>
>  Documentation/userspace-api/index.rst              |    1 +
>  Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
>  Documentation/userspace-api/vduse.rst              |  212 +++
>  drivers/vdpa/Kconfig                               |   10 +
>  drivers/vdpa/Makefile                              |    1 +
>  drivers/vdpa/ifcvf/ifcvf_main.c                    |    2 +-
>  drivers/vdpa/mlx5/net/mlx5_vnet.c                  |    2 +-
>  drivers/vdpa/vdpa.c                                |    9 +-
>  drivers/vdpa/vdpa_sim/vdpa_sim.c                   |    8 +-
>  drivers/vdpa/vdpa_user/Makefile                    |    5 +
>  drivers/vdpa/vdpa_user/iova_domain.c               |  521 ++++++++
>  drivers/vdpa/vdpa_user/iova_domain.h               |   70 +
>  drivers/vdpa/vdpa_user/vduse_dev.c                 | 1362 ++++++++++++++++++++
>  drivers/vdpa/virtio_pci/vp_vdpa.c                  |    2 +-
>  drivers/vhost/iotlb.c                              |   20 +-
>  drivers/vhost/vdpa.c                               |  154 ++-
>  fs/eventfd.c                                       |    2 +-
>  fs/file.c                                          |    6 +
>  include/linux/eventfd.h                            |    5 +-
>  include/linux/file.h                               |    7 +-
>  include/linux/vdpa.h                               |   21 +-
>  include/linux/vhost_iotlb.h                        |    3 +
>  include/uapi/linux/vduse.h                         |  175 +++
>  23 files changed, 2548 insertions(+), 51 deletions(-)
>  create mode 100644 Documentation/userspace-api/vduse.rst
>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
>  create mode 100644 include/uapi/linux/vduse.h
>
> --
> 2.11.0
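To make the control path described in the cover letter concrete, here is a
minimal sketch of a VDUSE daemon's message loop. The struct layouts, the
message fields, and the /dev/vduse/vduse-null device path are illustrative
assumptions for this sketch (the name matches the qemu-storage-daemon
example above), not the exact uapi proposed in the series:

    /* Minimal sketch of a VDUSE daemon's control-path loop.
     * The struct layouts and device path here are assumed for
     * illustration, not taken from the series' uapi header.
     */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    struct vduse_dev_request {          /* assumed layout */
            uint32_t type;              /* e.g. a get/set config opcode */
            uint32_t request_id;
            uint32_t reserved[2];
            char     payload[64];
    };

    struct vduse_dev_response {         /* assumed layout */
            uint32_t request_id;        /* must match the request */
            uint32_t result;            /* 0 on success */
            uint32_t reserved[2];
            char     payload[64];
    };

    int main(void)
    {
            /* Device node created when the VDUSE device is registered. */
            int fd = open("/dev/vduse/vduse-null", O_RDWR);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            for (;;) {
                    struct vduse_dev_request req;
                    struct vduse_dev_response resp = { 0 };

                    /* Each read() blocks until the VDUSE driver forwards
                     * a config operation from the vdpa bus driver. */
                    if (read(fd, &req, sizeof(req)) != sizeof(req))
                            break;

                    /* Emulate the device here: fill resp.payload based
                     * on req.type, e.g. return virtio-blk config space. */
                    resp.request_id = req.request_id;
                    resp.result = 0;

                    /* write() replies to the in-flight control message. */
                    if (write(fd, &resp, sizeof(resp)) != sizeof(resp))
                            break;
            }
            close(fd);
            return 0;
    }

The point of the sketch is that a single file descriptor carries the whole
control path: the kernel's vdpa config operations surface as readable
messages, and the daemon's write() completes the operation in flight.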
Jason Wang
2021-Apr-14 07:49 UTC
[PATCH v6 00/10] Introduce VDUSE - vDPA Device in Userspace
On 2021/4/14 3:34 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 31, 2021 at 04:05:09PM +0800, Xie Yongji wrote:
>> This series introduces a framework, which can be used to implement
>> vDPA Devices in a userspace program. The work consists of two parts:
>> control path forwarding and data path offloading.
>>
>> In the control path, the VDUSE driver will make use of a message
>> mechanism to forward the config operations from the vdpa bus driver
>> to userspace. Userspace can use read()/write() to receive/reply to
>> those control messages.
>>
>> In the data path, the core idea is mapping the dma buffer into the
>> VDUSE daemon's address space, which can be implemented in different
>> ways depending on the vdpa bus to which the vDPA device is attached.
>>
>> In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver
>> with a bounce-buffering mechanism to achieve that. And in the vhost-vdpa
>> case, the dma buffer resides in a userspace memory region which can be
>> shared with the VDUSE userspace process via transferring the shmfd.
>>
>> The details and our use case are shown below:
>>
>> ------------------------     -------------------------     ------------------------------------------------
>> |       Container      |     |        QEMU(VM)       |     |                 VDUSE daemon                 |
>> |       ---------      |     |  -------------------  |     |  -------------------------  ---------------- |
>> |       |dev/vdx|      |     |  |/dev/vhost-vdpa-x|  |     |  | vDPA device emulation |  | block driver | |
>> ------------+-----------     ------------+------------     ---------------+----------------------+--------
>>             |                            |                                |                      |
>>             |                            |                                |                      |
>> ------------+----------------------------+--------------------------------+----------------------+--------
>> |           | block device               | vhost device                   | vduse driver         | TCP/IP|
>> |    -------+--------            --------+--------                --------+--------          ----+----   |
>> |           |                            |                                |                      |       |
>> | ----------+----------        ----------+----------              -------+-------               |       |
>> | | virtio-blk driver |        | vhost-vdpa driver |              | vdpa device |               |       |
>> | ----------+----------        ----------+----------              -------+-------               |       |
>> |           |        virtio bus         |                                |                      |       |
>> |   --------+----+-----------           |                                |                      |       |
>> |                |                      |                                |                      |       |
>> |      ----------+----------            |                                |                      |       |
>> |      | virtio-blk device |            |                                |                      |       |
>> |      ----------+----------            |                                |                      |       |
>> |                |                      |                                |                      |       |
>> |     -----------+----------            |                                |                      |       |
>> |     | virtio-vdpa driver |            |                                |                      |       |
>> |     -----------+----------            |                                |                      |       |
>> |                |                      |              vdpa bus          |                      |       |
>> |     -----------+----------------------+--------------------------------+------------         |       |
>> |                                                                                           ---+---    |
>> --------------------------------------------------------------------------------------------| NIC |----
>>                                                                                             ---+---
>>                                                                                                |
>>                                                                                       ---------+---------
>>                                                                                       | Remote Storages |
>>                                                                                       -------------------
> This all looks quite similar to vhost-user-block, except that the latter
> does not need any kernel support at all.
>
> So I am still scratching my head about its advantages over
> vhost-user-block.
>
>
>> We make use of it to implement a block device connecting to
>> our distributed storage, which can be used both in containers and
>> VMs. Thus, we can have a unified technology stack in both cases.
> Maybe the container part is the answer. How does that stack look?


Yong Ji may add more, but I think this has been demonstrated in the
figure above: the userspace vDPA device can provide a kernel virtio-blk
device via the virtio-vdpa driver.

Thanks


>
>> To test it with null-blk:
>>
>> $ qemu-storage-daemon \
>>     --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
>>     --monitor chardev=charmonitor \
>>     --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
>>     --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
>>
>> The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
>>
>> Future work:
>>   - Improve performance
>>   - Userspace library (find a way to reuse device emulation code in qemu/rust-vmm)
>>
>> V5 to V6:
>> - Export receive_fd() instead of __receive_fd()
>> - Factor out the unmapping logic for pa and va separately
>> - Remove the logic of bounce page allocation in the page fault handler
>> - Use PAGE_SIZE as the IOVA allocation granule
>> - Add EPOLLOUT support
>> - Enable setting the API version from userspace
>> - Fix some bugs
>>
>> V4 to V5:
>> - Remove the patch for irq binding
>> - Use a single IOTLB for all types of mapping
>> - Factor out vhost_vdpa_pa_map()
>> - Add some sample code to the document
>> - Use receive_fd_user() to pass the file descriptor
>> - Fix some bugs
>>
>> V3 to V4:
>> - Rebase to vhost.git
>> - Split some patches
>> - Add some documents
>> - Use ioctl to inject interrupts rather than eventfd
>> - Enable config interrupt support
>> - Support binding irq to a specified cpu
>> - Add two module parameters to limit bounce/iova size
>> - Create a char device rather than an anon inode per vduse device
>> - Reuse vhost IOTLB for the iova domain
>> - Rework the message mechanism in the control path
>>
>> V2 to V3:
>> - Rework the MMU-based IOMMU driver
>> - Use the iova domain as the iova allocator instead of genpool
>> - Support transferring vma->vm_file in vhost-vdpa
>> - Add SVA support in vhost-vdpa
>> - Remove the patches on bounce page reclaim
>>
>> V1 to V2:
>> - Add vhost-vdpa support
>> - Add some documents
>> - Based on the vdpa management tool
>> - Introduce a workqueue for irq injection
>> - Replace the interval tree with an array map to store the iova_map
>>
>> Xie Yongji (10):
>>   file: Export receive_fd() to modules
>>   eventfd: Increase the recursion depth of eventfd_signal()
>>   vhost-vdpa: protect concurrent access to vhost device iotlb
>>   vhost-iotlb: Add an opaque pointer for vhost IOTLB
>>   vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
>>   vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
>>   vdpa: Support transferring virtual addressing during DMA mapping
>>   vduse: Implement an MMU-based IOMMU driver
>>   vduse: Introduce VDUSE - vDPA Device in Userspace
>>   Documentation: Add documentation for VDUSE
>>
>>  Documentation/userspace-api/index.rst              |    1 +
>>  Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
>>  Documentation/userspace-api/vduse.rst              |  212 +++
>>  drivers/vdpa/Kconfig                               |   10 +
>>  drivers/vdpa/Makefile                              |    1 +
>>  drivers/vdpa/ifcvf/ifcvf_main.c                    |    2 +-
>>  drivers/vdpa/mlx5/net/mlx5_vnet.c                  |    2 +-
>>  drivers/vdpa/vdpa.c                                |    9 +-
>>  drivers/vdpa/vdpa_sim/vdpa_sim.c                   |    8 +-
>>  drivers/vdpa/vdpa_user/Makefile                    |    5 +
>>  drivers/vdpa/vdpa_user/iova_domain.c               |  521 ++++++++
>>  drivers/vdpa/vdpa_user/iova_domain.h               |   70 +
>>  drivers/vdpa/vdpa_user/vduse_dev.c                 | 1362 ++++++++++++++++++++
>>  drivers/vdpa/virtio_pci/vp_vdpa.c                  |    2 +-
>>  drivers/vhost/iotlb.c                              |   20 +-
>>  drivers/vhost/vdpa.c                               |  154 ++-
>>  fs/eventfd.c                                       |    2 +-
>>  fs/file.c                                          |    6 +
>>  include/linux/eventfd.h                            |    5 +-
>>  include/linux/file.h                               |    7 +-
>>  include/linux/vdpa.h                               |   21 +-
>>  include/linux/vhost_iotlb.h                        |    3 +
>>  include/uapi/linux/vduse.h                         |  175 +++
>>  23 files changed, 2548 insertions(+), 51 deletions(-)
>>  create mode 100644 Documentation/userspace-api/vduse.rst
>>  create mode 100644 drivers/vdpa/vdpa_user/Makefile
>>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
>>  create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
>>  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
>>  create mode 100644 include/uapi/linux/vduse.h
>>
>> --
>> 2.11.0
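To ground the data-path description from the cover letter, here is a sketch
of how a daemon could translate an IOVA taken from a virtqueue descriptor
into a pointer in its own address space. The VDUSE_IOTLB_GET_FD ioctl
number and the struct vduse_iotlb_entry layout below are assumptions for
illustration; the only idea taken from the series is that both cases reach
the daemon as an mmap-able file descriptor -- the bounce buffer in the
virtio-vdpa case, and the guest-memory shmfd transferred by vhost-vdpa:

    /* Sketch of the data path from the daemon's point of view:
     * translate an IOVA from a virtqueue descriptor into a local
     * pointer. The ioctl name, number, and struct layout are
     * assumed for illustration, not taken from the series' uapi.
     */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct vduse_iotlb_entry {       /* assumed layout */
            uint64_t offset;         /* offset into the returned fd */
            uint64_t start;          /* first IOVA covered */
            uint64_t last;           /* last IOVA covered */
            uint8_t  perm;           /* read/write permission */
    };

    #define VDUSE_IOTLB_GET_FD _IOWR(0x81, 0x10, struct vduse_iotlb_entry)

    static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t len)
    {
            struct vduse_iotlb_entry entry = { .start = iova, .last = iova };

            /* Ask the VDUSE driver which fd backs this IOVA. */
            int map_fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry);
            if (map_fd < 0)
                    return NULL;

            if (iova + len - 1 > entry.last) {
                    close(map_fd);
                    return NULL; /* buffer crosses a mapping boundary */
            }

            /* Map the whole region once; a real daemon would cache the
             * mapping until an invalidation message arrives. */
            size_t size = entry.last - entry.start + 1;
            void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                              MAP_SHARED, map_fd, entry.offset);
            close(map_fd);
            if (base == MAP_FAILED)
                    return NULL;

            return (char *)base + (iova - entry.start);
    }

Under this assumption, the point Jason makes above holds for both
consumers: whether the vDPA device is driven by virtio-vdpa in the host
kernel or by vhost-vdpa on behalf of a VM, the daemon sees the same
IOTLB-backed mapping interface.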