Michael S. Tsirkin
2015-Sep-01 08:17 UTC
rfc: vhost user enhancements for vm2vm communication
On Mon, Aug 31, 2015 at 11:35:55AM -0700, Nakajima, Jun wrote:
> On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> > Hello!
> > During the KVM forum, we discussed supporting virtio on top
> > of ivshmem. I have considered it, and came up with an alternative
> > that has several advantages over that - please see below.
> > Comments welcome.
>
> Hi Michael,
>
> I like this, and it should be able to achieve what I presented at KVM
> Forum (vhost-user-shmem).
> Comments below.
>
> > -----
> >
> > Existing solutions to userspace switching between VMs on the
> > same host are vhost-user and ivshmem.
> >
> > vhost-user works by mapping memory of all VMs being bridged into the
> > switch memory space.
> >
> > By comparison, ivshmem works by exposing a shared region of memory to all VMs.
> > VMs are required to use this region to store packets. The switch only
> > needs access to this region.
> >
> > Another difference between vhost-user and ivshmem surfaces when polling
> > is used. With vhost-user, the switch is required to handle
> > data movement between VMs; if polling is used, this means that one host CPU
> > needs to be sacrificed for this task.
> >
> > This is easiest to understand when one of the VMs is
> > used with VF pass-through. This can be shown schematically below:
> >
> > +-- VM1 --------------+              +---VM2-----------+
> > | virtio-pci +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> > +---------------------+              +-----------------+
> >
> > With ivshmem, in theory, communication can happen directly, with two VMs
> > polling the shared memory region.
> >
> > I won't spend time listing advantages of vhost-user over ivshmem.
> > Instead, having identified two advantages of ivshmem over vhost-user,
> > below is a proposal to extend vhost-user to gain the advantages
> > of ivshmem.
> >
> > 1. virtio in the guest can be extended to allow support
> > for IOMMUs. This provides the guest with full flexibility
> > about which memory is readable or writable by each device.
>
> I assume that you meant VFIO only for virtio by "use of VFIO". To get
> VFIO working for general direct-I/O (including VFs) in guests, as you
> know, we need to virtualize IOMMU (e.g. VT-d) and the interrupt
> remapping table on x86 (i.e. nested VT-d).

Not necessarily: if a pmd is used, mappings stay mostly static,
and there are no interrupts, so the existing IOMMU emulation in qemu
will do the job.

> > By setting up a virtio device for each other VM we need to
> > communicate to, the guest gets full control of its security, from
> > mapping all memory (like with current vhost-user) to only
> > mapping buffers used for networking (like ivshmem) to
> > transient mappings for the duration of data transfer only.
>
> And I think that we can use VMFUNC to have such transient mappings.

Interesting. There are two points to make here:

1. To create transient mappings, VMFUNC isn't strictly required.
Instead, mappings can be created when the first access by VM2
within the BAR triggers a page fault.
I guess VMFUNC could remove this first pagefault by the hypervisor mapping
the host PTE into the alternative view, then VMFUNC making the
VM2 PTE valid - this might be important if mappings are very dynamic,
so there are many pagefaults.

2. To invalidate mappings, VMFUNC isn't sufficient, since the
translation caches of other CPUs need to be invalidated.
I don't think VMFUNC can do this.

> > This also allows use of VFIO within guests, for improved
> > security.
> >
> > vhost-user would need to be extended to send the
> > mappings programmed by the guest IOMMU.
>
> Right. We need to think about cases where other VMs (VM3, etc.) join
> the group or some existing VM leaves.
> PCI hot-plug should work there (as you point out at "Advantages over
> ivshmem" below).
>
> > 2. qemu can be extended to serve as a vhost-user client:
> > receive remote VM mappings over the vhost-user protocol, and
> > map them into another VM's memory.
> > This mapping can take, for example, the form of
> > a BAR of a pci device, which I'll call here vhost-pci -
> > with bus addresses allowed
> > by VM1's IOMMU mappings being translated into
> > offsets within this BAR within VM2's physical
> > memory space.
>
> I think it's sensible.
>
> > Since the translation can be a simple one, VM2
> > can perform it within its vhost-pci device driver.
> >
> > While this setup would be the most useful with polling,
> > VM1's ioeventfd can also be mapped to
> > another VM2's irqfd, and vice versa, such that VMs
> > can trigger interrupts to each other without need
> > for a helper thread on the host.
> >
> > The resulting channel might look something like the following:
> >
> > +-- VM1 --------------+    +---VM2-----------+
> > | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> > +---------------------+    +-----------------+
> >
> > Comparing the two diagrams, a vhost-user thread on the host is
> > no longer required, reducing the host CPU utilization when
> > polling is active. At the same time, VM2 cannot access all of VM1's
> > memory - it is limited by the IOMMU configuration set up by VM1.
> >
> > Advantages over ivshmem:
> >
> > - more flexibility: endpoint VMs do not have to place data at any
> >   specific locations to use the device; in practice this likely
> >   means fewer data copies.
> > - better standardization/code reuse:
> >   virtio changes within guests would be fairly easy to implement
> >   and would also benefit other backends besides vhost-user;
> >   standard hotplug interfaces can be used to add and remove these
> >   channels as VMs are added or removed.
> > - migration support:
> >   it's easy to implement since ownership of memory is well defined.
> >   For example, during migration VM2 can notify the hypervisor of VM1
> >   by updating a dirty bitmap each time it writes into VM1's memory.
>
> Also, the ivshmem functionality could be implemented by this proposal:
> - vswitch (or some VM) allocates memory regions in its address space, and
> - it arranges for IOMMU mappings on the VMs to be translated into those regions

I agree it's possible, but that's not something that exists on real
hardware. It's not clear to me what the security implications are
of having VM2 control VM1's IOMMU. Having each VM control its own IOMMU
seems more straightforward.

> > Thanks,
> >
> > --
> > MST
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization at lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>
> --
> Jun
> Intel Open Source Technology Center
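For illustration only, here is a minimal sketch of the translation step described above, as a hypothetical VM2-side vhost-pci driver might implement it. The names and structures below are assumptions made for the sketch, not part of the proposal or of any existing driver: the driver keeps a table of bus-address ranges granted by VM1's IOMMU, each backed at a known offset inside the vhost-pci BAR, and resolves a bus address found in a descriptor to a pointer into the mapped BAR.

/* Hypothetical sketch only: a VM2-side vhost-pci driver translating a bus
 * address allowed by VM1's IOMMU into an offset within the vhost-pci BAR.
 * Names and structures are illustrative, not taken from any existing driver. */
#include <stdint.h>
#include <stddef.h>

struct vhost_pci_region {
        uint64_t bus_addr;   /* start of the range granted by VM1's IOMMU */
        uint64_t len;        /* length of the granted range */
        uint64_t bar_offset; /* where that range sits inside the BAR */
};

struct vhost_pci_dev {
        void *bar;                        /* mapped vhost-pci BAR */
        struct vhost_pci_region *regions; /* kept in sync from vhost-user IOMMU updates */
        size_t nregions;
};

/* Resolve a VM1 bus address to a pointer inside the mapped BAR,
 * or NULL if VM1's IOMMU has not exposed that range to VM2. */
static void *vhost_pci_translate(struct vhost_pci_dev *dev,
                                 uint64_t bus_addr, uint64_t len)
{
        for (size_t i = 0; i < dev->nregions; i++) {
                struct vhost_pci_region *r = &dev->regions[i];

                if (bus_addr >= r->bus_addr && len <= r->len &&
                    bus_addr - r->bus_addr <= r->len - len)
                        return (char *)dev->bar + r->bar_offset +
                               (bus_addr - r->bus_addr);
        }
        return NULL; /* not mapped: VM1 did not grant access */
}

A lookup failure here simply means VM1 has not granted access to that range, which is the isolation property the proposal relies on.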
My previous email has been bounced by virtio-dev at lists.oasis-open.org.
I tried to subscribe it, but to no avail...

On Tue, Sep 1, 2015 at 1:17 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> On Mon, Aug 31, 2015 at 11:35:55AM -0700, Nakajima, Jun wrote:
>> On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
>> > 1. virtio in the guest can be extended to allow support
>> > for IOMMUs. This provides the guest with full flexibility
>> > about which memory is readable or writable by each device.
>>
>> I assume that you meant VFIO only for virtio by "use of VFIO". To get
>> VFIO working for general direct-I/O (including VFs) in guests, as you
>> know, we need to virtualize IOMMU (e.g. VT-d) and the interrupt
>> remapping table on x86 (i.e. nested VT-d).
>
> Not necessarily: if a pmd is used, mappings stay mostly static,
> and there are no interrupts, so the existing IOMMU emulation in qemu
> will do the job.

OK. It would work, although we need to engage additional, more complex code
in the guests when we are really just doing memory operations under the hood.

>> > By setting up a virtio device for each other VM we need to
>> > communicate to, the guest gets full control of its security, from
>> > mapping all memory (like with current vhost-user) to only
>> > mapping buffers used for networking (like ivshmem) to
>> > transient mappings for the duration of data transfer only.
>>
>> And I think that we can use VMFUNC to have such transient mappings.
>
> Interesting. There are two points to make here:
>
> 1. To create transient mappings, VMFUNC isn't strictly required.
> Instead, mappings can be created when the first access by VM2
> within the BAR triggers a page fault.
> I guess VMFUNC could remove this first pagefault by the hypervisor mapping
> the host PTE into the alternative view, then VMFUNC making the
> VM2 PTE valid - this might be important if mappings are very dynamic,
> so there are many pagefaults.

I agree that VMFUNC isn't strictly required; it would provide a
performance optimization. And I think it can add some level of
protection as well, because you might want to keep guest physical
memory (part of or all of VM1's memory) mapped at VM2's BAR
all the time. The IOMMU on VM1 can limit the address ranges accessed
by VM2, but such a restriction becomes loose as you want the mappings
static and thus large enough.

> 2. To invalidate mappings, VMFUNC isn't sufficient, since the
> translation caches of other CPUs need to be invalidated.
> I don't think VMFUNC can do this.

I don't think we need to invalidate mappings often. And if we do, we
need to invalidate the EPT anyway.

>> Also, the ivshmem functionality could be implemented by this proposal:
>> - vswitch (or some VM) allocates memory regions in its address space, and
>> - it arranges for IOMMU mappings on the VMs to be translated into those regions
>
> I agree it's possible, but that's not something that exists on real
> hardware. It's not clear to me what the security implications are
> of having VM2 control VM1's IOMMU. Having each VM control its own IOMMU
> seems more straightforward.

I meant the vswitch's IOMMU. The vswitch can be a bare-metal (or host)
process or a VM. For a bare-metal process, it's basically VFIO, where
the virtual address is used as the bus address. Each VM accesses the
shared memory using the vhost-pci BAR + bus (i.e. virtual) address.

--
Jun
Intel Open Source Technology Center
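As an aside on that last point, a minimal sketch of "VFIO, where the virtual address is used as the bus address" for a bare-metal vswitch process: the process passes the same value as vaddr and iova to VFIO_IOMMU_MAP_DMA, so the bus addresses reachable through the proposed vhost-pci path are simply the process's own virtual addresses. This is an illustrative sketch, not code from the thread; container and group setup are omitted, and container_fd is assumed to be an already-configured VFIO type1 container.

/* Sketch: a host vswitch process using VFIO (type1 IOMMU) to map a buffer
 * so that its bus address (iova) equals its own virtual address.
 * Container and group setup are omitted for brevity. */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int map_identity(int container_fd, void *buf, uint64_t size)
{
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uintptr_t)buf;   /* process virtual address */
        map.iova  = (uintptr_t)buf;   /* bus address == virtual address */
        map.size  = size;

        /* After this, DMA addressed at iova == vaddr lands in buf. */
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

Whether the vswitch is a host process or a VM, the point made in the thread is the same: peers only ever address memory through translations the owner has set up.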