Hi Joerg, On 12/12/2018 10:35, Joerg Roedel wrote:> Hi, > > to make progress on this, we should first agree on the protocol used > between guest and host. I have a few points to discuss on the protocol > first. > > On Tue, Dec 11, 2018 at 06:20:57PM +0000, Jean-Philippe Brucker wrote: >> [1] Virtio-iommu specification v0.9, sources and pdf >>???? git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9 >>???? http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf > > Looking at this I wonder why it doesn't make the IOTLB visible to the > guest. the UNMAP requests seem to require that the TLB is already > flushed to make the unmap visible. > > I think that will cost significant performance for both, vfio and > dma-iommu use-cases which both do (vfio at least to some degree), > deferred flushing.We already do deferred flush: UNMAP requests are added to the queue by iommu_unmap(), and then flushed out by iotlb_sync(). So we switch to the host only on iotlb_sync(), or when the request queue is full.> I also wonder whether the protocol should implement a > protocol version handshake and iommu-feature set queries.With the virtio transport there is a handshake when the device (IOMMU) is initialized, through feature bits and global config fields. Feature bits are made of both transport-specific features, including the version number, and device-specific features defined in section 2.3 of the above document (the transport is described in the virtio 1.0 specification). The device presents features that it supports in a register, and the driver masks out the feature bits that it doesn't support. Then the driver sets the global status to FEATURES_OK and initialization continues. In addition virtio-iommu has per-endpoint features through the PROBE request, since the vIOMMU may manage hardware (VFIO) and software (virtio) endpoints at the same time, which don't have the same DMA capabilities (different IOVA ranges, page granularity, reserved ranges, pgtable sharing, etc). At the moment this is a one-way probe, not a handshake. The device simply fills the properties of each endpoint, but the driver doesn't have to ack them. Initially there was a way to negotiate each PROBE property but it was deemed unnecessary during review. By leaving a few spare bits in the property headers I made sure it can be added back with a feature bit if we ever need it.>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 >>???? git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 > > Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing > in this patch-set to make this work on x86?You should be able to access it here: http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel That branch contains missing bits for x86 support: * ACPI support. We have the code but it's waiting for an IORT spec update, to reserve the IORT node ID. I expect it to take a while, given that I'm alone requesting a change for something that's not upstream or in hardware. * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm not sure how to implement the glue that sets dma_ops properly. Thanks, Jean
On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote:> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm > not sure how to implement the glue that sets dma_ops properly.http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops
On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote:> >> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 > >>???? git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 > > > > Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing > > in this patch-set to make this work on x86? > > You should be able to access it here: > http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel > > That branch contains missing bits for x86 support: > > * ACPI support. We have the code but it's waiting for an IORT spec > update, to reserve the IORT node ID. I expect it to take a while, given > that I'm alone requesting a change for something that's not upstream or > in hardware.Frankly I think you should take a hard look at just getting the data needed from the PCI device itself. You don't need to depend on virtio, it can be a small driver that gets you that data from the device config space and then just goes away. If you want help with writing such a small driver let me know. If there's an advantage to virtio-iommu then that would be its portability, and it all goes out of the window because of dependencies on ACPI and DT and OF and the rest of the zoo.> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm > not sure how to implement the glue that sets dma_ops properly. > > Thanks, > JeanOK so IIUC you are looking into Christoph's suggestions to fix that up? There's still a bit of time left before the merge window, maybe you can make above changes. -- MST
On 19/12/2018 23:09, Michael S. Tsirkin wrote:> On Thu, Dec 13, 2018 at 12:50:29PM +0000, Jean-Philippe Brucker wrote: >>>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9.1 >>>> ???? git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9 >>> >>> Unfortunatly gitweb seems to be broken on linux-arm.org. What is missing >>> in this patch-set to make this work on x86? >> >> You should be able to access it here: >> http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/virtio-iommu/devel >> >> That branch contains missing bits for x86 support: >> >> * ACPI support. We have the code but it's waiting for an IORT spec >> update, to reserve the IORT node ID. I expect it to take a while, given >> that I'm alone requesting a change for something that's not upstream or >> in hardware. > > Frankly I think you should take a hard look at just getting the data > needed from the PCI device itself. You don't need to depend on virtio, > it can be a small driver that gets you that data from the device config > space and then just goes away. > > If you want help with writing such a small driver let me know. > > If there's an advantage to virtio-iommu then that would be its > portability, and it all goes out of the window because > of dependencies on ACPI and DT and OF and the rest of the zoo.But the portable solutions are ACPI and DT. Describing the DMA dependency through a device would require the guest to probe the device before all others. How do we communicate this? * pass a kernel parameter saying something like "probe_first=00:01.0" * make sure that the PCI root complex is probed before any other platform device (since the IOMMU can manage DMA of platform devices). * change DT, ACPI and PCI core code to handle this probe_first kernel parameter. Better go with something standard, that any OS and hypervisor knows how to use, and that other IOMMU devices already use.>> * DMA ops for x86 (see "HACK" commit). I'd like to use dma-iommu but I'm >> not sure how to implement the glue that sets dma_ops properly. >> >> Thanks, >> Jean > > OK so IIUC you are looking into Christoph's suggestions to fix that up?Eventually yes. I'll give it a try next year, once the dma-iommu changes are on the list. It's not a priority for me, given that x86 already has a pvIOMMU with VT-d, and that Arm still needs one. It shouldn't block this series.> There's still a bit of time left before the merge window, > maybe you can make above changes.I'll wait to see if Joerg has other concerns about the design or the code, and resend in January. I think that IOMMU driver changes should go through his tree. Thanks, Jean