Andy Lutomirski
2014-Sep-17 16:07 UTC
[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible
On Sep 17, 2014 7:13 AM, "Michael S. Tsirkin" <mst at redhat.com> wrote:
> On Wed, Sep 17, 2014 at 08:02:31AM -0400, Benjamin Herrenschmidt wrote:
> > On Tue, 2014-09-16 at 22:22 -0700, Andy Lutomirski wrote:
> > > On non-PPC systems, virtio_pci should use the DMA API. This fixes
> > > virtio_pci on Xen. On PPC, using the DMA API would break things, so
> > > we need to preserve the old behavior.
> > >
> > > The big comment in this patch explains the considerations in more
> > > detail.
> >
> > I still disagree with using CONFIG_PPC as a trigger here.
> >
> > Fundamentally, the qemu implementation today bypasses IOMMUs on all
> > platforms as far as I can tell.
> >
> > If that changes, we'll have a backward compatibility problem.
> >
> > The virtio device should advertise whether it's using that bypass
> > mode of operation and virtio_pci should react accordingly.
>
> Well if there's a way to detect that - that's outside the
> device, then we probably shouldn't use device-specific
> interfaces for this capability.

I still think that this is a property of the bus, not the device. x86
has such a mechanism, and this patch uses it transparently.

(NB: According to David Woodhouse about ten minutes ago, this should
work with a bit of care right now on x86 via ACPI as long as QEMU
configures the tables correctly.)

> > There is a demand for being able to operate on top of an IOMMU on
> > powerpc as well for some embedded stuff using PCI as a transport so your
> > patch precludes that.
> >
> > Cheers,
> > Ben.
>
> As far as I can see, nothing changes on PPC so this patch
> does not preclude anything that was working?

It shouldn't. That being said, at some point this problem will need
solving on PPC, and this patch doesn't help much, other than adding
the virtio_ring piece.

I'd really like to see the generic or arch IOMMU code handle this so
that the PPC special case in the virtio-pci driver can go away.

Another reason that this kind of special casing doesn't belong in
virtio-pci: if anyone ever binds vfio to a virtio-pci device, if the
DMA API doesn't work right, then the kernel will go boom or be
compromised.

--Andy

> > >
> > > Signed-off-by: Andy Lutomirski <luto at amacapital.net>
> > > ---
> > >  drivers/virtio/virtio_pci.c | 90 ++++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 81 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> > > index a1f299fa4626..8ddb0a641878 100644
> > > --- a/drivers/virtio/virtio_pci.c
> > > +++ b/drivers/virtio/virtio_pci.c
> > > @@ -80,8 +80,10 @@ struct virtio_pci_vq_info
> > >     /* the number of entries in the queue */
> > >     int num;
> > >
> > > -   /* the virtual address of the ring queue */
> > > -   void *queue;
> > > +   /* the ring queue */
> > > +   void *queue;                    /* virtual address */
> > > +   dma_addr_t queue_dma_addr;      /* bus address */
> > > +   bool use_dma_api;               /* are we using the DMA API? */
> > >
> > >     /* the list node for the virtqueues list */
> > >     struct list_head node;
> > > @@ -388,6 +390,50 @@ static int vp_request_intx(struct virtio_device *vdev)
> > >     return err;
> > >  }
> > >
> > > +static bool vp_use_dma_api(void)
> > > +{
> > > +   /*
> > > +    * Due to limitations of the DMA API, we only have two choices:
> > > +    * use the DMA API (e.g. set up IOMMU mappings or apply Xen's
> > > +    * physical-to-machine translation) or use direct physical
> > > +    * addressing.  Furthermore, there's no sensible way yet for the
> > > +    * PCI bus code to tell us whether we're supposed to act like a
> > > +    * normal PCI device (and use the DMA API) or to do something
> > > +    * else.  So we're stuck with heuristics here.
> > > +    *
> > > +    * In general, we would prefer to use the DMA API, since we
> > > +    * might be driving a physical device, and such devices *must*
> > > +    * use the DMA API if there is an IOMMU involved.
> > > +    *
> > > +    * On x86, there are no physically-mapped emulated virtio PCI
> > > +    * devices that live behind an IOMMU.  On ARM, there don't seem
> > > +    * to be any hypervisors that use virtio_pci (as opposed to
> > > +    * virtio_mmio) that also emulate an IOMMU.  So using the DMA
> > > +    * API is safe.
> > > +    *
> > > +    * On PowerPC, it's the other way around.  There usually is an
> > > +    * IOMMU between us and the virtio PCI device, but the device is
> > > +    * probably emulated and ignores the IOMMU.  Unfortunately, we
> > > +    * can't tell whether we're talking to an emulated device or to
> > > +    * a physical device that really lives behind the IOMMU.  That
> > > +    * means that we're stuck with ignoring the DMA API.
> > > +    */
> > > +
> > > +#ifdef CONFIG_PPC
> > > +   return false;
> > > +#else
> > > +   /*
> > > +    * Minor optimization: if the platform promises to have physical
> > > +    * PCI DMA, we turn off DMA mapping in virtio_ring.  If the
> > > +    * platform's DMA API implementation is well optimized, this
> > > +    * should have almost no effect, but we already have a branch in
> > > +    * the vring code, and we can avoid any further indirection with
> > > +    * very little effort.
> > > +    */
> > > +   return !PCI_DMA_BUS_IS_PHYS;
> > > +#endif
> > > +}
> > > +
> > >  static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
> > >                                    void (*callback)(struct virtqueue *vq),
> > >                                    const char *name,
> > > @@ -416,21 +462,30 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
> > >
> > >     info->num = num;
> > >     info->msix_vector = msix_vec;
> > > +   info->use_dma_api = vp_use_dma_api();
> > >
> > > -   size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));
> > > -   info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO);
> > > +   size = vring_size(num, VIRTIO_PCI_VRING_ALIGN);
> > > +   if (info->use_dma_api) {
> > > +           info->queue = dma_zalloc_coherent(vdev->dev.parent, size,
> > > +                                             &info->queue_dma_addr,
> > > +                                             GFP_KERNEL);
> > > +   } else {
> > > +           info->queue = alloc_pages_exact(PAGE_ALIGN(size),
> > > +                                           GFP_KERNEL|__GFP_ZERO);
> > > +           info->queue_dma_addr = virt_to_phys(info->queue);
> > > +   }
> > >     if (info->queue == NULL) {
> > >             err = -ENOMEM;
> > >             goto out_info;
> > >     }
> > >
> > >     /* activate the queue */
> > > -   iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > > +   iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
> > >               vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > >
> > >     /* create the vring */
> > >     vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
> > > -                            true, false, info->queue,
> > > +                            true, info->use_dma_api, info->queue,
> > >                              vp_notify, callback, name);
> > >     if (!vq) {
> > >             err = -ENOMEM;
> > > @@ -463,7 +518,12 @@ out_assign:
> > >     vring_del_virtqueue(vq);
> > > out_activate_queue:
> > >     iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > > -   free_pages_exact(info->queue, size);
> > > +   if (info->use_dma_api) {
> > > +           dma_free_coherent(vdev->dev.parent, size,
> > > +                             info->queue, info->queue_dma_addr);
> > > +   } else {
> > > +           free_pages_exact(info->queue, PAGE_ALIGN(size));
> > > +   }
> > > out_info:
> > >     kfree(info);
> > >     return ERR_PTR(err);
> > > @@ -493,8 +553,13 @@ static void vp_del_vq(struct virtqueue *vq)
> > >     /* Select and deactivate the queue */
> > >     iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
> > >
> > > -   size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN));
> > > -   free_pages_exact(info->queue, size);
> > > +   size = vring_size(info->num, VIRTIO_PCI_VRING_ALIGN);
> > > +   if (info->use_dma_api) {
> > > +           dma_free_coherent(vq->vdev->dev.parent, size,
> > > +                             info->queue, info->queue_dma_addr);
> > > +   } else {
> > > +           free_pages_exact(info->queue, PAGE_ALIGN(size));
> > > +   }
> > >     kfree(info);
> > >  }
> > >
> > > @@ -713,6 +778,13 @@ static int virtio_pci_probe(struct pci_dev *pci_dev,
> > >     if (err)
> > >             goto out;
> > >
> > > +   err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
> > > +   if (err)
> > > +           err = dma_set_mask_and_coherent(&pci_dev->dev,
> > > +                                           DMA_BIT_MASK(32));
> > > +   if (err)
> > > +           dev_warn(&pci_dev->dev, "Failed to enable 64-bit or 32-bit DMA.  Trying to continue, but this might not work.\n");
> > > +
> > >     err = pci_request_regions(pci_dev, "virtio-pci");
> > >     if (err)
> > >             goto out_enable_device;
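Boiled down, the change above swaps a raw physical address for whatever bus
address the DMA API hands back. A rough before/after sketch (not the literal
patch; names follow the diff quoted above):

/* Before: assumes guest-physical address == bus address, which is exactly
 * what breaks under Xen and behind a real IOMMU. */
info->queue = alloc_pages_exact(PAGE_ALIGN(size), GFP_KERNEL | __GFP_ZERO);
iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
          vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);

/* After: let the DMA API choose the bus address (an IOMMU mapping, Xen's
 * physical-to-machine translation, or a plain 1:1 mapping). */
info->queue = dma_zalloc_coherent(vdev->dev.parent, size,
                                  &info->queue_dma_addr, GFP_KERNEL);
iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
          vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);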
David Woodhouse
2014-Sep-17 16:49 UTC
[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible
On Wed, 2014-09-17 at 09:07 -0700, Andy Lutomirski wrote:
> I still think that this is a property of the bus, not the device. x86
> has such a mechanism, and this patch uses it transparently.

Right. A device driver should use the DMA API. Always.

The platform's implementation of the DMA API is then free to observe
that the device in question isn't actually served by any IOMMU and give
a trivial 1:1 mapping.

> > > The virtio device should advertise whether it's using that bypass
> > > mode of operation and virtio_pci should react accordingly.
> >
> > Well if there's a way to detect that - that's outside the
> > device, then we probably shouldn't use device-specific
> > interfaces for this capability.
>
> (NB: According to David Woodhouse about ten minutes ago, this should
> work with a bit of care right now on x86 via ACPI as long as QEMU
> configures the tables correctly.)

Well, we don't currently expose a virtual IOMMU to x86 guests, although
we're going to want to in the near future.

My answer was based on the assumption that it'll actually be an Intel
IOMMU that we expose, rather than some paravirtualised IOMMU interface
which allows us to support both AMD and Intel hosts. That may not be a
valid assumption; I don't think we've really explored this at all.

But *given* that assumption, the VT-d ACPI tables allow a given IOMMU
unit to be associated either with a list of specific devices, *or* to
have a 'catch-all' bit set which means that it owns all
otherwise-unmatched devices in that PCI domain.

So... if you want some devices to be seen as *not* behind an IOMMU, they
need to be in a PCI domain which *doesn't* have a catch-all IOMMU unit.
So either you put them in a separate PCI domain all by themselves, or
you explicitly list all (possible) PCI BDFs under the IOMMU unit and
don't use the catch-all facility.

Having said that... don't. Just put the damn thing behind an IOMMU. The
guest OS doesn't *have* to use it; it can either eschew the IOMMU
completely or choose to use pass-through mode for specific devices.

--
David Woodhouse                            Open Source Technology Centre
David.Woodhouse at intel.com                              Intel Corporation
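For concreteness, the "trivial 1:1 mapping" described above is just a
dma_map_ops backend whose mappings are the identity. A minimal sketch,
assuming the dma_map_ops layout of kernels of this era (a real backend
would also fill in .alloc, .free, .map_sg and friends):

#include <linux/dma-mapping.h>

static dma_addr_t identity_map_page(struct device *dev, struct page *page,
                                    unsigned long offset, size_t size,
                                    enum dma_data_direction dir,
                                    struct dma_attrs *attrs)
{
        /* bus address == CPU physical address; nothing to program */
        return page_to_phys(page) + offset;
}

static void identity_unmap_page(struct device *dev, dma_addr_t addr,
                                size_t size, enum dma_data_direction dir,
                                struct dma_attrs *attrs)
{
        /* no IOMMU state to tear down */
}

static int identity_dma_supported(struct device *dev, u64 mask)
{
        return 1;
}

static struct dma_map_ops identity_dma_ops = {
        .map_page       = identity_map_page,
        .unmap_page     = identity_unmap_page,
        .dma_supported  = identity_dma_supported,
};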
Benjamin Herrenschmidt
2014-Sep-19 21:28 UTC
[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible
On Wed, 2014-09-17 at 09:49 -0700, David Woodhouse wrote:
> On Wed, 2014-09-17 at 09:07 -0700, Andy Lutomirski wrote:
> >
> > I still think that this is a property of the bus, not the device. x86
> > has such a mechanism, and this patch uses it transparently.
>
> Right. A device driver should use the DMA API. Always.

I do agree, and that's why I suggested that instead of doing what it
does, the patch should instead:

 - Always use dma_map_*

 - Make the dma "ops" of the virtio bus/device point to a set of
   "null ops" when necessary.

This would also make my life easier on powerpc, since I too will have
cases where I want virtio to use the DMA mapping API. It's just that
the *current* implementation is such that qemu always bypasses iommus,
and we want to keep that mode of operation as an option (even the
default) for performance. But qemu isn't the only other implementation
of virtio ...

> The platform's implementation of the DMA API is then free to observe
> that the device in question isn't actually served by any IOMMU and give
> a trivial 1:1 mapping.

Right.

> > > > The virtio device should advertise whether it's using that bypass
> > > > mode of operation and virtio_pci should react accordingly.
> > >
> > > Well if there's a way to detect that - that's outside the
> > > device, then we probably shouldn't use device-specific
> > > interfaces for this capability.
> >
> > (NB: According to David Woodhouse about ten minutes ago, this should
> > work with a bit of care right now on x86 via ACPI as long as QEMU
> > configures the tables correctly.)
>
> Well, we don't currently expose a virtual IOMMU to x86 guests, although
> we're going to want to in the near future.
>
> My answer was based on the assumption that it'll actually be an Intel
> IOMMU that we expose, rather than some paravirtualised IOMMU interface
> which allows us to support both AMD and Intel hosts. That may not be a
> valid assumption; I don't think we've really explored this at all.
>
> But *given* that assumption, the VT-d ACPI tables allow a given IOMMU
> unit to be associated either with a list of specific devices, *or* to
> have a 'catch-all' bit set which means that it owns all
> otherwise-unmatched devices in that PCI domain.
>
> So... if you want some devices to be seen as *not* behind an IOMMU, they
> need to be in a PCI domain which *doesn't* have a catch-all IOMMU unit.
> So either you put them in a separate PCI domain all by themselves, or
> you explicitly list all (possible) PCI BDFs under the IOMMU unit and
> don't use the catch-all facility.
>
> Having said that... don't. Just put the damn thing behind an IOMMU. The
> guest OS doesn't *have* to use it; it can either eschew the IOMMU
> completely or choose to use pass-through mode for specific devices.

Ben.
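What "point the dma ops of the virtio device at a set of null ops" could
look like in practice, assuming the architecture provides set_dma_ops()
(powerpc does) and an identity ops table like the sketch earlier in the
thread. virtio_bypasses_iommu() is a hypothetical predicate standing in
for whatever detection mechanism is eventually agreed on:

/* In virtio_pci_probe(), before the driver makes any DMA API calls, the
 * platform or bus code installs 1:1 ops on the device that actually does
 * the DMA; the driver then calls the DMA API unconditionally. */
if (virtio_bypasses_iommu(pci_dev))             /* hypothetical */
        set_dma_ops(&pci_dev->dev, &identity_dma_ops);

queue = dma_zalloc_coherent(&pci_dev->dev, size, &queue_dma_addr, GFP_KERNEL);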
Benjamin Herrenschmidt
2014-Sep-19 21:33 UTC
[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible
On Wed, 2014-09-17 at 09:07 -0700, Andy Lutomirski wrote:
> It shouldn't. That being said, at some point this problem will need
> solving on PPC, and this patch doesn't help much, other than adding
> the virtio_ring piece.
>
> I'd really like to see the generic or arch IOMMU code handle this so
> that the PPC special case in the virtio-pci driver can go away.
>
> Another reason that this kind of special casing doesn't belong in
> virtio-pci: if anyone ever binds vfio to a virtio-pci device, if the
> DMA API doesn't work right, then the kernel will go boom or be
> compromised.

Well, it's also for these reasons that I think your patch shouldn't go
through those virtio_* variants that test that "use_dma_api" flag, or
whatever you called it last time I looked :-)

You should just unconditionally call the DMA API, and we should fix up
the special cases by overriding the hooks.

I can help you do the override hack for PPC for now, until we can do
something saner across all architectures.

The main problem is s390, but I had a quick chat with Utz Bacher earlier
this week and he seemed to think having s390 use the DMA API shouldn't
be a big deal; they can also hook up some transparent set of ops if
necessary.

Cheers,
Ben.
Andy Lutomirski
2014-Sep-20 05:59 UTC
[PATCH v5 2/3] virtio_pci: Use the DMA API for virtqueues when possible
On Sep 19, 2014 2:33 PM, "Benjamin Herrenschmidt" <benh at kernel.crashing.org> wrote:
> On Wed, 2014-09-17 at 09:07 -0700, Andy Lutomirski wrote:
> > It shouldn't. That being said, at some point this problem will need
> > solving on PPC, and this patch doesn't help much, other than adding
> > the virtio_ring piece.
> >
> > I'd really like to see the generic or arch IOMMU code handle this so
> > that the PPC special case in the virtio-pci driver can go away.
> >
> > Another reason that this kind of special casing doesn't belong in
> > virtio-pci: if anyone ever binds vfio to a virtio-pci device, if the
> > DMA API doesn't work right, then the kernel will go boom or be
> > compromised.
>
> Well, it's also for these reasons that I think your patch shouldn't go
> through those virtio_* variants that test that "use_dma_api" flag, or
> whatever you called it last time I looked :-)
>
> You should just unconditionally call the DMA API, and we should fix up
> the special cases by overriding the hooks.

Unconditionally for virtio-pci or unconditionally for all drivers? I
wouldn't be surprised if some of the virtio-mmio archs have trouble,
too.

Aarch64, at least, seems to be okay with using the DMA API on
virtio_mmio devices, even with IOMMU support compiled in. Are there
other relevant architectures that are easy to test?

> I can help you do the override hack for PPC for now, until we can do
> something saner across all architectures.

Sure. The question is: should the patches go into 3.18 as is, or should
they wait? It would be straightforward to remove the use_dma_api switch
once PPC, s390, and virtio_mmio are ready.

--Andy

> The main problem is s390, but I had a quick chat with Utz Bacher earlier
> this week and he seemed to think having s390 use the DMA API shouldn't
> be a big deal; they can also hook up some transparent set of ops if
> necessary.
>
> Cheers,
> Ben.