Andy Lutomirski
2014-Sep-03 07:50 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Sep 2, 2014 11:53 PM, "Rusty Russell" <rusty at rustcorp.com.au> wrote:
>
> Andy Lutomirski <luto at amacapital.net> writes:
> > There really are virtio devices that are pieces of silicon and not
> > figments of a hypervisor's imagination [1].
>
> Hi Andy,
>
> As you're discovering, there's a reason no one has done the DMA
> API before.
>
> So the problem is that ppc64's IOMMU is a platform thing, not a bus
> thing. They really do carve out an exception for virtio devices,
> because performance (LOTS of performance). It remains to be seen if
> other platforms have the same performance issues, but in absence of
> other evidence, the answer is yes.
>
> It's a hack. But having specific virtual-only devices are an even
> bigger hack.
>
> Physical virtio devices have been talked about, but don't actually exist
> in Real Life. And someone a virtio PCI card is going to have serious
> performance issues: mainly because they'll want the rings in the card's
> MMIO region, not allocated by the driver. Being broken on PPC is really
> the least of their problems.
>
> So, what do we do? It'd be nice if Linux virtio Just Worked under Xen,
> though Xen's IOMMU is outside the virtio spec. Since virtio_pci can be
> a module, obvious hacks like having xen_arch_setup initialize a dma_ops pointer
> exposed by virtio_pci.c is out.

Xen does expose dma_ops. The trick is knowing when to use it.

>
> I think the best approach is to have a new feature bit (25 is free),
> VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
> use the mapping for the bus it is on. A real device would set this,
> or it won't work behind an IOMMU. A Xen device would also set this.

The devices I care about aren't actually Xen devices. They're devices
supplied by QEMU/KVM, booting a Xen hypervisor, which in turn passes
the virtio device (along with every other PCI device) through to dom0.
So this is exactly the same virtio device that regular x86 KVM guests
would see. The reason that current code fails is that Xen guest
physical addresses aren't the same as the addresses seen by the outer
hypervisor.

These devices don't know that physical addresses != bus addresses, so
they can't advertise that fact.

If we ever end up with a virtio_pci device with physical addressing,
behind an IOMMU (but ignoring it), on Xen, we'll have a problem, since
neither "physical" addressing nor dma ops will work.

That being said, there are also proposals for virtio devices supplied
by Xen dom0 to domU, and these will presumably work the same way,
except that the device implementation will know that it's on Xen.

Grr. This is mostly a result of the fact that virtio_pci devices
aren't really PCI devices. I still think that virtio_pci shouldn't
have to worry about this; ideally this would all be handled higher up
in the device hierarchy. x86 already gets this right.

Are there any hypervisors except PPC that use virtio_pci, have IOMMUs
on the pci slot that virtio_pci lives in, and that use physical
addressing? If not, I think that just quirking PPC will work (at
least until someone wants IOMMU support in virtio_pci on PPC, in which
case doing something using devicetree seems like a reasonable
solution).

--Andy

>
> Thoughts?
> Rusty.
>
> PS. I cc'd OASIS virtio-dev: it's subscriber only for IP reasons (to
> subscribe you have to promise we can use your suggestion in the
> standard). Feel free to remove in any replies, but it's part of
> the world we live in...
Rusty Russell
2014-Sep-05 02:31 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
Andy Lutomirski <luto at amacapital.net> writes:
>> I think the best approach is to have a new feature bit (25 is free),
>> VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
>> use the mapping for the bus it is on. A real device would set this,
>> or it won't work behind an IOMMU. A Xen device would also set this.
>
> The devices I care about aren't actually Xen devices. They're devices
> supplied by QEMU/KVM, booting a Xen hypervisor, which in turn passes
> the virtio device (along with every other PCI device) through to dom0.
> So this is exactly the same virtio device that regular x86 KVM guests
> would see. The reason that current code fails is that Xen guest
> physical addresses aren't the same as the addresses seen by the outer
> hypervisor.
>
> These devices don't know that physical addresses != bus addresses, so
> they can't advertise that fact.

Ah, I see. Then we will need a Xen-specific hack.

> Grr. This is mostly a result of the fact that virtio_pci devices
> aren't really PCI devices. I still think that virtio_pci shouldn't
> have to worry about this; ideally this would all be handled higher up
> in the device hierarchy. x86 already gets this right.

Yes. Adding a feature to say "I am a real PCI device" is possible, but
has other issues (particularly, as Michael Tsirkin pointed out, what do
you do if the driver doesn't understand the feature).

> Are there any hypervisors except PPC that use virtio_pci, have IOMMUs
> on the pci slot that virtio_pci lives in, and that use physical
> addressing? If not, I think that just quirking PPC will work (at
> least until someone wants IOMMU support in virtio_pci on PPC, in which
> case doing something using devicetree seems like a reasonable
> solution).

We can either patch to make PPC weird or make Xen weird. I'm on the
fence.

Two questions for Paulo:

1) When QEMU supports IOMMU on x86, will the virtio devices behind it
   respect the IOMMU (do they use the right memory access primitives)?

2) Are we really going to be able to exclude virtio devices from using
   the x86 IOMMU in a portable way which will always work? If it's
   per-bus granularity, will qemu really put them on their own PCI bus
   and get this right? Or will it sometimes get it wrong and users will
   end up using virtio devices via IOMMU by accident?

If the answers are both "yes", then x86 is going to be able to use
virtio+IOMMU, so PPC looks like the odd one out. Otherwise it looks
like we're really going to want to stick with the "ignore IOMMU" rule
until (handwave future), and we make an exception for Xen.

Cheers,
Rusty.
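[Editorial sketch, not code from the patch series.] The three policies discussed so far (a new feature bit, quirking PPC, or quirking Xen) can be modelled as three small predicates answering "should the driver go through the DMA API for this device?". Everything below is hypothetical: the `platform` enum, `struct vdev_model`, and the function names are made up for illustration, and bit 25 is only the number suggested in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number floated in the thread; a proposal, assumed here for illustration. */
#define VIRTIO_F_USE_BUS_MAPPING 25

/* Hypothetical platform tags, not kernel identifiers. */
enum platform {
    PLATFORM_X86_KVM,   /* bus address == guest physical address */
    PLATFORM_XEN_DOM0,  /* guest physical != address seen by the outer hypervisor */
    PLATFORM_PPC64,     /* platform IOMMU that virtio currently bypasses */
};

/* Hypothetical stand-in for a virtio device as seen by the driver. */
struct vdev_model {
    uint64_t device_features;  /* feature bits offered by the device */
    enum platform platform;    /* where the guest kernel is running */
};

/* Policy A: trust a device-advertised feature bit. */
static bool use_dma_api_feature_bit(const struct vdev_model *d)
{
    return (d->device_features >> VIRTIO_F_USE_BUS_MAPPING) & 1;
}

/* Policy B: always use the DMA API, with PPC as the one exception. */
static bool use_dma_api_quirk_ppc(const struct vdev_model *d)
{
    return d->platform != PLATFORM_PPC64;
}

/* Policy C: keep the "ignore IOMMU" rule, with Xen as the one exception. */
static bool use_dma_api_quirk_xen(const struct vdev_model *d)
{
    return d->platform == PLATFORM_XEN_DOM0;
}
```

For a QEMU-supplied device passed through to a Xen dom0, the device does not set the bit, so policy A answers "no" even though physical addressing is wrong there. That is the objection raised against relying on the feature bit alone. Policies B and C both answer "yes" for that case and differ only in which platform becomes the special case.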
Andy Lutomirski
2014-Sep-05 02:57 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Thu, Sep 4, 2014 at 7:31 PM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> We can either patch to make PPC weird or make Xen weird. I'm on the
> fence.
>
> Two questions for Paulo:
> 1) When QEMU supports IOMMU on x86, will the virtio devices behind it
>    respect the IOMMU (do they use the right memory access primitives?).
>
> 2) Are we really going to be able to exclude virtio devices from using
>    the x86 IOMMU in a portable way which will always work? If it's
>    per-bus granularity, will qemu really put them on their own PCI bus
>    and get this right? Or will it sometimes get it wrong and users will
>    end up using virtio devices via IOMMU by accident?
>
> If the answers are both "yes", then x86 is going to be able to use
> virtio+IOMMU, so PPC looks like the odd one out. Otherwise it looks
> like we're really going to want to stick with the "ignore IOMMU" rule
> until (handwave future), and we make an exception for Xen.

There's a third option: try to make virtio-mmio work everywhere
(except s390), at least in the long run. This has other benefits: it
makes minimal hypervisors simpler, and I think it'll get rid of the
limits on the number of virtio devices in a system. ARM is already
going in this direction, and I imagine that PPC support would be
straightforward (it's already using devicetree).

Does virtio-mmio have any reasonable way of doing hotplug?

It could also eventually make sense to have a standard for virtio on
virtio.

--Andy
Benjamin Herrenschmidt
2014-Sep-05 05:16 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Fri, 2014-09-05 at 12:01 +0930, Rusty Russell wrote:
> If the answers are both "yes", then x86 is going to be able to use
> virtio+IOMMU, so PPC looks like the odd one out.

Well, yes and no ... ppc will be able to do that too, it's just
pointless and will hurt performance.

Additionally, it will be incompatible with existing guests, since
today the guest assumes physical addressing (it doesn't use the DMA
mapping routines). So even if x86 grows the ability to have virtio
behind an iommu in qemu, that will break existing guests.

> Otherwise it looks
> like we're really going to want to stick with the "ignore IOMMU" rule
> until (handwave future), and we make an exception for Xen.

Either that or we have a capability that can be negotiated.

There are other reasons for wanting to allow the use of the DMA ops,
such as people using virtio as a transport between two physically
connected machines (such as a CPU running a PCIe endpoint to a CPU
running a PCIe host, or two hosts connected to a non-transparent
switch, essentially using PCIe as a fast network fabric).

Cheers,
Ben.
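[Editorial sketch, not code from the thread.] A negotiated capability, as suggested above, would follow the usual virtio pattern: the device offers feature bits, the driver accepts the subset it understands, and both sides act on the intersection. The minimal model below assumes the hypothetical VIRTIO_F_USE_BUS_MAPPING bit from earlier in the thread; `negotiate_addressing`, `enum addr_mode`, and the `device_requires_mapping` flag are illustrative names, not an existing API. It also shows the case raised earlier about a driver that does not understand the feature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit, using the number suggested in the thread. */
#define VIRTIO_F_USE_BUS_MAPPING 25
#define F_BIT(n) (UINT64_C(1) << (n))

enum addr_mode {
    ADDR_GUEST_PHYSICAL,      /* legacy behaviour: rings carry guest physical addresses */
    ADDR_BUS_MAPPED,          /* driver maps buffers through the DMA API */
    ADDR_NEGOTIATION_FAILED,  /* device cannot work with this driver */
};

/*
 * device_offers:           feature bits advertised by the device
 * driver_knows:            feature bits the driver understands
 * device_requires_mapping: true if the device cannot fall back to
 *                          physical addressing (for example real
 *                          hardware behind an IOMMU)
 */
static enum addr_mode negotiate_addressing(uint64_t device_offers,
                                           uint64_t driver_knows,
                                           bool device_requires_mapping)
{
    /* Only features both sides know about are in effect. */
    uint64_t negotiated = device_offers & driver_knows;

    if (negotiated & F_BIT(VIRTIO_F_USE_BUS_MAPPING))
        return ADDR_BUS_MAPPED;

    /* An old driver did not accept the bit: a device that needs it must refuse. */
    if (device_requires_mapping)
        return ADDR_NEGOTIATION_FAILED;

    /* Otherwise keep today's behaviour so existing guests continue to work. */
    return ADDR_GUEST_PHYSICAL;
}
```

Under this model an existing guest that never accepts the bit keeps physical addressing against a device that can fall back, which is the compatibility concern in the message above. A device that cannot fall back has to fail negotiation rather than silently DMA to the wrong addresses.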
Michael S. Tsirkin
2014-Sep-14 08:58 UTC
[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API
On Fri, Sep 05, 2014 at 12:01:33PM +0930, Rusty Russell wrote:
> We can either patch to make PPC weird or make Xen weird. I'm on the
> fence.
>
> Two questions for Paulo:
> 1) When QEMU supports IOMMU on x86, will the virtio devices behind it
>    respect the IOMMU (do they use the right memory access primitives?).
>
> 2) Are we really going to be able to exclude virtio devices from using
>    the x86 IOMMU in a portable way which will always work? If it's
>    per-bus granularity, will qemu really put them on their own PCI bus
>    and get this right? Or will it sometimes get it wrong and users will
>    end up using virtio devices via IOMMU by accident?
>
> If the answers are both "yes", then x86 is going to be able to use
> virtio+IOMMU, so PPC looks like the odd one out. Otherwise it looks
> like we're really going to want to stick with the "ignore IOMMU" rule
> until (handwave future), and we make an exception for Xen.
>
> Cheers,
> Rusty.

In theory, it's yes to both questions.

In practice, with patches merged recently, it's no to both questions :).
It's a work in progress, but some extra effort to support multiple PCI
roots will be needed on the QEMU side.

What problems will surface when we try to do multiple roots? Only time
will tell.

If it's felt that it's much cleaner to make PPC the odd one out, we can
defer enabling iommu in qemu on x86 until ways to bypass it are
implemented.

But I would be inclined, for pre-1.0 drivers, to make Xen weird.

For 1.0 drivers, we have a bit of time to consider this, and maybe PPC
guys can come up with some way (can be PV) to tell guest "these devices
bypass the IOMMU".