On Fri, Aug 03, 2018 at 12:05:07AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
> > So let's differentiate the two problems of having an IOMMU (real or
> > emulated) which indeed adds overhead etc... and using the DMA API.
> >
> > At the moment, virtio does this all over the place:
> >
> >     if (use_dma_api)
> >         dma_map/alloc_something(...)
> >     else
> >         use_pa
> >
> > The idea of the patch set is to do two, somewhat orthogonal, changes
> > that together achieve what we want. Let me know where you think there
> > is "a bunch of issues" because I'm missing it:
> >
> > 1- Replace the above if/else constructs with just calling the DMA API,
> > and have virtio, at initialization, hookup its own dma_ops that just
> > "return pa" (roughly) when the IOMMU stuff isn't used.
> >
> > This adds an indirect function call to the path that previously didn't
> > have one (the else case above). Is that a significant/measurable
> > overhead ?
>
> If you call it often enough it does:
>
> https://www.spinics.net/lists/netdev/msg495413.html
>
> > 2- Make virtio use the DMA API with our custom platform-provided
> > swiotlb callbacks when needed, that is when not using IOMMU *and*
> > running on a secure VM in our case.
>
> And total NAK the custom platform-provided part of this. We need
> a flag passed in from the hypervisor that the device needs all bus
> specific dma api treatment, and then just use the normal platform
> dma mapping setup. To get swiotlb you'll need to then use the DT/ACPI
> dma-range property to limit the addressable range, and a swiotlb
> capable platform will use swiotlb automatically.

It seems reasonable to teach a platform to override dma-range
for a specific device e.g. in case it knows about bugs in ACPI.

-- 
MST
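To make the trade-off in point 1 concrete, here is a minimal userspace sketch of the two styles being compared: a branch at each call site versus a per-device ops table whose direct variant just returns the physical address. Everything in it (`struct toy_dev`, `struct toy_dma_ops`, `toy_dev_init`, the offset-based "IOMMU") is hypothetical and made up for illustration; it is not the kernel's `struct device` or `dma_map_ops`, and it is not code from the patch set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_dma_addr_t;

struct toy_dev;

/* Hypothetical ops table, standing in for per-device dma_ops. */
struct toy_dma_ops {
	toy_dma_addr_t (*map)(struct toy_dev *dev, uint64_t pa, size_t len);
};

/* Hypothetical device; NOT the kernel's struct device. */
struct toy_dev {
	bool use_dma_api;              /* the flag tested at every call site today */
	const struct toy_dma_ops *ops; /* chosen once, at initialization */
	uint64_t iommu_offset;         /* toy translation: iova = pa + offset */
};

/* The '"return pa" (roughly)' case: no translation at all. */
static toy_dma_addr_t toy_direct_map(struct toy_dev *dev, uint64_t pa, size_t len)
{
	(void)dev;
	(void)len;
	return pa;
}

/* Toy stand-in for a real IOMMU mapping. */
static toy_dma_addr_t toy_iommu_map(struct toy_dev *dev, uint64_t pa, size_t len)
{
	(void)len;
	return pa + dev->iommu_offset;
}

static const struct toy_dma_ops toy_direct_ops = { .map = toy_direct_map };
static const struct toy_dma_ops toy_iommu_ops = { .map = toy_iommu_map };

/* Current style: if/else at the call site, no indirect call on the else path. */
static toy_dma_addr_t map_with_branch(struct toy_dev *dev, uint64_t pa, size_t len)
{
	if (dev->use_dma_api)
		return toy_iommu_map(dev, pa, len);
	return pa;
}

/* Proposed style: always go through the ops table (one indirect call). */
static toy_dma_addr_t map_with_ops(struct toy_dev *dev, uint64_t pa, size_t len)
{
	return dev->ops->map(dev, pa, len);
}

/* Pick the ops once, when the device is set up. */
static void toy_dev_init(struct toy_dev *dev, bool use_iommu, uint64_t offset)
{
	dev->use_dma_api = use_iommu;
	dev->iommu_offset = offset;
	dev->ops = use_iommu ? &toy_iommu_ops : &toy_direct_ops;
}
```

Both helpers return the same address for the same device; the only difference is that `map_with_ops` pays for an indirect call even in the direct case, which is the overhead being asked about.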
On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote:
> It seems reasonable to teach a platform to override dma-range
> for a specific device e.g. in case it knows about bugs in ACPI.

A platform will be able to override dma-range using the dev->bus_dma_mask
field starting in 4.19. But we'll still need a way to

 a) document in the virtio spec that all bus dma quirks are to be
    applied
 b) a way to document in a virtio-related spec how the bus handles
    dma for Ben's totally fucked up hypervisor. Without that there
    is no way we'll get interoperable implementations.
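As a rough illustration of how a bus-level mask can push a device onto bounce buffers, the sketch below models the addressability check in userspace. The struct and function names (`struct toy_bus_dev`, `toy_effective_mask`, `toy_dma_capable`, `toy_needs_bounce`) are hypothetical, and treating a zero `bus_dma_mask` as "no bus limit" is an assumption of this sketch rather than a statement of what the 4.19 code does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model; NOT the kernel's struct device. */
struct toy_bus_dev {
	uint64_t dma_mask;     /* what the device itself can address */
	uint64_t bus_dma_mask; /* limit imposed by the bus; 0 = no limit (assumed) */
};

/* The tighter of the two masks, ignoring an unset bus mask. */
static uint64_t toy_effective_mask(const struct toy_bus_dev *dev)
{
	if (dev->bus_dma_mask && dev->bus_dma_mask < dev->dma_mask)
		return dev->bus_dma_mask;
	return dev->dma_mask;
}

/* True if the whole buffer [addr, addr + size) is reachable by the device. */
static bool toy_dma_capable(const struct toy_bus_dev *dev, uint64_t addr, size_t size)
{
	if (size == 0)
		return false; /* treat empty buffers as a caller error */
	return addr + size - 1 <= toy_effective_mask(dev);
}

/* A buffer the device cannot reach has to be bounced (swiotlb-style). */
static bool toy_needs_bounce(const struct toy_bus_dev *dev, uint64_t addr, size_t size)
{
	return !toy_dma_capable(dev, addr, size);
}
```

With a 64-bit device mask and no bus mask, a buffer at 4 GiB is directly addressable; setting the bus mask to `0xffffffff` makes the same buffer require bouncing, while a page ending exactly at the 4 GiB boundary still does not.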
On Sat, Aug 04, 2018 at 01:15:00AM -0700, Christoph Hellwig wrote:
> On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote:
> > It seems reasonable to teach a platform to override dma-range
> > for a specific device e.g. in case it knows about bugs in ACPI.
>
> A platform will be able to override dma-range using the dev->bus_dma_mask
> field starting in 4.19. But we'll still need a way to
>
>  a) document in the virtio spec that all bus dma quirks are to be
>     applied

I agree it's a good idea. In particular I suspect that PLATFORM_IOMMU
should be extended to cover that. But see below.

>  b) a way to document in a virtio-related spec how the bus handles
>     dma for Ben's totally fucked up hypervisor. Without that there
>     is no way we'll get interoperable implementations.

In this case, however, I'm not sure what exactly we want to add. It
seems that from the point of view of the device, there is nothing
special - it just gets a PA and writes there.

It also seems that the guest does not need to get any info from the
device either. Instead the guest itself needs the device to DMA into
specific addresses, for its own reasons.

It seems that the fact that within the guest it's implemented using a
bounce buffer, and that it's easiest to do by switching virtio to use
the DMA API, isn't something the virtio spec concerns itself with.

I'm open to suggestions.

-- 
MST
Benjamin Herrenschmidt
2018-Aug-05 00:53 UTC
[RFC 0/4] Virtio uses DMA API for all devices
On Sat, 2018-08-04 at 01:15 -0700, Christoph Hellwig wrote:
>  b) a way to document in a virtio-related spec how the bus handles
>     dma for Ben's totally fucked up hypervisor. Without that there
>     is no way we'll get interoperable implementations.

Christoph, this isn't a totally fucked up hypervisor. It's not even
about the hypervisor itself, I mean seriously, man, can you at least
bother reading what I described is going on with the security
architecture ?

Anyway, Michael is onto what could possibly be an alternative approach,
by having us tell qemu to flip to iommu mode at secure VM boot time.
Let's see where that leads.

Cheers,
Ben.