thr3ads.net - Linux Virtualization - [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API [Jul 2015]

Andy Lutomirski

2015-Jul-28 21:16 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

On Tue, Jul 28, 2015 at 12:33 PM, Jan Kiszka <jan.kiszka at siemens.com> wrote:
> On 2015-07-28 21:24, Andy Lutomirski wrote:
>> On Tue, Jul 28, 2015 at 12:06 PM, Jan Kiszka <jan.kiszka at siemens.com> wrote:
>>> On 2015-07-28 20:22, Andy Lutomirski wrote:
>>>> On Tue, Jul 28, 2015 at 10:17 AM, Jan Kiszka <jan.kiszka at siemens.com> wrote:
>>>>> On 2015-07-28 19:10, Andy Lutomirski wrote:
>>>>>> The trouble is that this is really a property of the bus and not of
>>>>>> the device.  If you build a virtio device that physically plugs into a
>>>>>> PCIe slot, the device has no concept of an IOMMU in the first place.
>>>>>
>>>>> If one would build a real virtio device today, it would be broken
>>>>> because every IOMMU would start to translate its requests. Already from
>>>>> that POV, we really need to introduce a feature flag "I will be
>>>>> IOMMU-translated" so that a potential physical implementation can carry
>>>>> it unconditionally.
>>>>>
>>>>
>>>> Except that, with my patches, it would work correctly.  ISTM the thing
>>>
>>> I haven't looked at your patches yet - they make the virtio PCI driver
>>> in Linux IOMMU-compatible? Perfect - except for a compatibility check,
>>> right?
>>
>> Yes.  (virtio_pci_legacy, anyway.  Presumably virtio_pci_modern is
>> easy to adapt, too.)
>>
>>>
>>>> that's broken right now is QEMU and the virtio_pci driver.  My patches
>>>> fix the driver.  Last year that would have been the end of the story
>>>> except for PPC.  Now we have to deal with QEMU.
>>>>
>>>>>> Similarly, if you take an L0-provided IOMMU-supporting device and pass
>>>>>> it through to L2 using current QEMU on L1 (with Q35 emulation and
>>>>>> iommu enabled), then, from L2's perspective, the device is 1:1 no
>>>>>> matter what the device thinks.
>>>>>>
>>>>>> IOW, I think the original design was wrong and now we have to deal
>>>>>> with it.  I think the best solution would be to teach QEMU to fix its
>>>>>> ACPI tables so that 1:1 virtio devices are actually exposed as 1:1.
>>>>>
>>>>> Only the current drivers are broken. And we can easily tell them apart
>>>>> from newer ones via feature flags. Sorry, don't get the problem.
>>>>
>>>> I still don't see how feature flags solve the problem.  Suppose we
>>>> added a feature flag meaning "respects IOMMU".
>>>>
>>>> Bad case 1:  Build a malicious device that advertises
>>>> non-IOMMU-respecting virtio.  Plug it in behind an IOMMU.  Host starts
>>>> leaking physical addresses to the device (and the device doesn't work,
>>>> of course).  Maybe that's only barely a security problem, but still...
>>>
>>> I don't see right now how critical such a hypothetical case could be.
>>> But the OS / its drivers could still decide to refuse talking to such a
>>> device.
>>>
>>
>> How does OS know it's such a device as opposed to a QEMU-supplied thing?
>
> It can restrict itself to virtio devices exposing the feature if it
> feels uncomfortable that it might be talking to some evil piece of
> silicon (instead of the hypervisor, which has to be trusted anyway).
>
>>
>>>>
>>>> Bad case 2: Some hypothetical well-behaved new QEMU provides a virtio
>>>> device that *does* respect the IOMMU and sets the feature flag.  They
>>>> emulate Q35 with an IOMMU.  They boot Linux 4.1.  Data corruption in
>>>> the guest.
>>>
>>> No. In that case, the feature negotiation of "virtio-with-iommu-support"
>>> would have failed for older drivers, and the device would have never
>>> been used by the guest.
>>
>> So are you suggesting that newer virtio devices always provide this
>> feature flag and, if supplied by QEMU with iommu=on, simply refuse to
>> operate if the driver doesn't support that flag?
>
> Exactly.
>
>>
>> That could work as long as QEMU with the current (broken?) iommu=on
>> never exposes such a device.
>
> QEMU would have to be adjusted first so that all its virtio-pci device
> models take IOMMUs into account - if they exist or not. Only then it
> could expose the feature and expect the guest to acknowledge it.
>
> For compat reasons, QEMU should still be able to expose virtio devices
> without the flag set - but then without any IOMMU emulation enabled as
> well. That would prevent the current setup we are using today, but it's
> trivial to update the guest kernel to a newer virtio driver which would
> restore our scenario again.

Seems reasonable.

>>
>> If we apply something similar enough to my patches, then even old
>> hypervisors (e.g. Amazon's hardware virt systems) will support Xen
>> with virtio devices passed in just fine.
>
> Then it seems we can make everyone happy - perfect. :)

Yay.

FWIW, I have no intention to touch the QEMU code for this.  I'm
willing to do the vring bit and the virtio-pci bit as long as it's
well specified.

--Andy
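
The negotiation rule agreed on above (the device offers the flag, and with iommu=on it refuses to operate unless the driver acknowledges it) can be sketched roughly as follows. This is only an illustration: `F_IOMMU`, its bit position, and `negotiate` are hypothetical names that do not come from the thread, the virtio spec, or any QEMU or kernel code.

```python
# Hypothetical feature bit; the thread does not assign a name or number.
F_IOMMU = 1 << 0


def negotiate(device_features, driver_supported, device_required=0):
    """Return the accepted feature bits, or None if the device refuses.

    device_features:  bits the device offers
    driver_supported: bits the driver knows how to handle
    device_required:  bits the device insists on (e.g. F_IOMMU when the
                      hypervisor runs with iommu=on)
    """
    accepted = device_features & driver_supported
    if device_required & ~accepted:
        # A required bit was not acknowledged: fail cleanly, never run.
        return None
    return accepted


# An old driver (knows no such flag) against a device that requires it:
print(negotiate(F_IOMMU, 0, device_required=F_IOMMU))        # refused
# A new driver against the same device:
print(negotiate(F_IOMMU, F_IOMMU, device_required=F_IOMMU))  # flag accepted
# A compat device without the flag and without IOMMU emulation:
print(negotiate(0, F_IOMMU))                                 # nothing to ack
```

The point of the sketch is that the old driver never touches the device in the "bad case 2" scenario, so the mismatch shows up as a clean refusal rather than as guest data corruption.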

Andy Lutomirski

2015-Jul-28 22:43 UTC

Let me try to summarize a proposal:

Add a feature flag that indicates IOMMU support.

New kernels acknowledge that flag on any device that advertises it.

New kernels always respect the IOMMU (except on PowerPC).  New kernels
optionally refuse to talk to devices that don't have that feature flag
if the device appears to be behind an IOMMU.  (This presumably
includes any device whatsoever on an x86 platform with an IOMMU,
including Xen's fake IOMMU.)

New QEMU always respects the IOMMU, if any, except on PPC.  New QEMU
always advertises this feature flag.  If iommu=on, QEMU's virtio
devices refuse to work unless the driver acknowledges the flag.

On PPC, new QEMU will not respect the IOMMU and will not set the flag.
New kernels will not talk to devices that set the flag.  If someone
wants to fix that, then they get to figure out how.

This results in:

New kernels work fine with old QEMU unless iommu=on.

New kernels work with new devices (QEMU and physical devices that set
the flag) under all circumstances, except on PPC where physical
devices are and remain broken.

Xen works with new QEMU and cleanly refuses to interoperate with old
QEMU.  (This is worse than with just my patches, but it's better than
the status quo in which the Xen guest corrupts itself and possibly
corrupts the Xen hypervisor.)

New kernels with old QEMU with iommu=on optionally refuse to interoperate.

Old kernels are oblivious.  They work exactly the same as they do
today except that they fail cleanly with new QEMU with iommu=on.  Old
kernels continue to fail with physical virtio devices if they're
behind an iommu.

Old physical virtio devices that don't advertise the flag fail cleanly
if the host uses an iommu.  The driver could optionally whitelist such
devices.

PPC works as well as it currently does.

I'm unsure about the arm64 situation.


Did I get this right?

--Andy
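
As a reading aid, the outcomes listed in the summary above for QEMU-provided devices on non-PPC platforms can be written as a small decision function. This is a rough sketch, not part of the proposal: the function name, the `strict` parameter (standing in for "optionally refuse"), and the result strings are made up here, and the "mismatch" result for a non-refusing new kernel is an interpretation of "unless iommu=on", not something the summary states outright.

```python
def outcome(kernel_new, qemu_new, iommu_on, strict=True):
    """Summarize the proposal for QEMU devices on non-PPC platforms.

    kernel_new: guest kernel knows the new feature flag
    qemu_new:   QEMU respects the IOMMU and advertises the flag
    iommu_on:   the device appears to be behind an (emulated) IOMMU
    strict:     a new kernel refuses flag-less devices behind an IOMMU
    """
    if kernel_new and qemu_new:
        # Flag offered and acknowledged: works under all circumstances.
        return "works"
    if kernel_new:
        # Old QEMU: fine without an IOMMU, otherwise refuse (or mismatch).
        if not iommu_on:
            return "works"
        return "refused" if strict else "mismatch"
    if qemu_new and iommu_on:
        # Old kernel never acknowledges the flag, so the device refuses.
        return "refused"
    # Old kernel in every other case behaves exactly as it does today.
    return "as today"


for kernel_new in (True, False):
    for qemu_new in (True, False):
        for iommu_on in (True, False):
            print(kernel_new, qemu_new, iommu_on,
                  outcome(kernel_new, qemu_new, iommu_on))
```

Physical devices and PPC are deliberately left out, since the summary treats them as separate cases.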

Benjamin Herrenschmidt

2015-Jul-28 23:21 UTC

On Tue, 2015-07-28 at 15:43 -0700, Andy Lutomirski wrote:
> Let me try to summarize a proposal:
> 
> Add a feature flag that indicates IOMMU support.
> 
> New kernels acknowledge that flag on any device that advertises it.
> 
> New kernels always respect the IOMMU (except on PowerPC).

Why? I disagree, the flag should be honored when set on any
architecture. PowerPC is no different than any other platform in that
regard.

>   New kernels
> optionally refuse to talk to devices that don't have that feature flag
> if the device appears to be behind an IOMMU.  (This presumably
> includes any device whatsoever on an x86 platform with an IOMMU,
> including Xen's fake IOMMU.)
> 
> New QEMU always respects the IOMMU, if any, except on PPC.

This is just a matter of what the default of the flag is, i.e. we
should have a machine flag that indicates what the default is for
new virtio devices; otherwise, it should be specified per device
as an attribute of the device instance.

I would argue that we should default to "bypass IOMMU" on *all*
architectures due to the performance impact, and to essentially
default to the same behaviour as today. With things like DDW even
powerpc might be able to mostly alleviate the performance impact,
so we might want to change that in the long term, but I tend to prefer
more incremental approaches.

>   New QEMU
> always advertises this feature flag.  If iommu=on, QEMU's virtio
> devices refuse to work unless the driver acknowledges the flag.

This should be configurable.

> On PPC, new QEMU will not respect the IOMMU and will not set the flag.
> New kernels will not talk to devices that set the flag.  If someone
> wants to fix that, then they get to figure out how.

I disagree with the kernel bit and I disagree with special casing PPC in
any shape or form in the code. The only difference should be a default
value for the iommu mode of virtio in qemu set per machine.

You can then feel free to change that default (in a separate patch for
bisectability) on x86 for the sake of Xen.

Ben.

> This results in:
> 
> New kernels work fine with old QEMU unless iommu=on.
> 
> New kernels work with new devices (QEMU and physical devices that set
> the flag) under all circumstances, except on PPC where physical
> devices are and remain broken.
> 
> Xen works with new QEMU and cleanly refuses to interoperate with old
> QEMU.  (This is worse than with just my patches, but it's better than
> the status quo in which the Xen guest corrupts itself and possibly
> corrupts the Xen hypervisor.)
> 
> New kernels with old QEMU with iommu=on optionally refuse to interoperate.
> 
> Old kernels are oblivious.  They work exactly the same as they do
> today except that they fail cleanly with new QEMU with iommu=on.  Old
> kernels continue to fail with physical virtio devices if they're
> behind an iommu.
> 
> Old physical virtio devices that don't advertise the flag fail cleanly
> if the host uses an iommu.  The driver could optionally whitelist such
> devices.
> 
> PPC works as well as it currently does.
> 
> I'm unsure about the arm64 situation.
> 
> 
> Did I get this right?
> 
> --Andy
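
Ben's counter-proposal in the reply above (no architecture special-casing in code, only a per-machine default that a per-device attribute can override) might look roughly like this. It is a sketch under stated assumptions: the function names and the `None` convention for "not specified on the device" are hypothetical and do not correspond to any real QEMU option or property.

```python
from typing import Optional


def virtio_bypasses_iommu(machine_default_bypass: bool,
                          device_override: Optional[bool] = None) -> bool:
    """Resolve the IOMMU mode of one virtio device instance.

    machine_default_bypass: per-machine default (Ben suggests True,
                            i.e. "bypass IOMMU", on all architectures)
    device_override:        per-device attribute; None means "not set",
                            so the machine default applies
    """
    if device_override is None:
        return machine_default_bypass
    return device_override


def advertises_iommu_flag(bypass: bool) -> bool:
    # A device that is translated by the IOMMU advertises the flag;
    # a bypassing device behaves as today and does not.
    return not bypass


# Machine default "bypass", as today, on any architecture:
print(advertises_iommu_flag(virtio_bypasses_iommu(True)))
# Same machine, one device explicitly placed behind the IOMMU:
print(advertises_iommu_flag(virtio_bypasses_iommu(True, device_override=False)))
# A machine whose default was later flipped (e.g. x86 for the sake of Xen):
print(advertises_iommu_flag(virtio_bypasses_iommu(False)))
```

Under this scheme the architecture only shows up as the value of the machine default, which is the single difference Ben argues should exist.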
