thr3ads.net - Linux Virtualization - [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API [Sep 2014]

Andy Lutomirski

2014-Sep-05 02:57 UTC

[PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

On Thu, Sep 4, 2014 at 7:31 PM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> Andy Lutomirski <luto at amacapital.net> writes:
>> On Sep 2, 2014 11:53 PM, "Rusty Russell" <rusty at rustcorp.com.au> wrote:
>>>
>>> Andy Lutomirski <luto at amacapital.net> writes:
>>> > There really are virtio devices that are pieces of silicon and not
>>> > figments of a hypervisor's imagination [1].
>>>
>>> Hi Andy,
>>>
>>>         As you're discovering, there's a reason no one has done the DMA
>>> API before.
>>>
>>> So the problem is that ppc64's IOMMU is a platform thing, not a bus
>>> thing.  They really do carve out an exception for virtio devices,
>>> because performance (LOTS of performance).  It remains to be seen if
>>> other platforms have the same performance issues, but in absence of
>>> other evidence, the answer is yes.
>>>
>>> It's a hack.  But having specific virtual-only devices are an even
>>> bigger hack.
>>>
>>> Physical virtio devices have been talked about, but don't actually exist
>>> in Real Life.  And someone a virtio PCI card is going to have serious
>>> performance issues: mainly because they'll want the rings in the card's
>>> MMIO region, not allocated by the driver.  Being broken on PPC is really
>>> the least of their problems.
>>>
>>> So, what do we do?  It'd be nice if Linux virtio Just Worked under Xen,
>>> though Xen's IOMMU is outside the virtio spec.  Since virtio_pci can be
>>> a module, obvious hacks like having xen_arch_setup initialize a dma_ops
>>> pointer exposed by virtio_pci.c is out.
>>
>> Xen does expose dma_ops.  The trick is knowing when to use it.
>>
>>>
>>> I think the best approach is to have a new feature bit (25 is free),
>>> VIRTIO_F_USE_BUS_MAPPING which indicates that a device really wants to
>>> use the mapping for the bus it is on.  A real device would set this,
>>> or it won't work behind an IOMMU.  A Xen device would also set this.
>>
>> The devices I care about aren't actually Xen devices.  They're devices
>> supplied by QEMU/KVM, booting a Xen hypervisor, which in turn passes
>> the virtio device (along with every other PCI device) through to dom0.
>> So this is exactly the same virtio device that regular x86 KVM guests
>> would see.  The reason that current code fails is that Xen guest
>> physical addresses aren't the same as the addresses seen by the outer
>> hypervisor.
>>
>> These devices don't know that physical addresses != bus addresses, so
>> they can't advertise that fact.
>
> Ah, I see.  Then we will need a Xen-specific hack.
>
>> Grr.  This is mostly a result of the fact that virtio_pci devices
>> aren't really PCI devices.  I still think that virtio_pci shouldn't
>> have to worry about this; ideally this would all be handled higher up
>> in the device hierarchy.  x86 already gets this right.
>
> Yes.  Adding a feature to say "I am a real PCI device" is possible, but
> has other issues (particularly as Michael Tsirkin pointed out, what do
> you do if the driver doesn't understand the feature).
>
>> Are there any hypervisors except PPC that use virtio_pci, have IOMMUs
>> on the pci slot that virtio_pci lives in, and that use physical
>> addressing?  If not, I think that just quirking PPC will work (at
>> least until someone wants IOMMU support in virtio_pci on PPC, in which
>> case doing something using devicetree seems like a reasonable
>> solution).
>
> We can either patch to make PPC weird or make Xen weird.  I'm on the
> fence.
>
> Two questions for Paulo:
> 1) When QEMU support IOMMU on x86, will the virtio devices behind it
>    respect the IOMMU (do they use the right memory access primitives?).
>
> 2) Are we really going to be able to exclude virtio devices from using
>    the x86 IOMMU in a portable way which will always work?  If it's
>    per-bus granularity, will qemu really put them on their own PCI bus
>    and get this right?  Or will it sometimes get it wrong and users will
>    end up using virtio devices via IOMMU by accident?
>
> If the answers are both "yes", then x86 is going to be able to use
> virtio+IOMMU, so PPC looks like the odd one out.  Otherwise it looks
> like we're really going to want to stick with the "ignore IOMMU" rule
> until (handwave future), and we make an exception for Xen.

There's a third option: try to make virtio-mmio work everywhere
(except s390), at least in the long run.  This has other benefits: it
makes minimal hypervisors simpler, and I think it'll get rid of the
limits on the number of virtio devices in a system.  ARM is already
going this direction, and I imagine that PPC support would be
straightforward (it's already using devicetree).

Does virtio-mmio have any reasonable way of doing hotplug?  It could
also eventually make sense to have a standard for virtio on virtio.

--Andy
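
The feature-bit idea quoted in this message can be sketched in a few lines of C. This is only an illustration under stated assumptions, not code from the patch series: VIRTIO_F_USE_BUS_MAPPING at bit 25 is the name and number proposed in the thread, while struct vdev_sketch, vdev_ring_addr() and toy_offset_map() are hypothetical names invented here, and the fixed-offset translation merely stands in for whatever mapping the bus would really provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Proposed in the thread only; not an existing virtio feature bit. */
#define VIRTIO_F_USE_BUS_MAPPING 25

/* Hypothetical stand-in for a bus-specific DMA mapping hook. */
typedef uint64_t (*bus_map_fn)(uint64_t guest_phys);

/* Hypothetical, heavily reduced model of a virtio device. */
struct vdev_sketch {
    uint64_t   features; /* feature bits offered by the device */
    bus_map_fn bus_map;  /* translation for the bus the device sits on */
};

static bool vdev_has_feature(const struct vdev_sketch *d, unsigned int bit)
{
    assert(d != NULL);
    return ((d->features >> bit) & 1u) != 0;
}

/* Address the driver would place in a ring descriptor. */
static uint64_t vdev_ring_addr(const struct vdev_sketch *d, uint64_t guest_phys)
{
    if (vdev_has_feature(d, VIRTIO_F_USE_BUS_MAPPING) && d->bus_map != NULL)
        return d->bus_map(guest_phys); /* device asked for bus addresses */
    return guest_phys;                 /* legacy rule: ignore any IOMMU */
}

/* Toy translation (fixed offset), purely for illustration. */
static uint64_t toy_offset_map(uint64_t guest_phys)
{
    return guest_phys + 0x100000000ULL;
}
```

With toy_offset_map installed, a device that offers bit 25 gets translated addresses and one that does not gets the guest physical address unchanged. As the thread points out, this does not help the QEMU/KVM-under-Xen case: those devices do not know that physical addresses differ from bus addresses, so they would never set the bit.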

Benjamin Herrenschmidt

2014-Sep-05 05:20 UTC

On Thu, 2014-09-04 at 19:57 -0700, Andy Lutomirski wrote:
> There's a third option: try to make virtio-mmio work everywhere
> (except s390), at least in the long run.  This other benefits: it
> makes minimal hypervisors simpler, I think it'll get rid of the limits
> on the number of virtio devices in a system.  ARM is already going
> this direction, and I imagine that PPC support would be
> straightforward (it's already using devicetree).

PCI has advantages though. Management stacks know about PCI and nothing
else really. We already have all the infra to do hotplug with PCI,
etc...

> Does virtio-mmio have any reasonable way of doing hotplug?  It could
> also eventually make sense to have a standard for virtio on virtio.

That would be very platform specific.

Cheers,
Ben.

Christian Borntraeger

2014-Sep-05 07:33 UTC

On 05/09/14 04:57, Andy Lutomirski wrote:
> There's a third option: try to make virtio-mmio work everywhere
> (except s390), at least in the long run.  This other benefits: it
> makes minimal hypervisors simpler, I think it'll get rid of the limits
> on the number of virtio devices in a system.  ARM is already going
> this direction, and I imagine that PPC support would be
> straightforward (it's already using devicetree).

Well, this chance is gone.
When virtio was first introduced we thought about abstraction (mmio,
hypercalls, pci ops depending on the platform as part of the transport.
There was even a virtio over serial line as a potential implementation),
but we had to do a fully PCI variant to please Windows guests IIRC.

Christian

Christopher Covington

2014-Sep-10 15:36 UTC

On 09/04/2014 10:57 PM, Andy Lutomirski wrote:
> There's a third option: try to make virtio-mmio work everywhere
> (except s390), at least in the long run.  This other benefits: it
> makes minimal hypervisors simpler, I think it'll get rid of the limits
> on the number of virtio devices in a system.  ARM is already going
> this direction, and I imagine that PPC support would be
> straightforward (it's already using devicetree).

In my opinion, a uniform "virt" machine for every instruction set would
be very beneficial. I would guess that MMIO is more universally
available than PCI, and as you point out, simpler to implement.

> Does virtio-mmio have any reasonable way of doing hotplug?  It could
> also eventually make sense to have a standard for virtio on virtio.

I don't think so, but it seems possible. My bystander understanding is
that QEMU allocates some fixed number of VirtIO-MMIO devices, maybe a
dozen, in the device tree. The ones that don't actually get hooked up to
something real like a block device or network interface are populated
with a dummy device. One naive approach might be to allow the dummy
devices to tell the kernel that they are now changing to a real device.
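
The fixed-slot arrangement described above can be modelled in a short C sketch. It rests on assumptions: the magic value 0x74726976 ("virt") and the use of DeviceID 0 for a placeholder slot follow my reading of the virtio-mmio register layout, while struct mmio_slot, count_live_slots() and slot_hotplug() are hypothetical names, and the "placeholder becomes a real device" step is the naive idea from this message, not an existing mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_MMIO_MAGIC     0x74726976u /* "virt", read as little-endian */
#define VIRTIO_ID_PLACEHOLDER 0u          /* DeviceID 0: nothing behind the slot */
#define VIRTIO_ID_NET         1u
#define VIRTIO_ID_BLOCK       2u

/* Hypothetical in-memory model of the first registers of one slot. */
struct mmio_slot {
    uint32_t magic;
    uint32_t version;
    uint32_t device_id;
};

/* Count the slots that currently have a real device behind them. */
static size_t count_live_slots(const struct mmio_slot *slots, size_t n)
{
    size_t live = 0;

    for (size_t i = 0; i < n; i++) {
        if (slots[i].magic != VIRTIO_MMIO_MAGIC)
            continue; /* not a virtio-mmio window at all */
        if (slots[i].device_id != VIRTIO_ID_PLACEHOLDER)
            live++;
    }
    return live;
}

/*
 * Hypothetical hotplug step: a placeholder slot announces that it has
 * become a real device. Returns 0 on success, -1 if the slot is not a
 * placeholder or the new ID is itself the placeholder ID.
 */
static int slot_hotplug(struct mmio_slot *slot, uint32_t new_id)
{
    assert(slot != NULL);
    if (slot->magic != VIRTIO_MMIO_MAGIC ||
        slot->device_id != VIRTIO_ID_PLACEHOLDER ||
        new_id == VIRTIO_ID_PLACEHOLDER)
        return -1;
    slot->device_id = new_id;
    return 0;
}
```

A guest would still need some notification that a slot changed, which is exactly the missing piece the message identifies; the sketch only shows the bookkeeping on either side of that event.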

Also, higher level hotplug for at least SCSI sounds possible.

https://bugzilla.redhat.com/show_bug.cgi?id=1123390

Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.

Andy Lutomirski

2014-Sep-10 16:15 UTC

On Wed, Sep 10, 2014 at 8:36 AM, Christopher Covington <cov at codeaurora.org> wrote:
> On 09/04/2014 10:57 PM, Andy Lutomirski wrote:
>> There's a third option: try to make virtio-mmio work everywhere
>> (except s390), at least in the long run.  This other benefits: it
>> makes minimal hypervisors simpler, I think it'll get rid of the limits
>> on the number of virtio devices in a system.  ARM is already going
>> this direction, and I imagine that PPC support would be
>> straightforward (it's already using devicetree).
>
> In my opinion, a uniform "virt" machine for every instruction set would
> be very beneficial. I would guess that MMIO is more universally
> available than PCI, and as you point out, simpler to implement.

Except for x86 :(  That's presumably fixable, though.

>
>> Does virtio-mmio have any reasonable way of doing hotplug?  It could
>> also eventually make sense to have a standard for virtio on virtio.
>
> I don't think so, but it seems possible. My bystander understanding is
> that QEMU allocates some fixed number of VirtIO-MMIO devices, maybe a
> dozen, in the device tree. The ones that don't actually get hooked up to
> something real like a block device or network interface are populated
> with a dummy device. One naive approach might be to allow the dummy
> devices to tell the kernel that they are now changing to a real device.

My thought (which I completely failed to articulate) was to have a
spec for a virtio device that exposes a complete virtio bus along with
hotplug and per-cpu interrupts (a la MSI-X).  This might be a bit
complicated, but it would work everywhere without any firmware or
platform issues.

--Andy
