Hi Jean,
On Wed Oct 25, 2017 at 10:07:53AM +0100, Jean-Philippe Brucker wrote:
> On 25/10/17 08:07, Linu Cherian wrote:
> > Hi Jean,
> >
> > On Tue Oct 24, 2017 at 10:28:59PM +0530, Linu Cherian wrote:
> >> Hi Jean,
> >> Thanks for your reply.
> >>
> >>> On Tue Oct 24, 2017 at 09:37:12AM +0100, Jean-Philippe Brucker wrote:
> >>> Hi Linu,
> >>>
> >>> On 24/10/17 07:27, Linu Cherian wrote:
> >>>> Hi Jean,
> >>>>
> >>>> On Mon Oct 23, 2017 at 10:32:41AM +0100, Jean-Philippe Brucker wrote:
> >>>>> This is version 0.5 of the virtio-iommu specification, the
> >>>>> paravirtualized IOMMU. This version addresses feedback from v0.4
> >>>>> and adds an event virtqueue. Please find the specification, LaTeX
> >>>>> sources and pdf, at:
> >>>>> git://linux-arm.org/virtio-iommu.git viommu/v0.5
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.5/virtio-iommu-v0.5.pdf
> >>>>>
> >>>>> A detailed changelog since v0.4 follows. You can find the pdf
> >>>>> diff at:
> >>>>> http://linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/diffs/virtio-iommu-pdf-diff-v0.4-v0.5.pdf
> >>>>>
> >>>>> * Add an event virtqueue for the device to report translation
> >>>>>   faults to the driver. For the moment only unrecoverable faults
> >>>>>   are available, but future versions will extend it.
> >>>>> * Simplify the PROBE request by removing the ack part, and
> >>>>>   flattening RESV properties.
> >>>>> * Rename "address space" to "domain". The change might seem
> >>>>>   futile but allows us to introduce PASIDs and other features
> >>>>>   cleanly in the next versions. In the same vein, the few
> >>>>>   remaining "device" occurrences were replaced by "endpoint", to
> >>>>>   avoid any confusion with "the device" referring to the virtio
> >>>>>   device across the document.
> >>>>> * Add implementation notes for RESV_MEM properties.
> >>>>> * Update the ACPI table definition.
> >>>>> * Fix typos and clarify a few things.
> >>>>>
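
(A side note on the new event virtqueue, to check my understanding: I
would expect the guest driver to dequeue something like the sketch
below for an unrecoverable fault. The struct and field names are only
my guess for illustration, not the layout from the spec.)

#include <linux/types.h>

/* Hypothetical layout of a fault event, dequeued from the event
 * virtqueue. All names are illustrative, not from the spec. */
struct viommu_fault_event {
	__u8	reason;		/* e.g. unmapped IOVA, permission error */
	__u8	reserved[3];
	__le32	flags;		/* read/write/exec access, privilege */
	__le32	endpoint;	/* endpoint ID that caused the fault */
	__le64	address;	/* faulting IOVA */
};
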
> >>>>> I will publish the Linux driver for v0.5 shortly. Then for next
> >>>>> versions I'll focus on optimizations and adding support for
> >>>>> hardware acceleration.
> >>>>>
> >>>>> Existing implementations are simple and can certainly be
> >>>>> optimized, even without architectural changes. But the
> >>>>> architecture itself can also be improved in a number of ways.
> >>>>> Currently it is designed to work well with VFIO. However, having
> >>>>> explicit MAP requests is less efficient* than page tables for
> >>>>> emulated and PV endpoints, and the current architecture doesn't
> >>>>> address this. Binding page tables is an obvious way to improve
> >>>>> throughput in that case, but we can explore cleverer (and
> >>>>> possibly simpler) ways to do it.
> >>>>>
> >>>>> So first we'll work on getting the base device and driver
> >>>>> merged, then we'll analyze and compare several ideas for
> >>>>> improving performance.
> >>>>>
> >>>>> Thanks,
> >>>>> Jean
> >>>>>
> >>>>> * I have yet to study this behaviour, and would be interested in
> >>>>>   any prior art on the subject of analyzing device DMA patterns
> >>>>>   (virtio and others)
> >>>>
> >>>> From the spec, under "Future extensions":
> >>>>
> >>>> "Page Table Handover, to allow guests to manage their own page
> >>>> tables and share them with the MMU"
> >>>>
> >>>> I had a few questions on this.
> >>>>
> >>>> 1. Did you mean SVM support for vfio-pci devices attached to
> >>>> guest processes here?
> >>>
> >>> Yes, using the VFIO BIND and INVALIDATE ioctls that Intel is
> >>> working on, and adding requests in pretty much the same format to
> >>> virtio-iommu.
> >>>
> >>>> 2. Can you give some hints on how this is going to work, since
> >>>> the virtio-iommu guest kernel driver needs to create the stage-1
> >>>> page tables required by the hardware, which is not the case now.
> >>>> CMIIW.
> >>>
> >>> The virtio-iommu device advertises which PASID/page table format
> >>> is supported by the host (obtained via sysfs and communicated in
> >>> the PROBE request), then the guest binds page tables or PASID
> >>> tables to a domain and populates it. Binding page tables alone is
> >>> easy because we already have the required drivers in the guest
> >>> (io-pgtable or arch/* for SVM) and code in the host to manage
> >>> PASID tables. But since the PASID table pointer is translated by
> >>> stage-2, it would require a little more work in the host for
> >>> obtaining GPA buffers from the guest on demand.
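
(If I read this correctly, the PROBE reply would then carry a property
describing the stage-1 format that the physical IOMMU supports, roughly
like the hypothetical sketch below. The names and fields are mine, for
illustration only.)

#include <linux/types.h>

/* Hypothetical PROBE property advertising the page table format that
 * the guest may bind. Illustrative only. */
struct viommu_probe_pgtable_format {
	__le16	type;		/* e.g. a PGTABLE property type */
	__le16	length;		/* size of this property */
	__le16	format;		/* e.g. ARM 64-bit LPAE, x86 long mode */
	__u8	levels;		/* number of page table levels */
	__u8	pasid_bits;	/* supported PASID/SSID width */
};
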
> >> Is this for resolving PCI PRI requests?
> >> IIUC, PCI PRI requests for devices owned by the guest need to be
> >> resolved by the guest itself.
>
> Supporting PCI PRI is a separate problem that will be implemented by
> extending the event queue proposed in v0.5. Once the guest has bound
> the PASID table and created the page tables, it will start some DMA
> job in the device. If a page isn't mapped, the pIOMMU sends a PRI
> Request (a page fault) to its driver, which is relayed to userspace
> by VFIO, then to the guest via virtio-iommu. The guest handles the
> fault, then sends a PRI response on the virtio-iommu request queue,
> relayed to the pIOMMU driver via VFIO, and the device retries the
> access.
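
Thanks, that is clear. Restating the guest side of this flow as a
sketch, just to confirm I follow; every name below is hypothetical,
not an interface from the spec.

#include <linux/types.h>

/* Hypothetical PRI response, sent on the request queue after the
 * guest has resolved the fault. */
struct viommu_pri_resp {
	__le32	endpoint;	/* endpoint the response is for */
	__le32	pasid;		/* address space that faulted */
	__le32	grpid;		/* PRI group ID from the request */
	__le32	resp_code;	/* success / invalid request / failure */
};

/* 1. The page request arrives on the event queue
 *    (pIOMMU -> VFIO -> host virtio-iommu -> guest).
 * 2. The guest handles the fault, e.g. faults the page into the mm
 *    of the bound process.
 * 3. The guest sends a viommu_pri_resp on the request queue; the
 *    host relays it through VFIO to the pIOMMU driver, and the
 *    device retries the access. */
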
>
> >>> In addition, the BIND ioctl is different from the one used by
> >>> VT-d, so this solution didn't get much appreciation.
> >>
> >> Could you please share the links on this?
>
> Please find the latest discussion at
> https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg20189.html
>
> >>> The alternative is to bind PASID tables.
> >>
> >> Sorry, I didn't get the difference here.
>
> The PASID table is what we call the Context Table in SMMU: it's the
> array associating a PASID (SSID) with a context descriptor. In
> SMMUv3, the stream table entry (device descriptor) points to a PASID
> table. Each context descriptor in the PASID table points to a page
> directory (pgd).
>
> So the first solution was for the guest to send a BIND with pasid+pgd, and
> let the host deal with the context tables. The second solution is to send
> a BIND with a PASID table pointer, and have the guest handle the context
> table.
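
Restating the two options as hypothetical request layouts, to check
that I read you correctly; neither struct is from the spec.

#include <linux/types.h>

/* Option 1: BIND with pasid+pgd. The guest sends one request per
 * context, and the host writes the context tables. (Hypothetical.) */
struct viommu_req_bind_pgd {
	__le32	domain;
	__le32	pasid;
	__le64	pgd;		/* GPA of the page directory */
};

/* Option 2: BIND with a PASID table pointer. The guest allocates the
 * table and manages the context descriptors itself; the host only
 * installs the pointer, which stage-2 translates. (Hypothetical.) */
struct viommu_req_bind_pasid_table {
	__le32	domain;
	__le32	pasid_bits;	/* size of the table */
	__le64	pasid_table;	/* GPA of the PASID/context table */
};
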
>
> > Also, does this solution intend to cover page table sharing for
> > non-SVM cases? For example, if we need to share the IOMMU page
> > table for a device used in the guest kernel, so that map/unmap gets
> > handled directly by the guest and only TLB invalidates happen
> > through a virtio-iommu channel.
>
> Yes, for non-SVM in SMMUv3 you still have a context table, but with a
> single descriptor, so the interface stays the same.
So for the non-SVM case, the guest virtio-iommu driver will program the
context descriptor in such a way that the ASID is not in the shared set
(ASET = 1b), and hence physical IOMMU TLB invalidates would get
triggered from software for every viommu_unmap (in the guest kernel)
through QEMU (using VFIO ioctls)?
And for the SVM case, the ASID would be in the shared set and explicit
TLB invalidates are not required from software?
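
In other words, something like the sketch below. (CD.ASID and CD.ASET
are from the SMMUv3 architecture; the helper itself is hypothetical,
only to illustrate the distinction I mean.)

#include <linux/types.h>

#define CD_0_ASET	(1ULL << 47)	/* 1b: ASID not in the shared set */
#define CD_0_ASID_SHIFT	48

/* Hypothetical guest-side helper writing the first word of an SMMUv3
 * context descriptor. */
static void guest_write_cd(__le64 *cd, u16 asid, bool svm)
{
	u64 val = (u64)asid << CD_0_ASID_SHIFT;

	if (!svm)
		/* Non-SVM: private ASID. Broadcast TLB maintenance from
		 * the CPUs doesn't reach it, so each viommu_unmap() in
		 * the guest must be followed by an explicit invalidate,
		 * relayed through QEMU/VFIO to the physical SMMU. */
		val |= CD_0_ASET;
	/* SVM: shared ASID. CPU TLB invalidations also reach the SMMU
	 * TLBs, so no explicit invalidation request is needed. */

	cd[0] = cpu_to_le64(val);	/* other CD fields omitted */
}
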
> But with the second solution, nested with SMMUv2 isn't supported
> since it doesn't have context tables. The second solution was
> considered simpler to implement, so we'll first go with this one.
>
> Thanks,
> Jean
>
> >>> It requires factoring the guest PASID handling code into a
> >>> library, which is difficult for SMMU. Luckily I'm still working
> >>> on adding PASID code for SMMUv3, so extracting it out of the
> >>> driver isn't a big overhead. The good thing about this solution
> >>> is that it reuses any specification work done for VFIO (and vice
> >>> versa) and any host driver change made for vSMMU/VT-d emulations.
> >>>
> >>> Thanks,
> >>> Jean
> >>
> >> --
> >> Linu cherian
> >
--
Linu cherian