thr3ads.net - Virtualization - [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space [Jun 2023]

If this information is useful, please help other people find it:
Share via:

Jean-Philippe Brucker

2023-Jun-16 11:32 UTC

[RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

Hi Baolu,

On Tue, May 30, 2023 at 01:37:07PM +0800, Lu Baolu
wrote:> - The timeout value for the pending page fault messages. Ideally we
>   should determine the timeout value from the device configuration, but
>   I failed to find any statement in the PCI specification (version 6.x).
>   A default 100 milliseconds is selected in the implementation, but it
>   leave the room for grow the code for per-device setting.
If it helps we had some discussions about this timeout [1]. It's useful to
print out a warning for debugging, but I don't think completing the fault
on timeout is correct, we should leave the fault pending. Given that the
PCI spec does not indicate a timeout, the guest can wait as long as it
wants to complete the fault (and 100ms may even be reasonable on an
emulator, who knows how many layers and context switches the fault has to
go through).

Another outstanding issue was what to do for PASID stop. When the guest
device driver stops using a PASID it issues a PASID stop request to the
device (a device-specific mechanism). If the device is not using PRI stop
markers it waits for pending PRs to complete and we're fine. Otherwise it
sends a stop marker which is flushed to the PRI queue, but does not wait
for pending PRs.

Handling stop markers is annoying. If the device issues one, then the PRI
queue contains stale faults, a stop marker, followed by valid faults for
the next address space bound to this PASID. The next address space will
get all the spurious faults because the fault handler doesn't know that
there is a stop marker coming. Linux is probably alright with spurious
faults, though maybe not in all cases, and other guests may not support
them at all.

We might need to revisit supporting stop markers: request that each device
driver declares whether their device uses stop markers on unbind() ("This
mechanism must indicate that a Stop Marker Message will be generated."
says the spec, but doesn't say if the function always uses one or the
other mechanism so it's per-unbind). Then we still have to synchronize
unbind() with the fault handler to deal with the pending stop marker,
which might have already gone through or be generated later.

Currently we ignore all that and just flush the PRI queue, followed by the
IOPF queue, to get rid of any stale fault before reassigning the PASID. A
guest however would also need to first flush the HW PRI queue, but doesn't
have a direct way to do that. If we want to support guests that don't deal
with stop markers, the host needs to flush the PRI queue when a PASID is
detached. I guess on Intel detaching the PASID goes through the host which
can flush the host queue. On Arm we'll probably need to flush the queue
when receiving a PASID cache invalidation, which the guest issues after
clearing a PASID table entry.

Thanks,
Jean

[1] https://lore.kernel.org/linux-iommu/20180423153622.GC38106 at
ostrya.localdomain/
    Also unregistration, not sure if relevant here
    https://lore.kernel.org/linux-iommu/20190605154553.0d00ad8d at
jacob-builder/

Jason Gunthorpe

2023-Jun-19 12:58 UTC

head link

[RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

On Fri, Jun 16, 2023 at 12:32:32PM +0100, Jean-Philippe Brucker wrote:
> We might need to revisit supporting stop markers: request that each device
> driver declares whether their device uses stop markers on unbind()
("This
> mechanism must indicate that a Stop Marker Message will be generated."
> says the spec, but doesn't say if the function always uses one or the
> other mechanism so it's per-unbind). Then we still have to synchronize
> unbind() with the fault handler to deal with the pending stop marker,
> which might have already gone through or be generated later.
An explicit API to wait for the stop marker makes sense, with the
expectation that well behaved devices will generate it and well
behaved drivers will wait for it.

Things like VFIO should have a way to barrier/drain the PRI queue
after issuing FLR. ie the VMM processing FLR should also barrier the
real HW queues and flush them to VM visibility.
> with stop markers, the host needs to flush the PRI queue when a PASID is
> detached. I guess on Intel detaching the PASID goes through the host which
> can flush the host queue. On Arm we'll probably need to flush the queue
> when receiving a PASID cache invalidation, which the guest issues after
> clearing a PASID table entry.
We are trying to get ARM to a point where invalidations don't need to
be trapped. It would be good to not rely on that anyplace.

Jason

Reasonably Related Threads

Search for more apparently analagous threads

Virtualization - Jun 2023 - [RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

[RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

[RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

Reasonably Related Threads