Tian, Kevin
2022-Jun-15 07:35 UTC
[PATCH 3/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency
> From: Nicolin Chen <nicolinc at nvidia.com>
> Sent: Wednesday, June 15, 2022 4:45 AM
>
> Hi Kevin,
>
> On Wed, Jun 08, 2022 at 11:48:27PM +0000, Tian, Kevin wrote:
> > > > > The KVM mechanism for controlling wbinvd is only triggered during
> > > > > kvm_vfio_group_add(), meaning it is a one-shot test done once the
> > > > > devices are setup.
> > > >
> > > > It's not one-shot. kvm_vfio_update_coherency() is called in both
> > > > group_add() and group_del(). Then the coherency property is
> > > > checked dynamically in wbinvd emulation:
> > >
> > > From the perspective of managing the domains that is still
> > > one-shot. It doesn't get updated when individual devices are
> > > added/removed to domains.
> >
> > It's unchanged per-domain but dynamic per-vm when multiple
> > domains are added/removed (i.e. kvm->arch.noncoherent_dma_count).
> > It's the latter being checked in the kvm.
>
> I am going to send a v2, yet not quite getting the point here.
> Meanwhile, Jason is on leave.
>
> What, in your opinion, would be an accurate description here?
>

Something like below:

--
The KVM mechanism for controlling wbinvd is based on OR of the coherency
property of all devices attached to a guest, no matter whether those
devices are attached to a single domain or multiple domains.

So, there is no value in trying to push a device that could do enforced
cache coherency to a dedicated domain vs re-using an existing domain
which is non-coherent since KVM won't be able to take advantage of it.
This just wastes domain memory.

Simplify this code and eliminate the test. This removes the only logic
that needed to have a dummy domain attached prior to searching for a
matching domain and simplifies the next patches.

It's unclear whether we want to further optimize the Intel driver to
update the domain coherency after a device is detached from it, at
least not before KVM can be verified to handle such dynamics in related
emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.).
In reality we don't see a usage requiring such optimization, as the only
device which imposes such non-coherency is the Intel GPU, which doesn't
even support hotplug/hot remove.
--

Thanks
Kevin
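
To make the "OR of the coherency property" point concrete, here is a minimal userspace sketch. It is not the kernel code: struct vm_model, struct domain_model and the three helpers are hypothetical names invented for illustration, and only the counter is modeled on the kvm->arch.noncoherent_dma_count field mentioned above. It assumes a domain counts as non-coherent as soon as it does not enforce cache coherency, and that the VM-level check only asks whether the count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a VM. Only the counter mirrors the
 * kvm->arch.noncoherent_dma_count field named in the thread. */
struct vm_model {
    int noncoherent_dma_count;
};

/* Hypothetical model of an IOMMU domain: one coherency flag shared by
 * every device attached to it. */
struct domain_model {
    bool enforce_cache_coherency;
};

/* Attaching a non-coherent domain bumps the per-VM count. */
static void vm_add_domain(struct vm_model *vm, const struct domain_model *d)
{
    if (!d->enforce_cache_coherency)
        vm->noncoherent_dma_count++;
}

/* Detaching a non-coherent domain drops the per-VM count. */
static void vm_del_domain(struct vm_model *vm, const struct domain_model *d)
{
    if (!d->enforce_cache_coherency) {
        assert(vm->noncoherent_dma_count > 0);
        vm->noncoherent_dma_count--;
    }
}

/* The VM-level decision is an OR over all domains: any non-coherent
 * domain is enough to require wbinvd emulation. */
static bool vm_needs_wbinvd_emulation(const struct vm_model *vm)
{
    return vm->noncoherent_dma_count > 0;
}
```

Under these assumptions, a VM holding one non-coherent domain plus a separate coherent domain gets the same answer from vm_needs_wbinvd_emulation() as a VM holding only the non-coherent domain, so placing the coherency-capable device in its own domain buys nothing at the VM level.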