Jason Wang
2023-Oct-17 02:35 UTC
[PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release
On Tue, Oct 17, 2023 at 4:30?AM Si-Wei Liu <si-wei.liu at oracle.com> wrote:> > > > On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote: > > On Mon, Oct 16, 2023 at 8:33?AM Jason Wang <jasowang at redhat.com> wrote: > >> On Fri, Oct 13, 2023 at 3:36?PM Si-Wei Liu <si-wei.liu at oracle.com> wrote: > >>> > >>> > >>> On 10/12/2023 8:01 PM, Jason Wang wrote: > >>>> On Tue, Oct 10, 2023 at 5:05?PM Si-Wei Liu <si-wei.liu at oracle.com> wrote: > >>>>> Devices with on-chip IOMMU or vendor specific IOTLB implementation > >>>>> may need to restore iotlb mapping to the initial or default state > >>>>> using the .reset_map op, as it's desirable for some parent devices > >>>>> to solely manipulate mappings by its own, independent of virtio device > >>>>> state. For instance, device reset does not cause mapping go away on > >>>>> such IOTLB model in need of persistent mapping. Before vhost-vdpa > >>>>> is going away, give them a chance to reset iotlb back to the initial > >>>>> state in vhost_vdpa_cleanup(). > >>>>> > >>>>> Signed-off-by: Si-Wei Liu <si-wei.liu at oracle.com> > >>>>> --- > >>>>> drivers/vhost/vdpa.c | 16 ++++++++++++++++ > >>>>> 1 file changed, 16 insertions(+) > >>>>> > >>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > >>>>> index 851535f..a3f8160 100644 > >>>>> --- a/drivers/vhost/vdpa.c > >>>>> +++ b/drivers/vhost/vdpa.c > >>>>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v, > >>>>> return vhost_vdpa_alloc_as(v, asid); > >>>>> } > >>>>> > >>>>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid) > >>>>> +{ > >>>>> + struct vdpa_device *vdpa = v->vdpa; > >>>>> + const struct vdpa_config_ops *ops = vdpa->config; > >>>>> + > >>>>> + if (ops->reset_map) > >>>>> + ops->reset_map(vdpa, asid); > >>>>> +} > >>>>> + > >>>>> static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) > >>>>> { > >>>>> struct vhost_vdpa_as *as = asid_to_as(v, asid); > >>>>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) > >>>>> > >>>>> hlist_del(&as->hash_link); > >>>>> vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid); > >>>>> + /* > >>>>> + * Devices with vendor specific IOMMU may need to restore > >>>>> + * iotlb to the initial or default state which is not done > >>>>> + * through device reset, as the IOTLB mapping manipulation > >>>>> + * could be decoupled from the virtio device life cycle. > >>>>> + */ > >>>> Should we do this according to whether IOTLB_PRESIST is set? > >>> Well, in theory this seems like so but it's unnecessary code change > >>> actually, as that is the way how vDPA parent behind platform IOMMU works > >>> today, and userspace doesn't break as of today. :) > >> Well, this is one question I've ever asked before. You have explained > >> that one of the reason that we don't break userspace is that they may > >> couple IOTLB reset with vDPA reset as well. One example is the Qemu. > >> > >>> As explained in previous threads [1][2], when IOTLB_PERSIST is not set > >>> it doesn't necessarily mean the iotlb will definitely be destroyed > >>> across reset (think about the platform IOMMU case), so userspace today > >>> is already tolerating enough with either good or bad IOMMU.I'm confused, how to define tolerating here? For example, if it has tolerance, why bother?> >>This code of > >>> not checking IOTLB_PERSIST being set is intentional, there's no point to > >>> emulate bad IOMMU behavior even for older userspace (with improper > >>> emulation to be done it would result in even worse performance).I can easily imagine a case: The old Qemu that works only with a setup like mlx5_vdpa. If we do this without a negotiation, IOTLB will not be clear but the Qemu will try to re-program the IOTLB after reset. Which will break? 1) stick the exact old behaviour with just one line of check 2) audit all the possible cases to avoid a one line of code 1) seems much easier than 2)> >> For two reasons: > >> > >> 1) backend features need acked by userspace this is by design > >> 2) keep the odd behaviour seems to be more safe as we can't audit > >> every userspace program > >> > > The old behavior (without flag ack) cannot be trusted already, as:Possibly but the point is to unbreak userspace no matter how weird the behaviour we've ever had.> > * Devices using platform IOMMU (in other words, not implementing > > neither .set_map nor .dma_map) does not unmap memory at virtio reset. > > * Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do > > reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb > > called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset, > > now all backends work the same as far as I know., which was (and is) > > the way devices using the platform IOMMU works. > > > > The difference in behavior did not matter as QEMU unmaps all the > > memory unregistering the memory listener at vhost_vdpa_dev_start(..., > > started = false), > Exactly. It's not just QEMU, but any (older) userspace manipulates > mappings through the vhost-vdpa iotlb interface has to unmap all > mappings to workaround the vdpa parent driver bug.Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI.> If they don't do > explicit unmap, it would cause state inconsistency between vhost-vdpa > and parent driver, then old mappings can't be restored, and new mapping > can be added to iotlb after vDPA reset. There's no point to preserve > this broken and inconsistent behavior between vhost-vdpa and parent > driver, as userspace doesn't care at all!It's a userspace notice change so we can't fix it silently: https://lkml.org/lkml/2012/12/23/75 Another example which is related to vhost-vDPA: https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger at redhat.com/T/ Thanks> > > but the backend acknowledging this feature flag > > allows QEMU to make sure it is safe to skip this unmap & map in the > > case of vhost stop & start cycle. > > > > In that sense, this feature flag is actually a signal for userspace to > > know that the bug has been solved. > Right, I couldn't say it better than you do, thanks! The feature flag is > more of an unusual means to indicating kernel bug having been fixed, > rather than introduce a new feature or new kernel behavior ending up in > change of userspace's expectation. > > > Not offering it indicates that > > userspace cannot trust the kernel will retain the maps. > > > > Si-Wei or Dragos, please correct me if I've missed something. Feel > > free to use the text in case you find more clear in doc or patch log. > Sure, will do, thank you! Will post v2 adding these to the log. > > Thanks, > -Siwei > > > > > > > Thanks! > > > >> Thanks > >> > >>> I think > >>> the purpose of the IOTLB_PERSIST flag is just to give userspace 100% > >>> certainty of persistent iotlb mapping not getting lost across vdpa reset. > >>> > >>> Thanks, > >>> -Siwei > >>> > >>> [1] > >>> https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd at oracle.com/ > >>> [2] > >>> https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e at oracle.com/ > >>>> Otherwise > >>>> we may break old userspace. > >>>> > >>>> Thanks > >>>> > >>>>> + vhost_vdpa_reset_map(v, asid); > >>>>> kfree(as); > >>>>> > >>>>> return 0; > >>>>> -- > >>>>> 1.8.3.1 > >>>>> >
Si-Wei Liu
2023-Oct-18 04:35 UTC
[PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release
On 10/16/2023 7:35 PM, Jason Wang wrote:> On Tue, Oct 17, 2023 at 4:30?AM Si-Wei Liu <si-wei.liu at oracle.com> wrote: >> >> >> On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote: >>> On Mon, Oct 16, 2023 at 8:33?AM Jason Wang <jasowang at redhat.com> wrote: >>>> On Fri, Oct 13, 2023 at 3:36?PM Si-Wei Liu <si-wei.liu at oracle.com> wrote: >>>>> >>>>> On 10/12/2023 8:01 PM, Jason Wang wrote: >>>>>> On Tue, Oct 10, 2023 at 5:05?PM Si-Wei Liu <si-wei.liu at oracle.com> wrote: >>>>>>> Devices with on-chip IOMMU or vendor specific IOTLB implementation >>>>>>> may need to restore iotlb mapping to the initial or default state >>>>>>> using the .reset_map op, as it's desirable for some parent devices >>>>>>> to solely manipulate mappings by its own, independent of virtio device >>>>>>> state. For instance, device reset does not cause mapping go away on >>>>>>> such IOTLB model in need of persistent mapping. Before vhost-vdpa >>>>>>> is going away, give them a chance to reset iotlb back to the initial >>>>>>> state in vhost_vdpa_cleanup(). >>>>>>> >>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu at oracle.com> >>>>>>> --- >>>>>>> drivers/vhost/vdpa.c | 16 ++++++++++++++++ >>>>>>> 1 file changed, 16 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c >>>>>>> index 851535f..a3f8160 100644 >>>>>>> --- a/drivers/vhost/vdpa.c >>>>>>> +++ b/drivers/vhost/vdpa.c >>>>>>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v, >>>>>>> return vhost_vdpa_alloc_as(v, asid); >>>>>>> } >>>>>>> >>>>>>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid) >>>>>>> +{ >>>>>>> + struct vdpa_device *vdpa = v->vdpa; >>>>>>> + const struct vdpa_config_ops *ops = vdpa->config; >>>>>>> + >>>>>>> + if (ops->reset_map) >>>>>>> + ops->reset_map(vdpa, asid); >>>>>>> +} >>>>>>> + >>>>>>> static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) >>>>>>> { >>>>>>> struct vhost_vdpa_as *as = asid_to_as(v, asid); >>>>>>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) >>>>>>> >>>>>>> hlist_del(&as->hash_link); >>>>>>> vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid); >>>>>>> + /* >>>>>>> + * Devices with vendor specific IOMMU may need to restore >>>>>>> + * iotlb to the initial or default state which is not done >>>>>>> + * through device reset, as the IOTLB mapping manipulation >>>>>>> + * could be decoupled from the virtio device life cycle. >>>>>>> + */ >>>>>> Should we do this according to whether IOTLB_PRESIST is set? >>>>> Well, in theory this seems like so but it's unnecessary code change >>>>> actually, as that is the way how vDPA parent behind platform IOMMU works >>>>> today, and userspace doesn't break as of today. :) >>>> Well, this is one question I've ever asked before. You have explained >>>> that one of the reason that we don't break userspace is that they may >>>> couple IOTLB reset with vDPA reset as well. One example is the Qemu. >>>> >>>>> As explained in previous threads [1][2], when IOTLB_PERSIST is not set >>>>> it doesn't necessarily mean the iotlb will definitely be destroyed >>>>> across reset (think about the platform IOMMU case), so userspace today >>>>> is already tolerating enough with either good or bad IOMMU. > I'm confused, how to define tolerating here?Tolerating defined as QEMU has to proactively unmap before reset just to workaround the driver bug (on-chip maps out of sync), unconditionally for platform or on-chip. While we all know it doesn't have to do so for platform IOMMU, though userspace has no means to distinguish. That said, userspace is sacrificing reset time performance on platform IOMMU setup just for working around buggy implementation in the other setup.> For example, if it has tolerance, why bother?I'm not sure I get the question. But I think userspace is compromising because of buggy implementation in a few drivers doesn't mean we should uniformly enforce such behavior for all set_map/dma_map implementations.> >>>> This code of >>>>> not checking IOTLB_PERSIST being set is intentional, there's no point to >>>>> emulate bad IOMMU behavior even for older userspace (with improper >>>>> emulation to be done it would result in even worse performance). > I can easily imagine a case: > > The old Qemu that works only with a setup like mlx5_vdpa.Noted, seems to me there's no such case of a userspace implementation that only works with mlx5_vdpa or its friends, but doesn't work with the others e.g. platform IOMMU, or well behaving on-chip IOMMU implementations. The Unmap+remap trick around vdpa reset works totally fine for?platform IOMMU, except with sub-optimal performance. Other than this trick, I cannot easily think of other means or iotlb message sequence for userspace to recover the bogus state and make iotlb back to work again after reset. Are we talking about hypnosis that has no real basis to exist in the real world?> If we do > this without a negotiation, IOTLB will not be clear but the Qemu will > try to re-program the IOTLB after reset. Which will break? > > 1) stick the exact old behaviour with just one line of checkIt's not just one line of check here, the old behavior emulation has to be done as Eugenio illustrated in the other email. In addition, the emulation has to limit to those buggy drivers as I don't feel this emulation should apply uniformly to all future set_map/dma_map implementations.> 2) audit all the possible cases to avoid a one line of code > > 1) seems much easier than 2)You see it's more than just one line of code, and I'm uncertain if the additional complexity is warranted or necessary, particularly if added this piece of compatibility code will linger for quite a long time. Instead of adding hypothetical code change for no specific good reason and no real use case, I'd like to add the code when we find out a specific use case that may get impacted or already being affected, then we will have good understanding how to code up the fix and emulate properly for compatibility, while not affecting other good implementations. Thanks, -Siwe/i/> >>>> For two reasons: >>>> >>>> 1) backend features need acked by userspace this is by design >>>> 2) keep the odd behaviour seems to be more safe as we can't audit >>>> every userspace program >>>> >>> The old behavior (without flag ack) cannot be trusted already, as: > Possibly but the point is to unbreak userspace no matter how weird the > behaviour we've ever had. > >>> * Devices using platform IOMMU (in other words, not implementing >>> neither .set_map nor .dma_map) does not unmap memory at virtio reset. >>> * Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do >>> reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb >>> called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset, >>> now all backends work the same as far as I know., which was (and is) >>> the way devices using the platform IOMMU works. >>> >>> The difference in behavior did not matter as QEMU unmaps all the >>> memory unregistering the memory listener at vhost_vdpa_dev_start(..., >>> started = false), >> Exactly. It's not just QEMU, but any (older) userspace manipulates >> mappings through the vhost-vdpa iotlb interface has to unmap all >> mappings to workaround the vdpa parent driver bug. > Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI. > >> If they don't do >> explicit unmap, it would cause state inconsistency between vhost-vdpa >> and parent driver, then old mappings can't be restored, and new mapping >> can be added to iotlb after vDPA reset. There's no point to preserve >> this broken and inconsistent behavior between vhost-vdpa and parent >> driver, as userspace doesn't care at all! > It's a userspace notice change so we can't fix it silently: > > https://lkml.org/lkml/2012/12/23/75 > > Another example which is related to vhost-vDPA: > > https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger at redhat.com/T/ > > Thanks > >>> but the backend acknowledging this feature flag >>> allows QEMU to make sure it is safe to skip this unmap & map in the >>> case of vhost stop & start cycle. >>> >>> In that sense, this feature flag is actually a signal for userspace to >>> know that the bug has been solved. >> Right, I couldn't say it better than you do, thanks! The feature flag is >> more of an unusual means to indicating kernel bug having been fixed, >> rather than introduce a new feature or new kernel behavior ending up in >> change of userspace's expectation. >> >>> Not offering it indicates that >>> userspace cannot trust the kernel will retain the maps. >>> >>> Si-Wei or Dragos, please correct me if I've missed something. Feel >>> free to use the text in case you find more clear in doc or patch log. >> Sure, will do, thank you! Will post v2 adding these to the log. >> >> Thanks, >> -Siwei >> >> >> >>> Thanks! >>> >>>> Thanks >>>> >>>>> I think >>>>> the purpose of the IOTLB_PERSIST flag is just to give userspace 100% >>>>> certainty of persistent iotlb mapping not getting lost across vdpa reset. >>>>> >>>>> Thanks, >>>>> -Siwei >>>>> >>>>> [1] >>>>> https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd at oracle.com/ >>>>> [2] >>>>> https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e at oracle.com/ >>>>>> Otherwise >>>>>> we may break old userspace. >>>>>> >>>>>> Thanks >>>>>> >>>>>>> + vhost_vdpa_reset_map(v, asid); >>>>>>> kfree(as); >>>>>>> >>>>>>> return 0; >>>>>>> -- >>>>>>> 1.8.3.1 >>>>>>>