Si-Wei Liu
2023-Aug-21 22:31 UTC
[PATCH RFC 1/4] vdpa: introduce .reset_map operation callback
On 8/17/2023 8:28 AM, Eugenio Perez Martin wrote:> On Thu, Aug 17, 2023 at 2:05?AM Si-Wei Liu <si-wei.liu at oracle.com> wrote: >> >> >> On 8/15/2023 6:55 PM, Jason Wang wrote: >>> On Wed, Aug 16, 2023 at 3:49?AM Si-Wei Liu <si-wei.liu at oracle.com> wrote: >>>> >>>> On 8/14/2023 7:21 PM, Jason Wang wrote: >>>>> On Tue, Aug 15, 2023 at 9:46?AM Si-Wei Liu <si-wei.liu at oracle.com> wrote: >>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu at oracle.com> >>>>>> --- >>>>>> include/linux/vdpa.h | 7 +++++++ >>>>>> 1 file changed, 7 insertions(+) >>>>>> >>>>>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h >>>>>> index db1b0ea..3a3878d 100644 >>>>>> --- a/include/linux/vdpa.h >>>>>> +++ b/include/linux/vdpa.h >>>>>> @@ -314,6 +314,12 @@ struct vdpa_map_file { >>>>>> * @iova: iova to be unmapped >>>>>> * @size: size of the area >>>>>> * Returns integer: success (0) or error (< 0) >>>>>> + * @reset_map: Reset device memory mapping (optional) >>>>>> + * Needed for device that using device >>>>>> + * specific DMA translation (on-chip IOMMU) >>>>> This exposes the device internal to the upper layer which is not optimal. >>>> Not sure what does it mean by "device internal", but this op callback >>>> just follows existing convention to describe what vdpa parent this API >>>> targets. >>> I meant the bus tries to hide the differences among vendors. So it >>> needs to hide on-chip IOMMU stuff to the upper layer. >>> >>> We can expose two dimensional IO mappings models but it looks like >>> over engineering for this issue. More below. >>> >>>> * @set_map: Set device memory mapping (optional) >>>> * Needed for device that using device >>>> * specific DMA translation (on-chip IOMMU) >>>> : >>>> : >>>> * @dma_map: Map an area of PA to IOVA (optional) >>>> * Needed for device that using device >>>> * specific DMA translation (on-chip IOMMU) >>>> * and preferring incremental map. >>>> : >>>> : >>>> * @dma_unmap: Unmap an area of IOVA (optional but >>>> * must be implemented with dma_map) >>>> * Needed for device that using device >>>> * specific DMA translation (on-chip IOMMU) >>>> * and preferring incremental unmap. >>>> >>>> >>>>> Btw, what's the difference between this and a simple >>>>> >>>>> set_map(NULL)? >>>> I don't think parent drivers support this today - they can accept >>>> non-NULL iotlb containing empty map entry, but not a NULL iotlb. The >>>> behavior is undefined or it even causes panic when a NULL iotlb is >>>> passed in. >>> We can do this simple change if it can work. >> If we go with setting up 1:1 DMA mapping at virtio-vdpa .probe() and >> tearing it down at .release(), perhaps set_map(NULL) is not sufficient. >>>> Further this doesn't work with .dma_map parent drivers. >>> Probably, but I'd remove dma_map as it doesn't have any real users >>> except for the simulator. >> OK, at a point there was suggestion to get this incremental API extended >> to support batching to be in par with or even replace .set_map, not sure >> if it's too soon to conclude. But I'm okay with the removal if need be. > Yes, I think the right move in the long run is to delegate the > batching to the parent driver. This allows drivers like mlx to add > memory (like hotplugged memory) without the need of tearing down all > the old maps.Nods.> > Having said that, maybe we can work on top if we need to remove > .dma_map for now.I guess for that sake I would keep .dma_map unless there's strong objection against it. Thanks, -Siwei> >>>> The reason why a new op is needed or better is because it allows >>>> userspace to tell apart different reset behavior from the older kernel >>>> (via the F_IOTLB_PERSIST feature bit in patch 4), while this behavior >>>> could vary between parent drivers. >>> I'm ok with a new feature flag, but we need to first seek a way to >>> reuse the existing API. >> A feature flag is needed anyway. I'm fine with reusing but guess I'd >> want to converge on the direction first. >> >> Thanks, >> -Siwei >>> Thanks >>> >>>> Regards, >>>> -Siwei >>>> >>>>> Thanks >>>>> >>>>>> + * @vdev: vdpa device >>>>>> + * @asid: address space identifier >>>>>> + * Returns integer: success (0) or error (< 0) >>>>>> * @get_vq_dma_dev: Get the dma device for a specific >>>>>> * virtqueue (optional) >>>>>> * @vdev: vdpa device >>>>>> @@ -390,6 +396,7 @@ struct vdpa_config_ops { >>>>>> u64 iova, u64 size, u64 pa, u32 perm, void *opaque); >>>>>> int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid, >>>>>> u64 iova, u64 size); >>>>>> + int (*reset_map)(struct vdpa_device *vdev, unsigned int asid); >>>>>> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group, >>>>>> unsigned int asid); >>>>>> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx); >>>>>> -- >>>>>> 1.8.3.1 >>>>>>