Si-Wei Liu
2023-Sep-09 08:42 UTC
[PATCH RFC v2 0/3] vdpa: dedicated descriptor table group
The following patchset introduces a dedicated group for the descriptor
table to reduce live migration downtime when a passthrough VQ is being
switched to a shadow VQ. This RFC v2 is sent to incorporate the early
feedback from reviewers on the uAPI and driver API part of the changes;
the associated driver patch set consuming this API will come around
soon along with the formal submission of this series.

Some initial performance data will be gathered using the real
hardware device with mlx5_vdpa. The target goal of this series is to
reduce the SVQ switching overhead to less than 300ms on a ~100GB
guest with 2 non-mq vhost-vdpa devices. The reduction in the downtime
is thanks to avoiding the full remap in the switching.

The plan of the intended driver implementation is to use a dedicated
group (specifically, 2 in the table below) to host the descriptor tables
for data vqs, different from where buffer addresses are contained (in
group 0 as below). The cvq does not have to allocate a dedicated group
for its descriptor table, so its buffers and descriptor table would
always belong to the same group (1 in the table below).

              | data vq  | ctrl vq
==============+==========+===========
vq_group      |    0     |    1
vq_desc_group |    2     |    1

---

Si-Wei Liu (3):
  vdpa: introduce dedicated descriptor group for virtqueue
  vhost-vdpa: introduce descriptor group backend feature
  vhost-vdpa: uAPI to get dedicated descriptor group id

 drivers/vhost/vdpa.c             | 27 +++++++++++++++++++++++++++
 include/linux/vdpa.h             | 11 +++++++++++
 include/uapi/linux/vhost.h       |  8 ++++++++
 include/uapi/linux/vhost_types.h |  5 +++++
 4 files changed, 51 insertions(+)

-- 
1.8.3.1
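For illustration only, the group layout planned in the cover letter could be modelled in plain C roughly as below. This is a minimal sketch and not the mlx5_vdpa implementation: the demo_* names, the stand-in device struct and the choice of cvq index are all assumptions made for this example, and the real callbacks take a struct vdpa_device from include/linux/vdpa.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vdpa_device; only what the sketch needs. */
struct demo_vdpa_device {
	uint16_t cvq_idx;	/* index of the control virtqueue (assumed) */
};

/* Group IDs taken from the table in the cover letter. */
enum {
	DEMO_GROUP_DATAVQ      = 0,	/* buffers of data vqs */
	DEMO_GROUP_CVQ         = 1,	/* cvq buffers and cvq descriptor table */
	DEMO_GROUP_DATAVQ_DESC = 2,	/* descriptor tables of data vqs */
	DEMO_NGROUPS           = 3,
};

/* Models .get_vq_group(): the group that holds the vq's buffers. */
uint32_t demo_get_vq_group(const struct demo_vdpa_device *vdev, uint16_t idx)
{
	assert(vdev);
	return idx == vdev->cvq_idx ? DEMO_GROUP_CVQ : DEMO_GROUP_DATAVQ;
}

/* Models .get_vq_desc_group(): the group that holds the vq's descriptor,
 * driver and device areas. The cvq keeps everything in one group.
 */
uint32_t demo_get_vq_desc_group(const struct demo_vdpa_device *vdev, uint16_t idx)
{
	assert(vdev);
	return idx == vdev->cvq_idx ? DEMO_GROUP_CVQ : DEMO_GROUP_DATAVQ_DESC;
}
```

With a device whose cvq sits at index 2, data vqs 0 and 1 would report group 0 for buffers and group 2 for descriptors, while the cvq would report group 1 for both.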
Si-Wei Liu
2023-Sep-09 08:42 UTC
[PATCH RFC v2 1/3] vdpa: introduce dedicated descriptor group for virtqueue
In some cases, access to the virtqueue's descriptor area, device area
and driver area (excluding any indirect descriptor table in guest
memory) may have to be confined to a different address space than where
its buffers reside. Without loss of simplicity and generality with
already established terminology, let's fold these 3 areas together and
call them, as a whole, the descriptor table group, or descriptor group
for short. Specifically, in the case of split virtqueues, the descriptor
group consists of the regions for the Descriptor Table, Available Ring
and Used Ring; for the packed virtqueue layout, the descriptor group
contains the Descriptor Ring and the Driver and Device Event Suppression
structures.

The group ID for a dedicated descriptor group can be obtained through a
new .get_vq_desc_group() op. If the driver implements this op, it means
that the descriptor, device and driver areas of the virtqueue may reside
in a dedicated group, different from the one where its buffers reside,
i.e. the default virtqueue group obtained through the .get_vq_group()
op.

In principle, the descriptor group may or may not have the same group ID
as the default group. Even if the descriptor group has a different ID,
meaning the vq's descriptor group areas can optionally move to a
separate address space than where guest memory resides, the descriptor
group may still start from the default address space, same as where its
buffers reside. To move the descriptor group to a different address
space, .set_group_asid() has to be called to change the ASID binding for
the group, which is no different from what needs to be done on any other
virtqueue group. On the other hand, the .reset() semantics also apply to
the descriptor table group, meaning the device reset will clear all ASID
bindings and move all virtqueue groups, including the descriptor group,
back to the default address space, i.e. ASID 0.

QEMU's shadow virtqueue is going to utilize a dedicated descriptor group
to speed up map and unmap operations, yielding tremendous downtime
reduction by avoiding the full and slow remap cycle in SVQ switching.

Signed-off-by: Si-Wei Liu <si-wei.liu at oracle.com>
Acked-by: Eugenio Pérez <eperezma at redhat.com>

---
RFC v1 -> v2:
  - expand commit log to mention downtime reduction in switching
  - add clarifications for what "descriptor group" covers and whatnot
---
 include/linux/vdpa.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index db1b0ea..17a4efa 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -204,6 +204,16 @@ struct vdpa_map_file {
  *				@vdev: vdpa device
  *				@idx: virtqueue index
  *				Returns u32: group id for this virtqueue
+ * @get_vq_desc_group:		Get the group id for the descriptor table of
+ *				a specific virtqueue (optional)
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				Returns u32: group id for the descriptor table
+ *				portion of this virtqueue. Could be different
+ *				than the one from @get_vq_group, in which case
+ *				the access to the descriptor table can be
+ *				confined to a separate asid, isolating from
+ *				the virtqueue's buffer address access.
  * @get_device_features:	Get virtio features supported by the device
  *				@vdev: vdpa device
  *				Returns the virtio features support by the
@@ -357,6 +367,7 @@ struct vdpa_config_ops {
 	/* Device ops */
 	u32 (*get_vq_align)(struct vdpa_device *vdev);
 	u32 (*get_vq_group)(struct vdpa_device *vdev, u16 idx);
+	u32 (*get_vq_desc_group)(struct vdpa_device *vdev, u16 idx);
 	u64 (*get_device_features)(struct vdpa_device *vdev);
 	int (*set_driver_features)(struct vdpa_device *vdev, u64 features);
 	u64 (*get_driver_features)(struct vdpa_device *vdev);
-- 
1.8.3.1
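The ASID binding rules stated in this commit message (every group starts in ASID 0, .set_group_asid() rebinds one group, .reset() returns all groups to ASID 0) can be sketched as a small self-contained model. Everything below is a hypothetical illustration: struct demo_dev, the demo_* functions, the fixed group count of 3 and the -EINVAL return for an out-of-range group are assumptions for the example, not code from any vdpa parent driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_NGROUPS 3	/* assumed: data vq, cvq, data vq descriptor group */

/* Hypothetical device state: the ASID each virtqueue group is bound to. */
struct demo_dev {
	uint32_t group_asid[DEMO_NGROUPS];
};

/* Models .set_group_asid(): rebind one group, descriptor group included,
 * to another address space. The descriptor group gets no special handling.
 */
int demo_set_group_asid(struct demo_dev *dev, uint32_t group, uint32_t asid)
{
	assert(dev);
	if (group >= DEMO_NGROUPS)
		return -EINVAL;
	dev->group_asid[group] = asid;
	return 0;
}

/* Models .reset(): all ASID bindings are cleared, so every group,
 * including the descriptor group, is back in ASID 0.
 */
void demo_reset(struct demo_dev *dev)
{
	assert(dev);
	for (uint32_t g = 0; g < DEMO_NGROUPS; g++)
		dev->group_asid[g] = 0;
}
```

In this model, moving only the descriptor group (2) to ASID 1 leaves the buffer group (0) in ASID 0, which is the property that lets the guest memory mappings stay in place.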
Si-Wei Liu
2023-Sep-09 08:42 UTC
[PATCH RFC v2 2/3] vhost-vdpa: introduce descriptor group backend feature
Userspace knows if the device has a dedicated descriptor group or not
by checking this feature bit.

It's only exposed if the vdpa driver backend implements the
.get_vq_desc_group() operation callback. Userspace trying to negotiate
this feature when it or the dependent _F_IOTLB_ASID feature hasn't
been exposed will result in an error.

Signed-off-by: Si-Wei Liu <si-wei.liu at oracle.com>
Acked-by: Eugenio Pérez <eperezma at redhat.com>

---
RFC v1 -> v2:
  - add clarifications for what areas F_DESC_ASID should cover
---
 drivers/vhost/vdpa.c             | 17 +++++++++++++++++
 include/uapi/linux/vhost_types.h |  5 +++++
 2 files changed, 22 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index b43e868..f2e5dce 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -389,6 +389,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
 	return ops->resume;
 }
 
+static bool vhost_vdpa_has_desc_group(const struct vhost_vdpa *v)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	return ops->get_vq_desc_group;
+}
+
 static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
 {
 	struct vdpa_device *vdpa = v->vdpa;
@@ -679,6 +687,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 		if (copy_from_user(&features, featurep, sizeof(features)))
 			return -EFAULT;
 		if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
+				 BIT_ULL(VHOST_BACKEND_F_DESC_ASID) |
 				 BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
 				 BIT_ULL(VHOST_BACKEND_F_RESUME)))
 			return -EOPNOTSUPP;
@@ -688,6 +697,12 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 		if ((features & BIT_ULL(VHOST_BACKEND_F_RESUME)) &&
 		     !vhost_vdpa_can_resume(v))
 			return -EOPNOTSUPP;
+		if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
+		    !(features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)))
+			return -EINVAL;
+		if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
+		     !vhost_vdpa_has_desc_group(v))
+			return -EOPNOTSUPP;
 		vhost_set_backend_features(&v->vdev, features);
 		return 0;
 	}
@@ -741,6 +756,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 			features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND);
 		if (vhost_vdpa_can_resume(v))
 			features |= BIT_ULL(VHOST_BACKEND_F_RESUME);
+		if (vhost_vdpa_has_desc_group(v))
+			features |= BIT_ULL(VHOST_BACKEND_F_DESC_ASID);
 		if (copy_to_user(featurep, &features, sizeof(features)))
 			r = -EFAULT;
 		break;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index d3aad12a..6acc604 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -181,5 +181,10 @@ struct vhost_vdpa_iova_range {
 #define VHOST_BACKEND_F_SUSPEND 0x4
 /* Device can be resumed */
 #define VHOST_BACKEND_F_RESUME 0x5
+/* Device may expose the virtqueue's descriptor area, driver area and
+ * device area to a different group for ASID binding than where its
+ * buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
+ */
+#define VHOST_BACKEND_F_DESC_ASID 0x6
 
 #endif
-- 
1.8.3.1
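The two checks this patch adds to the VHOST_SET_BACKEND_FEATURES path can be read in isolation as the following standalone sketch. The demo_* names and the has_desc_group parameter are hypothetical stand-ins for the kernel helpers, and the value 0x3 for the IOTLB_ASID bit is assumed to match the upstream vhost_types.h, since that definition is not shown in the diff; only 0x6 for DESC_ASID comes from the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_F_IOTLB_ASID 0x3	/* assumed value, not part of this patch */
#define DEMO_F_DESC_ASID  0x6	/* value introduced by this patch */
#define DEMO_BIT(n)       (1ULL << (n))

static_assert(DEMO_F_DESC_ASID < 64, "feature bit must fit in a u64");

/* Mirrors the order of the checks in the patch:
 *  - DESC_ASID without IOTLB_ASID is a malformed request (-EINVAL);
 *  - DESC_ASID on a parent without .get_vq_desc_group() is unsupported
 *    (-EOPNOTSUPP).
 * has_desc_group stands in for vhost_vdpa_has_desc_group(v).
 */
int demo_check_desc_asid(uint64_t features, bool has_desc_group)
{
	if (!(features & DEMO_BIT(DEMO_F_DESC_ASID)))
		return 0;
	if (!(features & DEMO_BIT(DEMO_F_IOTLB_ASID)))
		return -EINVAL;
	if (!has_desc_group)
		return -EOPNOTSUPP;
	return 0;
}
```

Note that the dependency check comes first, so a request that sets DESC_ASID alone fails with -EINVAL regardless of what the parent driver implements.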
Si-Wei Liu
2023-Sep-09 08:42 UTC
[PATCH RFC v2 3/3] vhost-vdpa: uAPI to get dedicated descriptor group id
With _F_DESC_ASID backend feature, the device can now support the
VHOST_VDPA_GET_VRING_DESC_GROUP ioctl, and it may expose the descriptor
table (including avail and used ring) in a different group than the
buffers it contains. This new uAPI will fetch the group ID of the
descriptor table.

Signed-off-by: Si-Wei Liu <si-wei.liu at oracle.com>
Acked-by: Eugenio Pérez <eperezma at redhat.com>
---
 drivers/vhost/vdpa.c       | 10 ++++++++++
 include/uapi/linux/vhost.h |  8 ++++++++
 2 files changed, 18 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index f2e5dce..eabac06 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -602,6 +602,16 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 		else if (copy_to_user(argp, &s, sizeof(s)))
 			return -EFAULT;
 		return 0;
+	case VHOST_VDPA_GET_VRING_DESC_GROUP:
+		if (!vhost_vdpa_has_desc_group(v))
+			return -EOPNOTSUPP;
+		s.index = idx;
+		s.num = ops->get_vq_desc_group(vdpa, idx);
+		if (s.num >= vdpa->ngroups)
+			return -EIO;
+		else if (copy_to_user(argp, &s, sizeof(s)))
+			return -EFAULT;
+		return 0;
 	case VHOST_VDPA_SET_GROUP_ASID:
 		if (copy_from_user(&s, argp, sizeof(s)))
 			return -EFAULT;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index f5c48b6..649560c 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -219,4 +219,12 @@
  */
 #define VHOST_VDPA_RESUME		_IO(VHOST_VIRTIO, 0x7E)
 
+/* Get the group for the descriptor table including driver & device areas
+ * of a virtqueue: read index, write group in num.
+ * The virtqueue index is stored in the index field of vhost_vring_state.
+ * The group ID of the descriptor table for this specific virtqueue
+ * is returned via num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_GET_VRING_DESC_GROUP	_IOWR(VHOST_VIRTIO, 0x7F,	\
+					      struct vhost_vring_state)
 #endif
-- 
1.8.3.1
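The control flow of the new VHOST_VDPA_GET_VRING_DESC_GROUP case, minus the user-copy step, can be modelled outside the kernel as in the sketch below. It is an illustration under assumptions: struct demo_vring_state only imitates the index/num pair of vhost_vring_state, the callback typedef replaces ops->get_vq_desc_group, and demo_desc_group_cb is a made-up parent that reports group 1 for vq 2 and group 2 otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two fields used from vhost_vring_state. */
struct demo_vring_state {
	unsigned int index;	/* in: virtqueue index */
	unsigned int num;	/* out: descriptor group ID */
};

/* Stand-in for the optional ops->get_vq_desc_group callback. */
typedef uint32_t (*demo_desc_group_fn)(uint16_t idx);

/* Made-up parent driver callback: vq 2 is the cvq (group 1),
 * every other vq keeps its descriptor table in group 2.
 */
uint32_t demo_desc_group_cb(uint16_t idx)
{
	return idx == 2 ? 1 : 2;
}

/* Follows the ioctl case in the patch: -EOPNOTSUPP when the op is
 * missing, -EIO when the parent reports a group outside [0, ngroups).
 */
int demo_get_vring_desc_group(demo_desc_group_fn get_vq_desc_group,
			      uint32_t ngroups, uint16_t idx,
			      struct demo_vring_state *s)
{
	assert(s != NULL);
	if (!get_vq_desc_group)
		return -EOPNOTSUPP;
	s->index = idx;
	s->num = get_vq_desc_group(idx);
	if (s->num >= ngroups)
		return -EIO;
	return 0;
}
```

The -EIO branch treats an out-of-range group as a parent driver bug rather than a userspace error, which matches how the patch handles it.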
Jason Wang
2023-Sep-11 06:43 UTC
[PATCH RFC v2 0/3] vdpa: dedicated descriptor table group
On Sat, Sep 9, 2023 at 4:45 PM Si-Wei Liu <si-wei.liu at oracle.com> wrote:
>
> The following patchset introduces a dedicated group for the descriptor
> table to reduce live migration downtime when a passthrough VQ is being
> switched to a shadow VQ. This RFC v2 is sent to incorporate the early
> feedback from reviewers on the uAPI and driver API part of the changes;
> the associated driver patch set consuming this API will come around
> soon along with the formal submission of this series.
>
> Some initial performance data will be gathered using the real
> hardware device with mlx5_vdpa. The target goal of this series is to
> reduce the SVQ switching overhead to less than 300ms on a ~100GB
> guest with 2 non-mq vhost-vdpa devices. The reduction in the downtime
> is thanks to avoiding the full remap in the switching.
>
> The plan of the intended driver implementation is to use a dedicated
> group (specifically, 2 in the table below) to host the descriptor tables
> for data vqs, different from where buffer addresses are contained (in
> group 0 as below). The cvq does not have to allocate a dedicated group
> for its descriptor table, so its buffers and descriptor table would
> always belong to the same group (1 in the table below).
>
>
>               | data vq  | ctrl vq
> ==============+==========+===========
> vq_group      |    0     |    1
> vq_desc_group |    2     |    1
>
>
> ---
>
> Si-Wei Liu (3):
>   vdpa: introduce dedicated descriptor group for virtqueue
>   vhost-vdpa: introduce descriptor group backend feature
>   vhost-vdpa: uAPI to get dedicated descriptor group id

Looks good to me but I'd expect example implementations in the parent.

Thanks

>
>  drivers/vhost/vdpa.c             | 27 +++++++++++++++++++++++++++
>  include/linux/vdpa.h             | 11 +++++++++++
>  include/uapi/linux/vhost.h       |  8 ++++++++
>  include/uapi/linux/vhost_types.h |  5 +++++
>  4 files changed, 51 insertions(+)
>
> --
> 1.8.3.1
>