Si-Wei Liu
2022-Mar-09 03:32 UTC
[PATCH v1 1/2] vdpa: Add support for querying vendor statistics
On 3/8/2022 6:13 AM, Eli Cohen wrote:
>
>> -----Original Message-----
>> From: Si-Wei Liu <si-wei.liu at oracle.com>
>> Sent: Tuesday, March 8, 2022 8:16 AM
>> To: Eli Cohen <elic at nvidia.com>
>> Cc: mst at redhat.com; jasowang at redhat.com; virtualization at lists.linux-foundation.org;
>> eperezma at redhat.com; amorenoz at redhat.com; lvivier at redhat.com;
>> sgarzare at redhat.com; Parav Pandit <parav at nvidia.com>
>> Subject: Re: [PATCH v1 1/2] vdpa: Add support for querying vendor statistics
>>
>>
>> On 3/6/2022 11:57 PM, Eli Cohen wrote:
>>>> -----Original Message-----
>>>> From: Si-Wei Liu <si-wei.liu at oracle.com>
>>>> Sent: Saturday, March 5, 2022 12:34 AM
>>>> To: Eli Cohen <elic at nvidia.com>
>>>> Cc: mst at redhat.com; jasowang at redhat.com; virtualization at lists.linux-foundation.org;
>>>> eperezma at redhat.com; amorenoz at redhat.com; lvivier at redhat.com;
>>>> sgarzare at redhat.com; Parav Pandit <parav at nvidia.com>
>>>> Subject: Re: [PATCH v1 1/2] vdpa: Add support for querying vendor statistics
>>>>
>>>> Sorry, I somehow missed this after my break. Please see comments in line.
>>>>
>>>> On 2/16/2022 10:46 PM, Eli Cohen wrote:
>>>>> On Wed, Feb 16, 2022 at 10:49:26AM -0800, Si-Wei Liu wrote:
>>>>>> On 2/16/2022 12:00 AM, Eli Cohen wrote:
>>>>>>> Allows reading vendor statistics of a vdpa device. The specific
>>>>>>> statistics data are received by the upstream driver in the form of
>>>>>>> (attribute name, attribute value) pairs.
>>>>>>>
>>>>>>> An example of statistics for a mlx5_vdpa device:
>>>>>>>
>>>>>>> received_desc - number of descriptors received by the virtqueue
>>>>>>> completed_desc - number of descriptors completed by the virtqueue
>>>>>>>
>>>>>>> A descriptor using indirect buffers is still counted as 1. In
>>>>>>> addition, N chained descriptors are counted correctly N times as one
>>>>>>> would expect.
>>>>>>>
>>>>>>> A new callback was added to vdpa_config_ops which provides the
>>>>>>> means for the vdpa driver to return statistics results.
>>>>>>>
>>>>>>> The interface allows for reading all the supported virtqueues,
>>>>>>> including the control virtqueue if it exists.
>>>>>>>
>>>>>>> Below are some examples taken from mlx5_vdpa which are introduced
>>>>>>> in the following patch:
>>>>>>>
>>>>>>> 1. Read statistics for the virtqueue at index 1
>>>>>>>
>>>>>>> $ vdpa dev vstats show vdpa-a qidx 1
>>>>>>> vdpa-a:
>>>>>>> queue_type tx queue_index 1 received_desc 3844836 completed_desc 3844836
>>>>>>>
>>>>>>> 2. Read statistics for the virtqueue at index 32
>>>>>>> $ vdpa dev vstats show vdpa-a qidx 32
>>>>>>> vdpa-a:
>>>>>>> queue_type control_vq queue_index 32 received_desc 62 completed_desc 62
>>>>>>>
>>>>>>> 3. Read statistics for the virtqueue at index 0 with json output
>>>>>>> $ vdpa -j dev vstats show vdpa-a qidx 0
>>>>>>> {"vstats":{"vdpa-a":{
>>>>>>> "queue_type":"rx","queue_index":0,"name":"received_desc","value":417776,\
>>>>>>> "name":"completed_desc","value":417548}}}
>>>>>>>
>>>>>>> 4. Read statistics for the virtqueue at index 0 with pretty json output
>>>>>>> $ vdpa -jp dev vstats show vdpa-a qidx 0
>>>>>>> {
>>>>>>>     "vstats": {
>>>>>>>         "vdpa-a": {
>>>>>>>
>>>>>>>             "queue_type": "rx",
>>>>>> I wonder where this info can be inferred? I don't see a relevant
>>>>>> change in the patch series that helps gather the VDPA_ATTR_DEV_QUEUE_TYPE.
>>>>>> Is this an arbitrary string defined by the vendor as well? If so,
>>>>>> how does the user expect to consume it?
>>>>> The queue type is deduced from the index and whether we have a
>>>>> control virtqueue. Even numbers are rx, odd numbers are tx, and if
>>>>> there is a CVQ, the last one is the CVQ.
>>>> OK, then the VDPA_ATTR_DEV_QUEUE_TYPE attribute introduced in this patch
>>>> might not be useful at all?
>>> Right, will remove.
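[Editor's note] The deduction rule described above (even index = rx, odd index = tx, last index = control VQ when negotiated) can be sketched as follows. This is an illustrative Python sketch, not code from the vdpa tool; the function name is made up, and the constant is the virtio-net VIRTIO_NET_F_CTRL_VQ feature bit (17):

```python
# Illustrative sketch (hypothetical names, not actual vdpa tool code) of
# the queue-type deduction rule: even queue indexes are rx, odd are tx,
# and when the control VQ feature is negotiated it occupies the last index.

VIRTIO_NET_F_CTRL_VQ = 1 << 17  # virtio-net control VQ feature bit

def queue_type(qidx, num_vqs, negotiated_features):
    """Deduce the type of the queue at qidx for a vdpa net device."""
    has_cvq = bool(negotiated_features & VIRTIO_NET_F_CTRL_VQ)
    if has_cvq and qidx == num_vqs - 1:
        return "control_vq"
    return "rx" if qidx % 2 == 0 else "tx"
```

With 16 data VQ pairs plus a control VQ (as in example 2 above), index 32 maps to control_vq only while the feature is negotiated; without it, the same index would look like a plain rx queue, which is the instability discussed below.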
>>>
>>>> And how do you determine in the vdpa tool if CVQ is negotiated or
>>>> not?
>>> I make a netlink call to get the same information as "vdpa dev config show"
>>> retrieves. I use the negotiated features to determine if a CVQ is
>>> available. If it is, the number of VQs equals the control VQ index. So
>>> there are two netlink calls under the hood.
>> The lock vdpa_dev_mutex won't be held across the two separate netlink
>> calls, and it may end up with inconsistent state - theoretically things
>> could happen like the first call getting CVQ negotiated, but the later
>> call to get_vendor_vq_stats() on the cvq getting -EINVAL due to a device
>> reset. Can the negotiated status and stat query be done within one
>> single netlink call?
> I see your concern.
> The only reason I do the extra call is to know if we have a control VQ
> and what index it is, just to print a descriptive string telling if it's
> either a rx, tx or control VQ.
>
> So the cure can be simple. Let's have a new attribute that returns the
> type of virtqueue.
I am not sure I follow the cure. Wouldn't it be possible to get both the
negotiated status and the queue stat in vdpa_nl_cmd_dev_stats_get_doit()
under the same vdpa_dev_mutex lock? And I am not even sure it is a must
to display the queue type - the output doesn't seem to include the vdpa
class info, which makes it hard for a script to parse this field in a
generic way.
> I think Jason did not like the idea of communicating the kind of VQ
> from kernel to userspace but under these circumstances, maybe he would
> approve.
> Jason?
>
>> What worried me is that the queue index being dynamic and dependent on
>> negotiation status would make it quite hard for the host admin to
>> follow. The guest may or may not advertise F_MQ and/or F_CTRL_VQ across
>> various phases, e.g. firmware (UEFI), boot loader (grub), till the OS
>> driver is up and running, which can be agnostic to the host admin. For
>> the most part it's not easy to script and predict the queue index,
>> which can change from time to time. Can we define an order of
>> host-predictable queue index, which is independent from any guest
>> negotiated state?
Here I think we can just use the plain queue index in the host view - say
if vdpa net has 4 pairs of data vqs and 1 control vq, the user may use
qindex 8 across the board to identify the control vq, regardless of
whether the F_MQ feature is negotiated in the guest.

Regards,
-Siwei

>>
>>>> Looks to me there are still some loose ends I don't quite yet
>>>> understand.
>>>>
>>>>
>>>>>>>             "queue_index": 0,
>>> I think this can be removed since the command is for a specific index.
>>>
>>>>>>>             "name": "received_desc",
>>>>>>>             "value": 417776,
>>>>>>>             "name": "completed_desc",
>>>>>>>             "value": 417548
>>>>>> Not for this kernel patch, but IMHO it's best to put the name &
>>>>>> value pairs in an array instead of flat entries in json's
>>>>>> hash/dictionary. The hash entries can be re-ordered deliberately by
>>>>>> an external json parsing tool, ending up with inconsistent stat
>>>>>> values.
>>>> This comment was missed for some reason. Please change the example in
>>>> the log if you agree to address it in the vdpa tool. Or justify why
>>>> keeping the order of the json hash/dictionary is fine.
>>> Sorry for skipping this comment.
>>> Do you mean to present the information like:
>>> "received_desc": 417776,
>>> "completed_desc": 417548,
>> I mean the following presentation:
>>
>> $ vdpa -jp dev vstats show vdpa-a qidx 0
>> {
>>     "vstats": {
>>         "vdpa-a": {
>>             "queue_stats": [{
>>                 "queue_index": 0,
>>                 "queue_type": "rx",
>>                 "stat_name": [ "received_desc","completed_desc" ],
>>                 "stat_value": [ 417776,417548 ]
>>             }]
>>         }
>>     }
>> }
>>
>> I think Parav had a similar suggestion, too.
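[Editor's note] The concern about flat duplicate keys can be demonstrated with any standard JSON parser: an object with repeated "name"/"value" members collapses to the last pair on parse, while the array form above round-trips losslessly. A minimal Python illustration (not vdpa tool code; the strings mimic the examples in this thread):

```python
import json

# Flat form from the current commit log: duplicate keys in one object.
flat = '{"name":"received_desc","value":417776,"name":"completed_desc","value":417548}'
flat_parsed = json.loads(flat)
# A standard parser keeps only the last duplicate key, so one stat is
# silently lost: flat_parsed == {"name": "completed_desc", "value": 417548}

# Proposed array form: every stat survives parsing.
arrayed = ('{"queue_stats":[{"queue_index":0,"queue_type":"rx",'
           '"stat_name":["received_desc","completed_desc"],'
           '"stat_value":[417776,417548]}]}')
parsed = json.loads(arrayed)
stats = dict(zip(parsed["queue_stats"][0]["stat_name"],
                 parsed["queue_stats"][0]["stat_value"]))
# stats == {"received_desc": 417776, "completed_desc": 417548}
```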
>>
>> Thanks,
>> -Siwei
>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>>>> Thanks,
>>>>>> -Siwei
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> Signed-off-by: Eli Cohen <elic at nvidia.com>
>>>>>>> ---
>>>>>>>  drivers/vdpa/vdpa.c       | 129 ++++++++++++++++++++++++++++++++++++++
>>>>>>>  include/linux/vdpa.h      |   5 ++
>>>>>>>  include/uapi/linux/vdpa.h |   7 +++
>>>>>>>  3 files changed, 141 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
>>>>>>> index 9846c9de4bfa..d0ff671baf88 100644
>>>>>>> --- a/drivers/vdpa/vdpa.c
>>>>>>> +++ b/drivers/vdpa/vdpa.c
>>>>>>> @@ -909,6 +909,74 @@ vdpa_dev_config_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid,
>>>>>>>  	return err;
>>>>>>>  }
>>>>>>> +static int vdpa_fill_stats_rec(struct vdpa_device *vdev, struct sk_buff *msg,
>>>>>>> +			       struct genl_info *info, u32 index)
>>>>>>> +{
>>>>>>> +	int err;
>>>>>>> +
>>>>>>> +	if (nla_put_u32(msg, VDPA_ATTR_DEV_QUEUE_INDEX, index))
>>>>>>> +		return -EMSGSIZE;
>>>>>>> +
>>>>>>> +	err = vdev->config->get_vendor_vq_stats(vdev, index, msg, info->extack);
>>>>>>> +	if (err)
>>>>>>> +		return err;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int vendor_stats_fill(struct vdpa_device *vdev, struct sk_buff *msg,
>>>>>>> +			     struct genl_info *info, u32 index)
>>>>>>> +{
>>>>>>> +	int err;
>>>>>>> +
>>>>>>> +	if (!vdev->config->get_vendor_vq_stats)
>>>>>>> +		return -EOPNOTSUPP;
>>>>>>> +
>>>>>>> +	err = vdpa_fill_stats_rec(vdev, msg, info, index);
>>>>>>> +	if (err)
>>>>>>> +		return err;
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int vdpa_dev_vendor_stats_fill(struct vdpa_device *vdev,
>>>>>>> +				      struct sk_buff *msg,
>>>>>>> +				      struct genl_info *info, u32 index)
>>>>>>> +{
>>>>>>> +	u32 device_id;
>>>>>>> +	void *hdr;
>>>>>>> +	int err;
>>>>>>> +	u32 portid = info->snd_portid;
>>>>>>> +	u32 seq = info->snd_seq;
>>>>>>> +	u32 flags = 0;
>>>>>>> +
>>>>>>> +	hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags,
>>>>>>> +			  VDPA_CMD_DEV_VSTATS_GET);
>>>>>>> +	if (!hdr)
>>>>>>> +		return -EMSGSIZE;
>>>>>>> +
>>>>>>> +	if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(&vdev->dev))) {
>>>>>>> +		err = -EMSGSIZE;
>>>>>>> +		goto undo_msg;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	device_id = vdev->config->get_device_id(vdev);
>>>>>>> +	if (nla_put_u32(msg, VDPA_ATTR_DEV_ID, device_id)) {
>>>>>>> +		err = -EMSGSIZE;
>>>>>>> +		goto undo_msg;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	err = vendor_stats_fill(vdev, msg, info, index);
>>>>>>> +
>>>>>>> +	genlmsg_end(msg, hdr);
>>>>>>> +
>>>>>>> +	return err;
>>>>>>> +
>>>>>>> +undo_msg:
>>>>>>> +	genlmsg_cancel(msg, hdr);
>>>>>>> +	return err;
>>>>>>> +}
>>>>>>> +
>>>>>>>  static int vdpa_nl_cmd_dev_config_get_doit(struct sk_buff *skb, struct genl_info *info)
>>>>>>>  {
>>>>>>>  	struct vdpa_device *vdev;
>>>>>>> @@ -990,6 +1058,60 @@ vdpa_nl_cmd_dev_config_get_dumpit(struct sk_buff *msg, struct netlink_callback *
>>>>>>>  	return msg->len;
>>>>>>>  }
>>>>>>> +static int vdpa_nl_cmd_dev_stats_get_doit(struct sk_buff *skb,
>>>>>>> +					  struct genl_info *info)
>>>>>>> +{
>>>>>>> +	struct vdpa_device *vdev;
>>>>>>> +	struct sk_buff *msg;
>>>>>>> +	const char *devname;
>>>>>>> +	struct device *dev;
>>>>>>> +	u32 index;
>>>>>>> +	int err;
>>>>>>> +
>>>>>>> +	if (!info->attrs[VDPA_ATTR_DEV_NAME])
>>>>>>> +		return -EINVAL;
>>>>>>> +
>>>>>>> +	if (!info->attrs[VDPA_ATTR_DEV_QUEUE_INDEX])
>>>>>>> +		return -EINVAL;
>>>>>>> +
>>>>>>> +	devname = nla_data(info->attrs[VDPA_ATTR_DEV_NAME]);
>>>>>>> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
>>>>>>> +	if (!msg)
>>>>>>> +		return -ENOMEM;
>>>>>>> +
>>>>>>> +	index = nla_get_u32(info->attrs[VDPA_ATTR_DEV_QUEUE_INDEX]);
>>>>>>> +	mutex_lock(&vdpa_dev_mutex);
>>>>>>> +	dev = bus_find_device(&vdpa_bus, NULL, devname, vdpa_name_match);
>>>>>>> +	if (!dev) {
>>>>>>> +		NL_SET_ERR_MSG_MOD(info->extack, "device not found");
>>>>>>> +		err = -ENODEV;
>>>>>>> +		goto dev_err;
>>>>>>> +	}
>>>>>>> +	vdev = container_of(dev, struct vdpa_device, dev);
>>>>>>> +	if (!vdev->mdev) {
>>>>>>> +		NL_SET_ERR_MSG_MOD(info->extack, "unmanaged vdpa device");
>>>>>>> +		err = -EINVAL;
>>>>>>> +		goto mdev_err;
>>>>>>> +	}
>>>>>>> +	err = vdpa_dev_vendor_stats_fill(vdev, msg, info, index);
>>>>>>> +	if (!err)
>>>>>>> +		err = genlmsg_reply(msg, info);
>>>>>>> +
>>>>>>> +	put_device(dev);
>>>>>>> +	mutex_unlock(&vdpa_dev_mutex);
>>>>>>> +
>>>>>>> +	if (err)
>>>>>>> +		nlmsg_free(msg);
>>>>>>> +
>>>>>>> +	return err;
>>>>>>> +
>>>>>>> +mdev_err:
>>>>>>> +	put_device(dev);
>>>>>>> +dev_err:
>>>>>>> +	mutex_unlock(&vdpa_dev_mutex);
>>>>>>> +	return err;
>>>>>>> +}
>>>>>>> +
>>>>>>>  static const struct nla_policy vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
>>>>>>>  	[VDPA_ATTR_MGMTDEV_BUS_NAME] = { .type = NLA_NUL_STRING },
>>>>>>>  	[VDPA_ATTR_MGMTDEV_DEV_NAME] = { .type = NLA_STRING },
>>>>>>> @@ -997,6 +1119,7 @@ static const struct nla_policy vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
>>>>>>>  	[VDPA_ATTR_DEV_NET_CFG_MACADDR] = NLA_POLICY_ETH_ADDR,
>>>>>>>  	/* virtio spec 1.1 section 5.1.4.1 for valid MTU range */
>>>>>>>  	[VDPA_ATTR_DEV_NET_CFG_MTU] = NLA_POLICY_MIN(NLA_U16, 68),
>>>>>>> +	[VDPA_ATTR_DEV_QUEUE_INDEX] = NLA_POLICY_RANGE(NLA_U32, 0, 65535),
>>>>>>>  };
>>>>>>>  static const struct genl_ops vdpa_nl_ops[] = {
>>>>>>> @@ -1030,6 +1153,12 @@ static const struct genl_ops vdpa_nl_ops[] = {
>>>>>>>  		.doit = vdpa_nl_cmd_dev_config_get_doit,
>>>>>>>  		.dumpit = vdpa_nl_cmd_dev_config_get_dumpit,
>>>>>>>  	},
>>>>>>> +	{
>>>>>>> +		.cmd = VDPA_CMD_DEV_VSTATS_GET,
>>>>>>> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>>>>>>> +		.doit = vdpa_nl_cmd_dev_stats_get_doit,
>>>>>>> +		.flags = GENL_ADMIN_PERM,
>>>>>>> +	},
>>>>>>>  };
>>>>>>>  static struct genl_family vdpa_nl_family __ro_after_init = {
>>>>>>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>>>>>>> index 2de442ececae..274203845cfc 100644
>>>>>>> --- a/include/linux/vdpa.h
>>>>>>> +++ b/include/linux/vdpa.h
>>>>>>> @@ -275,6 +275,9 @@ struct vdpa_config_ops {
>>>>>>>  			    const struct vdpa_vq_state *state);
>>>>>>>  	int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
>>>>>>>  			    struct vdpa_vq_state *state);
>>>>>>> +	int (*get_vendor_vq_stats)(struct vdpa_device *vdev, u16 idx,
>>>>>>> +				   struct sk_buff *msg,
>>>>>>> +				   struct netlink_ext_ack *extack);
>>>>>>>  	struct vdpa_notification_area
>>>>>>>  	(*get_vq_notification)(struct vdpa_device *vdev, u16 idx);
>>>>>>>  	/* vq irq is not expected to be changed once DRIVER_OK is set */
>>>>>>> @@ -466,4 +469,6 @@ struct vdpa_mgmt_dev {
>>>>>>>  int vdpa_mgmtdev_register(struct vdpa_mgmt_dev *mdev);
>>>>>>>  void vdpa_mgmtdev_unregister(struct vdpa_mgmt_dev *mdev);
>>>>>>> +#define VDPA_INVAL_QUEUE_INDEX 0xffff
>>>>>>> +
>>>>>>>  #endif /* _LINUX_VDPA_H */
>>>>>>> diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
>>>>>>> index 1061d8d2d09d..c5f229a41dc2 100644
>>>>>>> --- a/include/uapi/linux/vdpa.h
>>>>>>> +++ b/include/uapi/linux/vdpa.h
>>>>>>> @@ -18,6 +18,7 @@ enum vdpa_command {
>>>>>>>  	VDPA_CMD_DEV_DEL,
>>>>>>>  	VDPA_CMD_DEV_GET,		/* can dump */
>>>>>>>  	VDPA_CMD_DEV_CONFIG_GET,	/* can dump */
>>>>>>> +	VDPA_CMD_DEV_VSTATS_GET,
>>>>>>>  };
>>>>>>>  enum vdpa_attr {
>>>>>>> @@ -46,6 +47,12 @@ enum vdpa_attr {
>>>>>>>  	VDPA_ATTR_DEV_NEGOTIATED_FEATURES,	/* u64 */
>>>>>>>  	VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,		/* u32 */
>>>>>>>  	VDPA_ATTR_DEV_SUPPORTED_FEATURES,	/* u64 */
>>>>>>> +
>>>>>>> +	VDPA_ATTR_DEV_QUEUE_INDEX,		/* u16 */
>>>>>>> +	VDPA_ATTR_DEV_QUEUE_TYPE,		/* string */
>>>>>>> +	VDPA_ATTR_DEV_VENDOR_ATTR_NAME,		/* string */
>>>>>>> +	VDPA_ATTR_DEV_VENDOR_ATTR_VALUE,	/* u64 */
>>>>>>> +
>>>>>>>  	/* new attributes must be added above here */
>>>>>>>  	VDPA_ATTR_MAX,
>>>>>>>  };