Jason Wang
2021-Apr-07 03:56 UTC
[PATCH linux-next v2 04/14] vdpa: Introduce query of device config layout
On 2021/4/7 1:04 AM, Parav Pandit wrote:
> Introduce a command to query a device config layout.
>
> An example query of network vdpa device:
>
> $ vdpa dev add name bar mgmtdev vdpasim_net
>
> $ vdpa dev config show
> bar: mac 00:35:09:19:48:05 link up link_announce false mtu 1500 speed 0 duplex 0
>
> $ vdpa dev config show -jp
> {
>     "config": {
>         "bar": {
>             "mac": "00:35:09:19:48:05",
>             "link ": "up",
>             "link_announce ": false,
>             "mtu": 1500,
>             "speed": 0,
>             "duplex": 0
>         }
>     }
> }
>
> Signed-off-by: Parav Pandit <parav at nvidia.com>
> Signed-off-by: Eli Cohen <elic at nvidia.com>
> ---
> changelog:
> v1->v2:
>  - read whole net config layout instead of individual fields
>  - added error extack for unmanaged vdpa device
>  - fixed several endianness issues
>  - introduced vdpa device ops for get config which is synchronized
>    with other get/set features ops and config ops
> ---
>  drivers/vdpa/vdpa.c       | 234 +++++++++++++++++++++++++++++++++++++-
>  include/linux/vdpa.h      |  42 +++++++
>  include/uapi/linux/vdpa.h |  11 ++
>  3 files changed, 282 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
> index 116e076c72fd..9da8deb8c0f2 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -14,6 +14,8 @@
>  #include <uapi/linux/vdpa.h>
>  #include <net/genetlink.h>
>  #include <linux/mod_devicetable.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_ids.h>
>
>  static LIST_HEAD(mdev_head);
>  /* A global mutex that protects vdpa management device and device level operations. */
> @@ -60,6 +62,7 @@ static void vdpa_release_dev(struct device *d)
>  		ops->free(vdev);
>
>  	ida_simple_remove(&vdpa_index_ida, vdev->index);
> +	mutex_destroy(&vdev->cf_mutex);
>  	kfree(vdev);
>  }
>
> @@ -114,6 +117,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent,
>  	if (err)
>  		goto err_name;
>
> +	mutex_init(&vdev->cf_mutex);
>  	device_initialize(&vdev->dev);
>
>  	return vdev;
> @@ -269,10 +273,25 @@ EXPORT_SYMBOL_GPL(vdpa_mgmtdev_register);
>   */
>  u64 vdpa_get_features(struct vdpa_device *vdev)
>  {
> -	return vdev->config->get_features(vdev);
> +	u64 features;
> +
> +	mutex_lock(&vdev->cf_mutex);
> +	features = vdev->config->get_features(vdev);
> +	mutex_unlock(&vdev->cf_mutex);
> +	return features;
>  }
>  EXPORT_SYMBOL_GPL(vdpa_get_features);
>
> +static int __vdpa_set_features(struct vdpa_device *vdev, u64 features)
> +{
> +	const struct vdpa_config_ops *ops = vdev->config;
> +	int err;
> +
> +	vdev->features_valid = true;
> +	err = ops->set_features(vdev, features);
> +	return err;
> +}
> +
>  /**
>   * vdpa_set_features - Set vDPA device features
>   * @vdev: vdpa device whose features to set
> @@ -282,10 +301,12 @@ EXPORT_SYMBOL_GPL(vdpa_get_features);
>   */
>  int vdpa_set_features(struct vdpa_device *vdev, u64 features)
>  {
> -	const struct vdpa_config_ops *ops = vdev->config;
> +	int err;
>
> -	vdev->features_valid = true;
> -	return ops->set_features(vdev, features);
> +	mutex_lock(&vdev->cf_mutex);
> +	err = __vdpa_set_features(vdev, features);
> +	mutex_unlock(&vdev->cf_mutex);
> +	return err;
>  }
>  EXPORT_SYMBOL_GPL(vdpa_set_features);
>
> @@ -294,13 +315,15 @@ void vdpa_get_config(struct vdpa_device *vdev, unsigned int offset,
>  {
>  	const struct vdpa_config_ops *ops = vdev->config;
>
> +	mutex_lock(&vdev->cf_mutex);
>  	/*
>  	 * Config accesses aren't supposed to trigger before features are set.
>  	 * If it does happen we assume a legacy guest.
>  	 */
>  	if (!vdev->features_valid)
> -		vdpa_set_features(vdev, 0);
> +		__vdpa_set_features(vdev, 0);
>  	ops->get_config(vdev, offset, buf, len);
> +	mutex_unlock(&vdev->cf_mutex);
>  }
>  EXPORT_SYMBOL_GPL(vdpa_get_config);
>
> @@ -314,7 +337,9 @@ EXPORT_SYMBOL_GPL(vdpa_get_config);
>  void vdpa_set_config(struct vdpa_device *dev, unsigned int offset,
>  		     void *buf, unsigned int length)
>  {
> +	mutex_lock(&dev->cf_mutex);
>  	dev->config->set_config(dev, offset, buf, length);
> +	mutex_unlock(&dev->cf_mutex);
>  }
>  EXPORT_SYMBOL_GPL(vdpa_set_config);
>
> @@ -664,6 +689,198 @@ static int vdpa_nl_cmd_dev_get_dumpit(struct sk_buff *msg, struct netlink_callba
>  	return msg->len;
>  }
>
> +static int vdpa_dev_net_mq_config_fill(struct vdpa_device *vdev,
> +				       struct sk_buff *msg, u64 features,
> +				       const struct vdpa_dev_config *config)
> +{
> +	if ((features & (1ULL << VIRTIO_NET_F_MQ)) == 0)
> +		return 0;
> +
> +	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> +			config->net.max_virtqueue_pairs))
> +		return -EMSGSIZE;
> +	return 0;
> +}
> +
> +static int vdpa_dev_net_rss_config_fill(struct vdpa_device *vdev,
> +					struct sk_buff *msg, u64 features,
> +					const struct vdpa_dev_config *config)
> +{
> +	if ((features & (1ULL << VIRTIO_NET_F_RSS)) == 0)
> +		return 0;
> +
> +	if (nla_put_u8(msg, VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN,
> +		       config->net.rss_max_key_size))
> +		return -EMSGSIZE;
> +	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN,
> +			config->net.rss_max_key_size))
> +		return -EMSGSIZE;
> +	if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES,
> +			config->net.supported_hash_types))
> +		return -EMSGSIZE;
> +	return 0;
> +}
> +
> +static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg)
> +{
> +	struct vdpa_dev_config config = {};
> +	u64 features;
> +	int err;
> +
> +	mutex_lock(&vdev->cf_mutex);
> +	vdev->config->get_ce_config(vdev, &config);
> +	mutex_unlock(&vdev->cf_mutex);
> +
> +	if (nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR, sizeof(config.net.mac),
> +		    config.net.mac))
> +		return -EMSGSIZE;
> +	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, config.net.status))
> +		return -EMSGSIZE;
> +	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, config.net.mtu))
> +		return -EMSGSIZE;
> +	if (nla_put_u32(msg, VDPA_ATTR_DEV_NET_CFG_SPEED, config.net.speed))
> +		return -EMSGSIZE;
> +	if (nla_put_u8(msg, VDPA_ATTR_DEV_NET_CFG_DUPLEX, config.net.duplex))
> +		return -EMSGSIZE;
> +
> +	features = vdev->config->get_features(vdev);
> +
> +	err = vdpa_dev_net_mq_config_fill(vdev, msg, features, &config);
> +	if (err)
> +		return err;
> +	return vdpa_dev_net_rss_config_fill(vdev, msg, features, &config);
> +}
> +
> +static int
> +vdpa_dev_config_fill(struct vdpa_device *vdev, struct sk_buff *msg, u32 portid, u32 seq,
> +		     int flags, struct netlink_ext_ack *extack)
> +{
> +	u32 device_id;
> +	void *hdr;
> +	int err;
> +
> +	if (!vdev->config->get_ce_config) {
> +		NL_SET_ERR_MSG_MOD(extack,
> +				   "Configuration layout query is unsupported");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags,
> +			  VDPA_CMD_DEV_CONFIG_GET);
> +	if (!hdr)
> +		return -EMSGSIZE;
> +
> +	if (nla_put_string(msg, VDPA_ATTR_DEV_NAME, dev_name(&vdev->dev))) {
> +		err = -EMSGSIZE;
> +		goto msg_err;
> +	}
> +
> +	device_id = vdev->config->get_device_id(vdev);
> +	if (nla_put_u32(msg, VDPA_ATTR_DEV_ID, device_id)) {
> +		err = -EMSGSIZE;
> +		goto msg_err;
> +	}
> +
> +	switch (device_id) {
> +	case VIRTIO_ID_NET:
> +		err = vdpa_dev_net_config_fill(vdev, msg);
> +		break;
> +	default:
> +		err = -EOPNOTSUPP;
> +		break;
> +	}
> +	if (err)
> +		goto msg_err;
> +
> +	genlmsg_end(msg, hdr);
> +	return 0;
> +
> +msg_err:
> +	genlmsg_cancel(msg, hdr);
> +	return err;
> +}
> +
> +static int vdpa_nl_cmd_dev_config_get_doit(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct vdpa_device *vdev;
> +	struct sk_buff *msg;
> +	const char *devname;
> +	struct device *dev;
> +	int err;
> +
> +	if (!info->attrs[VDPA_ATTR_DEV_NAME])
> +		return -EINVAL;
> +	devname = nla_data(info->attrs[VDPA_ATTR_DEV_NAME]);
> +	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!msg)
> +		return -ENOMEM;
> +
> +	mutex_lock(&vdpa_dev_mutex);
> +	dev = bus_find_device(&vdpa_bus, NULL, devname, vdpa_name_match);
> +	if (!dev) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "device not found");
> +		err = -ENODEV;
> +		goto dev_err;
> +	}
> +	vdev = container_of(dev, struct vdpa_device, dev);
> +	if (!vdev->mdev) {
> +		NL_SET_ERR_MSG_MOD(info->extack, "unmanaged vdpa device");
> +		err = -EINVAL;
> +		goto mdev_err;
> +	}
> +	err = vdpa_dev_config_fill(vdev, msg, info->snd_portid, info->snd_seq,
> +				   0, info->extack);
> +	if (!err)
> +		err = genlmsg_reply(msg, info);
> +
> +mdev_err:
> +	put_device(dev);
> +dev_err:
> +	mutex_unlock(&vdpa_dev_mutex);
> +	if (err)
> +		nlmsg_free(msg);
> +	return err;
> +}
> +
> +static int vdpa_dev_config_dump(struct device *dev, void *data)
> +{
> +	struct vdpa_device *vdev = container_of(dev, struct vdpa_device, dev);
> +	struct vdpa_dev_dump_info *info = data;
> +	int err;
> +
> +	if (!vdev->mdev)
> +		return 0;
> +	if (info->idx < info->start_idx) {
> +		info->idx++;
> +		return 0;
> +	}
> +	err = vdpa_dev_config_fill(vdev, info->msg, NETLINK_CB(info->cb->skb).portid,
> +				   info->cb->nlh->nlmsg_seq, NLM_F_MULTI,
> +				   info->cb->extack);
> +	if (err)
> +		return err;
> +
> +	info->idx++;
> +	return 0;
> +}
> +
> +static int
> +vdpa_nl_cmd_dev_config_get_dumpit(struct sk_buff *msg, struct netlink_callback *cb)
> +{
> +	struct vdpa_dev_dump_info info;
> +
> +	info.msg = msg;
> +	info.cb = cb;
> +	info.start_idx = cb->args[0];
> +	info.idx = 0;
> +
> +	mutex_lock(&vdpa_dev_mutex);
> +	bus_for_each_dev(&vdpa_bus, NULL, &info, vdpa_dev_config_dump);
> +	mutex_unlock(&vdpa_dev_mutex);
> +	cb->args[0] = info.idx;
> +	return msg->len;
> +}
> +
>  static const struct nla_policy vdpa_nl_policy[VDPA_ATTR_MAX + 1] = {
>  	[VDPA_ATTR_MGMTDEV_BUS_NAME] = { .type = NLA_NUL_STRING },
>  	[VDPA_ATTR_MGMTDEV_DEV_NAME] = { .type = NLA_STRING },
> @@ -695,6 +912,13 @@ static const struct genl_ops vdpa_nl_ops[] = {
>  		.doit = vdpa_nl_cmd_dev_get_doit,
>  		.dumpit = vdpa_nl_cmd_dev_get_dumpit,
>  	},
> +	{
> +		.cmd = VDPA_CMD_DEV_CONFIG_GET,
> +		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> +		.doit = vdpa_nl_cmd_dev_config_get_doit,
> +		.dumpit = vdpa_nl_cmd_dev_config_get_dumpit,
> +		.flags = GENL_ADMIN_PERM,
> +	},
>  };
>
>  static struct genl_family vdpa_nl_family __ro_after_init = {
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 62e68ccc4c96..dcbbecb5dea8 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -6,6 +6,7 @@
>  #include <linux/device.h>
>  #include <linux/interrupt.h>
>  #include <linux/vhost_iotlb.h>
> +#include <linux/if_ether.h>
>
>  /**
>   * struct vdpa_calllback - vDPA callback definition.
> @@ -47,6 +48,7 @@ struct vdpa_mgmt_dev;
>   * @nvqs: maximum number of supported virtqueues
>   * @mdev: management device pointer; caller must setup when registering device as part
>   *	  of dev_add() mgmtdev ops callback before invoking _vdpa_register_device().
> + * @cf_mutex: Protects get and set access to features and configuration layout.
>   */
>  struct vdpa_device {
>  	struct device dev;
> @@ -56,6 +58,7 @@ struct vdpa_device {
>  	bool features_valid;
>  	int nvqs;
>  	struct vdpa_mgmt_dev *mdev;
> +	struct mutex cf_mutex; /* Protects get/set config and features */
>  };
>
>  /**
> @@ -68,6 +71,39 @@ struct vdpa_iova_range {
>  	u64 last;
>  };
>
> +/**
> + * struct vdpa_net_dev_config - vDPA net device config to get/set via
> + *				 management device.
> + * @mac: mac address of the vdpa device
> + * @status: indicates VIRTIO_NET_F_STATUS and VIRTIO_NET_S_*
> + * @max_virtqueue_pairs: Maximum number of each of transmit and receive queues.
> + *			  see VIRTIO_NET_F_MQ and VIRTIO_NET_CTRL_MQ.
> + *			  Legal values are between 1 and 0x8000.
> + * @speed: In units of 1Mb. All values 0 to INT_MAX are legal.
> + * @mtu: Default maximum transmit unit advice.
> + * @duplex: 0x00 - half duplex
> + *	     0x01 - full duplex
> + * @rss_max_key_size: Maximum size of RSS key.
> + * @supported_hash_types: Bitmask of supported VIRTIO_NET_RSS_HASH_ types
> + * @rss_max_indirection_table_length: Maximum number of indirection table
> + *				       entries.
> + */
> +struct vdpa_net_dev_config {
> +	u8 mac[ETH_ALEN];
> +	u16 status;
> +	u16 max_virtqueue_pairs;
> +	u32 speed;
> +	u16 mtu;
> +	u8 duplex;
> +	u8 rss_max_key_size;
> +	u32 supported_hash_types;
> +	u16 rss_max_indirection_table_length;
> +};
> +
> +struct vdpa_dev_config {
> +	struct vdpa_net_dev_config net;
> +};
> +
>  /**
>   * struct vdpa_config_ops - operations for configuring a vDPA device.
>   * Note: vDPA device drivers are required to implement all of the
> @@ -164,6 +200,10 @@ struct vdpa_iova_range {
>   *				@buf: buffer used to write from
>   *				@len: the length to write to
>   *				configuration space
> + * @get_ce_config:		Read device specific configuration in
> + *				cpu endianness.
> + *				@vdev: vdpa device
> + *				@config: pointer to config buffer used to read to


So I wonder what's the reason for having this? If it's only the reason of endian, we can just use get_config.

We don't plan to expose a legacy or transition device, so we can simply do the endian conversion in the device.

Then we can stick to the uapi of virtio_net_config and there's no need for the VDPA_ATTR_DEV_NET_CFG_XXX?

Thanks


>   * @get_generation:		Get device config generation (optional)
>   *				@vdev: vdpa device
>   *				Returns u32: device generation
> @@ -235,6 +275,8 @@ struct vdpa_config_ops {
>  			   void *buf, unsigned int len);
>  	void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>  			   const void *buf, unsigned int len);
> +	void (*get_ce_config)(struct vdpa_device *vdev,
> +			      struct vdpa_dev_config *config);
>  	u32 (*get_generation)(struct vdpa_device *vdev);
>  	struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev);
>
> diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
> index 66a41e4ec163..5c31ecc3b956 100644
> --- a/include/uapi/linux/vdpa.h
> +++ b/include/uapi/linux/vdpa.h
> @@ -17,6 +17,7 @@ enum vdpa_command {
>  	VDPA_CMD_DEV_NEW,
>  	VDPA_CMD_DEV_DEL,
>  	VDPA_CMD_DEV_GET,	/* can dump */
> +	VDPA_CMD_DEV_CONFIG_GET,	/* can dump */
>  };
>
>  enum vdpa_attr {
> @@ -33,6 +34,16 @@ enum vdpa_attr {
>  	VDPA_ATTR_DEV_MAX_VQS,	/* u32 */
>  	VDPA_ATTR_DEV_MAX_VQ_SIZE,	/* u16 */
>
> +	VDPA_ATTR_DEV_NET_CFG_MACADDR,	/* binary */
> +	VDPA_ATTR_DEV_NET_STATUS,	/* u8 */
> +	VDPA_ATTR_DEV_NET_CFG_MAX_VQP,	/* u16 */
> +	VDPA_ATTR_DEV_NET_CFG_MTU,	/* u16 */
> +	VDPA_ATTR_DEV_NET_CFG_SPEED,	/* u16 */
> +	VDPA_ATTR_DEV_NET_CFG_DUPLEX,	/* u16 */
> +	VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN,	/* u8 */
> +	VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN,	/* u16 */
> +	VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES,	/* u32 */
> +
>  	/* new attributes must be added above here */
>  	VDPA_ATTR_MAX,
>  };
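
For readers following the endianness question, here is a minimal userspace sketch of what "doing the endian conversion" involves when a consumer is handed the raw config bytes of a modern (VIRTIO_F_VERSION_1) net device, whose multi-byte fields are little-endian. It is not code from the patch. The names le16_decode(), le32_decode(), net_cfg_decode() and struct net_cfg_cpu are hypothetical, and the byte offsets assume the packed struct virtio_net_config layout (mac, status, max_virtqueue_pairs, mtu, speed, duplex).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a little-endian 16-bit value byte by byte; correct on any host. */
static uint16_t le16_decode(const uint8_t *p)
{
	assert(p != NULL);
	return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Decode a little-endian 32-bit value byte by byte; correct on any host. */
static uint32_t le32_decode(const uint8_t *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical CPU-endian view of a few net config fields. */
struct net_cfg_cpu {
	uint8_t mac[6];
	uint16_t status;
	uint16_t max_virtqueue_pairs;
	uint16_t mtu;
	uint32_t speed;
	uint8_t duplex;
};

/*
 * Convert raw config bytes to CPU endianness.
 * Assumed offsets: mac 0..5, status 6, max_virtqueue_pairs 8,
 * mtu 10, speed 12, duplex 16. The caller must supply at least 17 bytes.
 */
static void net_cfg_decode(const uint8_t *raw, struct net_cfg_cpu *out)
{
	assert(raw != NULL && out != NULL);
	memcpy(out->mac, raw, sizeof(out->mac));
	out->status = le16_decode(raw + 6);
	out->max_virtqueue_pairs = le16_decode(raw + 8);
	out->mtu = le16_decode(raw + 10);
	out->speed = le32_decode(raw + 12);
	out->duplex = raw[16];
}
```

Whether this conversion lives in the device driver, in the vdpa core, or in the netlink consumer is exactly the open question in this thread; the sketch only shows the mechanics.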
Parav Pandit
2021-Apr-07 05:10 UTC
[PATCH linux-next v2 04/14] vdpa: Introduce query of device config layout
> From: Jason Wang <jasowang at redhat.com>
> Sent: Wednesday, April 7, 2021 9:26 AM

[..]

> >  /**
> >   * struct vdpa_config_ops - operations for configuring a vDPA device.
> >   * Note: vDPA device drivers are required to implement all of the
> > @@ -164,6 +200,10 @@ struct vdpa_iova_range {
> >   *				@buf: buffer used to write from
> >   *				@len: the length to write to
> >   *				configuration space
> > + * @get_ce_config:		Read device specific configuration in
> > + *				cpu endianness.
> > + *				@vdev: vdpa device
> > + *				@config: pointer to config buffer used to read to
>
>
> So I wonder what's the reason for having this? If it's only the reason of
> endian, we can just use get_config.
>

I didn't follow your suggestion.
How does an in-kernel management-tool caller of get_config know how to do the endianness conversion?
Or are you proposing to send the whole virtio_config structure as binary data to user space via netlink?
If so, returning a struct is not common practice in netlink.
Also, making netlink user space understand __virtio16 doesn't sound correct. Orchestration is often not written in C.

I cannot see returning virtio_net_config via a netlink socket, if that is your thought.
Decoding it also requires querying the features.
Querying features and config is not atomic, so the parsed value can be wrong.

Endianness has to be self-contained in the fields returned via netlink.
With that baseline, let's think further.

> We don't plan to expose a legacy or transition device, so we can simply do
> the endian conversion in the device.
>

I am a bit confused by Michael's suggestion in v1 [1].
VIRTIO_F_VERSION_1 is set today by the mlx5 and ifcvf drivers.

> Then we can stick to the uapi of virtio_net_config and there's no need for the
> VDPA_ATTR_DEV_NET_CFG_XXX?
>

[1] https://lore.kernel.org/virtualization/20210224020336-mutt-send-email-mst at kernel.org/
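
The atomicity concern raised above can be sketched in userspace C as follows. This is an illustration only, not the kernel implementation: struct fake_dev, struct net_report and dev_snapshot() are hypothetical names, a pthread mutex stands in for cf_mutex, and only two feature-gated fields are shown. The feature bit numbers (VIRTIO_NET_F_MTU = 3, VIRTIO_NET_F_MQ = 22) are the ones defined by the virtio specification.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define VIRTIO_NET_F_MTU 3
#define VIRTIO_NET_F_MQ  22

/* Hypothetical stand-in for a device; not a kernel structure. */
struct fake_dev {
	pthread_mutex_t cf_mutex;	/* guards features and config together */
	uint64_t features;
	uint16_t mtu;
	uint16_t max_virtqueue_pairs;
};

/* Hypothetical report: a field is meaningful only if its has_* flag is set. */
struct net_report {
	bool has_mtu;
	uint16_t mtu;
	bool has_max_vqp;
	uint16_t max_virtqueue_pairs;
};

/*
 * Read features and config under a single lock hold, so the feature bits
 * used to interpret the config match the config that was read.
 */
static void dev_snapshot(struct fake_dev *dev, struct net_report *out)
{
	uint64_t features;
	uint16_t mtu, max_vqp;

	assert(dev != NULL && out != NULL);

	pthread_mutex_lock(&dev->cf_mutex);
	features = dev->features;
	mtu = dev->mtu;
	max_vqp = dev->max_virtqueue_pairs;
	pthread_mutex_unlock(&dev->cf_mutex);

	/* Report a field only when the feature that defines it is offered. */
	out->has_mtu = (features & (1ULL << VIRTIO_NET_F_MTU)) != 0;
	out->mtu = out->has_mtu ? mtu : 0;
	out->has_max_vqp = (features & (1ULL << VIRTIO_NET_F_MQ)) != 0;
	out->max_virtqueue_pairs = out->has_max_vqp ? max_vqp : 0;
}
```

If features and config were instead fetched in two separate locked sections, a concurrent update between them could pair old feature bits with new config values, which is the "parsed value can be wrong" case described in the reply.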