Si-Wei Liu
2022-Aug-03 23:09 UTC
[PATCH V3 4/6] vDPA: !FEATURES_OK should not block querying device config space
On 8/2/2022 7:30 PM, Zhu, Lingshan wrote:> > > On 8/3/2022 9:26 AM, Si-Wei Liu wrote: >> >> >> On 8/1/2022 11:33 PM, Jason Wang wrote: >>> On Tue, Aug 2, 2022 at 6:58 AM Si-Wei Liu <si-wei.liu at oracle.com> >>> wrote: >>>> >>>> >>>> On 8/1/2022 3:53 PM, Si-Wei Liu wrote: >>>>> >>>>> On 7/31/2022 9:44 PM, Jason Wang wrote: >>>>>> ? 2022/7/30 04:55, Si-Wei Liu ??: >>>>>>> >>>>>>> On 7/28/2022 7:04 PM, Zhu, Lingshan wrote: >>>>>>>> >>>>>>>> On 7/29/2022 5:48 AM, Si-Wei Liu wrote: >>>>>>>>> >>>>>>>>> On 7/27/2022 7:43 PM, Zhu, Lingshan wrote: >>>>>>>>>> >>>>>>>>>> On 7/28/2022 8:56 AM, Si-Wei Liu wrote: >>>>>>>>>>> >>>>>>>>>>> On 7/27/2022 4:47 AM, Zhu, Lingshan wrote: >>>>>>>>>>>> >>>>>>>>>>>> On 7/27/2022 5:43 PM, Si-Wei Liu wrote: >>>>>>>>>>>>> Sorry to chime in late in the game. For some reason I >>>>>>>>>>>>> couldn't >>>>>>>>>>>>> get to most emails for this discussion (I only subscribed to >>>>>>>>>>>>> the virtualization list), while I was taking off amongst the >>>>>>>>>>>>> past few weeks. >>>>>>>>>>>>> >>>>>>>>>>>>> It looks to me this patch is incomplete. Noted down the >>>>>>>>>>>>> way in >>>>>>>>>>>>> vdpa_dev_net_config_fill(), we have the following: >>>>>>>>>>>>> ????????? features = vdev->config->get_driver_features(vdev); >>>>>>>>>>>>> ????????? if (nla_put_u64_64bit(msg, >>>>>>>>>>>>> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features, >>>>>>>>>>>>> VDPA_ATTR_PAD)) >>>>>>>>>>>>> ????????????????? return -EMSGSIZE; >>>>>>>>>>>>> >>>>>>>>>>>>> Making call to .get_driver_features() doesn't make sense when >>>>>>>>>>>>> feature negotiation isn't complete. Neither should present >>>>>>>>>>>>> negotiated_features to userspace before negotiation is done. >>>>>>>>>>>>> >>>>>>>>>>>>> Similarly, max_vqp through vdpa_dev_net_mq_config_fill() >>>>>>>>>>>>> probably should not show before negotiation is done - it >>>>>>>>>>>>> depends on driver features negotiated. >>>>>>>>>>>> I have another patch in this series introduces device_features >>>>>>>>>>>> and will report device_features to the userspace even features >>>>>>>>>>>> negotiation not done. Because the spec says we should allow >>>>>>>>>>>> driver access the config space before FEATURES_OK. >>>>>>>>>>> The config space can be accessed by guest before features_ok >>>>>>>>>>> doesn't necessarily mean the value is valid. You may want to >>>>>>>>>>> double check with Michael for what he quoted earlier: >>>>>>>>>> that's why I proposed to fix these issues, e.g., if no _F_MAC, >>>>>>>>>> vDPA kernel should not return a mac to the userspace, there is >>>>>>>>>> not a default value for mac. >>>>>>>>> Then please show us the code, as I can only comment based on your >>>>>>>>> latest (v4) patch and it was not there.. To be honest, I don't >>>>>>>>> understand the motivation and the use cases you have, is it for >>>>>>>>> debugging/monitoring or there's really a use case for live >>>>>>>>> migration? For the former, you can do a direct dump on all config >>>>>>>>> space fields regardless of endianess and feature negotiation >>>>>>>>> without having to worry about validity (meaningful to present to >>>>>>>>> admin user). To me these are conflict asks that is impossible to >>>>>>>>> mix in exact one command. >>>>>>>> This bug just has been revealed two days, and you will see the >>>>>>>> patch soon. >>>>>>>> >>>>>>>> There are something to clarify: >>>>>>>> 1) we need to read the device features, or how can you pick a >>>>>>>> proper LM destination >>>>>> >>>>>> So it's probably not very efficient to use this, the manager layer >>>>>> should have the knowledge about the compatibility before doing >>>>>> migration other than try-and-fail. >>>>>> >>>>>> And it's the task of the management to gather the nodes whose >>>>>> devices >>>>>> could be live migrated to each other as something like "cluster" >>>>>> which we've already used in the case of cpuflags. >>>>>> >>>>>> 1) during node bootstrap, the capability of each node and devices >>>>>> was >>>>>> reported to management layer >>>>>> 2) management layer decide the cluster and make sure the migration >>>>>> can only done among the nodes insides the cluster >>>>>> 3) before migration, the vDPA needs to be provisioned on the >>>>>> destination >>>>>> >>>>>> >>>>>>>> 2) vdpa dev config show can show both device features and driver >>>>>>>> features, there just need a patch for iproute2 >>>>>>>> 3) To process information like MQ, we don't just dump the config >>>>>>>> space, MST has explained before >>>>>>> So, it's for live migration... Then why not export those config >>>>>>> parameters specified for vdpa creation (as well as device feature >>>>>>> bits) to the output of "vdpa dev show" command? That's where device >>>>>>> side config lives and is static across vdpa's life cycle. "vdpa dev >>>>>>> config show" is mostly for dynamic driver side config, and the >>>>>>> validity is subject to feature negotiation. I suppose this should >>>>>>> suit your need of LM, e.g. >>>>>> >>>>>> I think so. >>>>>> >>>>>> >>>>>>> $ vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 max_vqp 7 mtu >>>>>>> 2000 >>>>>>> $ vdpa dev show vdpa1 >>>>>>> vdpa1: type network mgmtdev pci/0000:41:04.2 vendor_id 5555 max_vqs >>>>>>> 15 max_vq_size 256 >>>>>>> ?? max_vqp 7 mtu 2000 >>>>>>> ?? dev_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS >>>>>>> CTRL_VQ MQ CTRL_MAC_ADDR VERSION_1 RING_PACKED >>>>>> >>>>>> Note that the mgmt should know this destination have those >>>>>> capability/features before the provisioning. >>>>> Yes, mgmt software should have to check the above from source. >>>> On destination mgmt software can run below to check vdpa mgmtdev's >>>> capability/features: >>>> >>>> $ vdpa mgmtdev show pci/0000:41:04.3 >>>> pci/0000:41:04.3: >>>> ??? supported_classes net >>>> ??? max_supported_vqs 257 >>>> ??? dev_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS >>>> CTRL_VQ >>>> MQ CTRL_MAC_ADDR VERSION_1 RING_PACKED >>> Right and this is probably better to be done at node bootstrapping for >>> the management to know about the cluster. >> Exactly. That's what mgmt software is supposed to do typically. > I think this could apply to both mgmt devices and vDPA devices: > 1)mgmt device, see whether the mgmt device is capable to create a vDPA > device with a certain feature bits, this is for LM > 2)vDPA device, report the device features, it is for normal operationCan you elaborate the use case "for normal operations"? Then it has nothing to do with LM for sure, correct? Noted for the LM case, just as Jason indicated, it's not even *required* for the mgmt software to gather the device features through "vdpa dev show" on source host *alive* right before live migration is started. Depending on the way how it is implemented, the mgmt software can well collect device capability on boot strap time, or may well save the vdpa device capability/config in persistent store ahead of time, say before any VM is to be launched. Then with all such info collected for each cluster node, mgmt software is able to get its own way to infer and sort out the live migration compatibility between nodes. I'm not sure which case you would need to check the device features, but in case you need it, it'd be better live in "vdpa dev show" than "vdpa dev config show". Thanks, -Siwei> > Thanks, > Zhu Lingshan >> >> Thanks, >> -Siwei >> >>> >>> Thanks >>> >>>>>> >>>>>>> For it to work, you'd want to pass "struct vdpa_dev_set_config" to >>>>>>> _vdpa_register_device() during registration, and get it saved there >>>>>>> in "struct vdpa_device". Then in vdpa_dev_fill() show each field >>>>>>> conditionally subject to "struct vdpa_dev_set_config.mask". >>>>>>> >>>>>>> Thanks, >>>>>>> -Siwei >>>>>> >>>>>> Thanks >>>>>> >>>>>> >>>>>>>> Thanks >>>>>>>> Zhu Lingshan >>>>>>>>>>>> Nope: >>>>>>>>>>>> >>>>>>>>>>>> 2.5.1? Driver Requirements: Device Configuration Space >>>>>>>>>>>> >>>>>>>>>>>> ... >>>>>>>>>>>> >>>>>>>>>>>> For optional configuration space fields, the driver MUST check >>>>>>>>>>>> that the corresponding feature is offered >>>>>>>>>>>> before accessing that part of the configuration space. >>>>>>>>>>> and how many driver bugs taking wrong assumption of the >>>>>>>>>>> validity >>>>>>>>>>> of config space field without features_ok. I am not sure what >>>>>>>>>>> use case you want to expose config resister values for before >>>>>>>>>>> features_ok, if it's mostly for live migration I guess it's >>>>>>>>>>> probably heading a wrong direction. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Last but not the least, this "vdpa dev config" command was >>>>>>>>>>>>> not >>>>>>>>>>>>> designed to display the real config space register values in >>>>>>>>>>>>> the first place. Quoting the vdpa-dev(8) man page: >>>>>>>>>>>>> >>>>>>>>>>>>>> vdpa dev config show - Show configuration of specific device >>>>>>>>>>>>>> or all devices. >>>>>>>>>>>>>> DEV - specifies the vdpa device to show its >>>>>>>>>>>>>> configuration. If >>>>>>>>>>>>>> this argument is omitted all devices configuration is >>>>>>>>>>>>>> listed. >>>>>>>>>>>>> It doesn't say anything about configuration space or register >>>>>>>>>>>>> values in config space. As long as it can convey the config >>>>>>>>>>>>> attribute when instantiating vDPA device instance, and more >>>>>>>>>>>>> importantly, the config can be easily imported from or >>>>>>>>>>>>> exported to userspace tools when trying to reconstruct vdpa >>>>>>>>>>>>> instance intact on destination host for live migration, IMHO >>>>>>>>>>>>> in my personal interpretation it doesn't matter what the >>>>>>>>>>>>> config space may present. It may be worth while adding a new >>>>>>>>>>>>> debug command to expose the real register value, but that's >>>>>>>>>>>>> another story. >>>>>>>>>>>> I am not sure getting your points. vDPA now reports device >>>>>>>>>>>> feature bits(device_features) and negotiated feature >>>>>>>>>>>> bits(driver_features), and yes, the drivers features can be a >>>>>>>>>>>> subset of the device features; and the vDPA device features >>>>>>>>>>>> can >>>>>>>>>>>> be a subset of the management device features. >>>>>>>>>>> What I said is after unblocking the conditional check, you'd >>>>>>>>>>> have to handle the case for each of the vdpa attribute when >>>>>>>>>>> feature negotiation is not yet done: basically the register >>>>>>>>>>> values you got from config space via the >>>>>>>>>>> vdpa_get_config_unlocked() call is not considered to be valid >>>>>>>>>>> before features_ok (per-spec). Although in some case you may >>>>>>>>>>> get >>>>>>>>>>> sane value, such behavior is generally undefined. If you desire >>>>>>>>>>> to show just the device_features alone without any config space >>>>>>>>>>> field, which the device had advertised *before feature >>>>>>>>>>> negotiation is complete*, that'll be fine. But looks to me this >>>>>>>>>>> is not how patch has been implemented. Probably need some more >>>>>>>>>>> work? >>>>>>>>>> They are driver_features(negotiated) and the >>>>>>>>>> device_features(which comes with the device), and the config >>>>>>>>>> space fields that depend on them. In this series, we report both >>>>>>>>>> to the userspace. >>>>>>>>> I fail to understand what you want to present from your >>>>>>>>> description. May be worth showing some example outputs that at >>>>>>>>> least include the following cases: 1) when device offers features >>>>>>>>> but not yet acknowledge by guest 2) when guest acknowledged >>>>>>>>> features and device is yet to accept 3) after guest feature >>>>>>>>> negotiation is completed (agreed upon between guest and device). >>>>>>>> Only two feature sets: 1) what the device has. (2) what is >>>>>>>> negotiated >>>>>>>>> Thanks, >>>>>>>>> -Siwei >>>>>>>>>>> Regards, >>>>>>>>>>> -Siwei >>>>>>>>>>> >>>>>>>>>>>>> Having said, please consider to drop the Fixes tag, as >>>>>>>>>>>>> appears >>>>>>>>>>>>> to me you're proposing a new feature rather than fixing a >>>>>>>>>>>>> real >>>>>>>>>>>>> issue. >>>>>>>>>>>> it's a new feature to report the device feature bits than only >>>>>>>>>>>> negotiated features, however this patch is a must, or it will >>>>>>>>>>>> block the device feature bits reporting. but I agree, the fix >>>>>>>>>>>> tag is not a must. >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> -Siwei >>>>>>>>>>>>> >>>>>>>>>>>>> On 7/1/2022 3:12 PM, Parav Pandit via Virtualization wrote: >>>>>>>>>>>>>>> From: Zhu Lingshan<lingshan.zhu at intel.com> >>>>>>>>>>>>>>> Sent: Friday, July 1, 2022 9:28 AM >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Users may want to query the config space of a vDPA device, >>>>>>>>>>>>>>> to choose a >>>>>>>>>>>>>>> appropriate one for a certain guest. This means the users >>>>>>>>>>>>>>> need to read the >>>>>>>>>>>>>>> config space before FEATURES_OK, and the existence of >>>>>>>>>>>>>>> config >>>>>>>>>>>>>>> space >>>>>>>>>>>>>>> contents does not depend on FEATURES_OK. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The spec says: >>>>>>>>>>>>>>> The device MUST allow reading of any device-specific >>>>>>>>>>>>>>> configuration field >>>>>>>>>>>>>>> before FEATURES_OK is set by the driver. This includes >>>>>>>>>>>>>>> fields which are >>>>>>>>>>>>>>> conditional on feature bits, as long as those feature bits >>>>>>>>>>>>>>> are offered by the >>>>>>>>>>>>>>> device. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Fixes: 30ef7a8ac8a07 (vdpa: Read device configuration only >>>>>>>>>>>>>>> if FEATURES_OK) >>>>>>>>>>>>>> Fix is fine, but fixes tag needs correction described below. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Above commit id is 13 letters should be 12. >>>>>>>>>>>>>> And >>>>>>>>>>>>>> It should be in format >>>>>>>>>>>>>> Fixes: 30ef7a8ac8a0 ("vdpa: Read device configuration >>>>>>>>>>>>>> only if >>>>>>>>>>>>>> FEATURES_OK") >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please use checkpatch.pl script before posting the >>>>>>>>>>>>>> patches to >>>>>>>>>>>>>> catch these errors. >>>>>>>>>>>>>> There is a bot that looks at the fixes tag and identifies >>>>>>>>>>>>>> the >>>>>>>>>>>>>> right kernel version to apply this fix. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Signed-off-by: Zhu Lingshan<lingshan.zhu at intel.com> >>>>>>>>>>>>>>> --- >>>>>>>>>>>>>>> ?? drivers/vdpa/vdpa.c | 8 -------- >>>>>>>>>>>>>>> ?? 1 file changed, 8 deletions(-) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c >>>>>>>>>>>>>>> index >>>>>>>>>>>>>>> 9b0e39b2f022..d76b22b2f7ae 100644 >>>>>>>>>>>>>>> --- a/drivers/vdpa/vdpa.c >>>>>>>>>>>>>>> +++ b/drivers/vdpa/vdpa.c >>>>>>>>>>>>>>> @@ -851,17 +851,9 @@ vdpa_dev_config_fill(struct >>>>>>>>>>>>>>> vdpa_device >>>>>>>>>>>>>>> *vdev, >>>>>>>>>>>>>>> struct sk_buff *msg, u32 portid,? { >>>>>>>>>>>>>>> ?????? u32 device_id; >>>>>>>>>>>>>>> ?????? void *hdr; >>>>>>>>>>>>>>> -??? u8 status; >>>>>>>>>>>>>>> ?????? int err; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> down_read(&vdev->cf_lock); >>>>>>>>>>>>>>> -??? status = vdev->config->get_status(vdev); >>>>>>>>>>>>>>> -??? if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { >>>>>>>>>>>>>>> -??????? NL_SET_ERR_MSG_MOD(extack, "Features >>>>>>>>>>>>>>> negotiation not >>>>>>>>>>>>>>> completed"); >>>>>>>>>>>>>>> -??????? err = -EAGAIN; >>>>>>>>>>>>>>> -??????? goto out; >>>>>>>>>>>>>>> -??? } >>>>>>>>>>>>>>> - >>>>>>>>>>>>>>> ?????? hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, >>>>>>>>>>>>>>> flags, >>>>>>>>>>>>>>> VDPA_CMD_DEV_CONFIG_GET); >>>>>>>>>>>>>>> ?????? if (!hdr) { >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> 2.31.1 >>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>> Virtualization mailing list >>>>>>>>>>>>>> Virtualization at lists.linux-foundation.org >>>>>>>>>>>>>> https://urldefense.com/v3/__https://lists.linuxfoundation.org/mailman/listinfo/virtualization__;!!ACWV5N9M2RV99hQ!NzOv5Ew_Z2CP-zHyD7RsUoStLZ54KpB21QyuZ8L63YVPLEGDEwvcOSDlIGxQPHY-DMkOa9sKKZdBSaNknMU$ >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >> >