Jason Wang
2021-Mar-02 04:25 UTC
[PATCH linux-next 2/9] vdpa: Introduce query of device config layout
On 2021/3/1 8:06 ??, Sean Mooney wrote:> On Mon, 2021-03-01 at 11:28 +0100, Adrian Moreno wrote: >> On 3/1/21 8:50 AM, Jason Wang wrote: >>> On 2021/3/1 3:29 ??, Parav Pandit wrote: >>>>> From: Jason Wang <jasowang at redhat.com> >>>>> Sent: Monday, March 1, 2021 9:29 AM >>>>> >>>>> On 2021/2/26 4:50 ??, Parav Pandit wrote: >>>>>>> From: Jason Wang <jasowang at redhat.com> >>>>>>> Sent: Friday, February 26, 2021 1:56 PM >>>>>>> >>>>>>> >>>>>>> On 2021/2/26 1:32 ??, Parav Pandit wrote: >>>>>>>>> From: Jason Wang <jasowang at redhat.com> >>>>>>>>> Sent: Wednesday, February 24, 2021 2:18 PM >>>>>>>>> >>>>>>>>> On 2021/2/24 2:18 ??, Parav Pandit wrote: >>>>>>>>>> + >>>>>>>>>> +??? if (nla_put_u16(msg, >>>>> VDPA_ATTR_DEV_NET_CFG_MAX_VQP, >>>>>>>>>> +??????????? config->max_virtqueue_pairs)) >>>>>>>>>> +??????? return -EMSGSIZE; >>>>>>>>> We probably need to check VIRTIO_NET_F_RSS here. >>>>>>>> Yes. Will use it in v2. >>>>>>>> >>>>>>>>>> +??? if (nla_put_u8(msg, >>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN, >>>>>>>>>> +?????????????? config->rss_max_key_size)) >>>>>>>>>> +??????? return -EMSGSIZE; >>>>>>>>>> + >>>>>>>>>> +??? rss_field = le16_to_cpu(config->rss_max_key_size); >>>>>>>>>> +??? if (nla_put_u16(msg, >>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN, >>>>>>>>> rss_field)) >>>>>>>>>> +??????? return -EMSGSIZE; >>>>>>>>>> + >>>>>>>>>> +??? hash_types = le32_to_cpu(config->supported_hash_types); >>>>>>>>>> +??? if (nla_put_u32(msg, >>>>> VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES, >>>>>>>>>> +??????????? config->supported_hash_types)) >>>>>>>>>> +??????? return -EMSGSIZE; >>>>>>>>>> +??? return 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, >>>>>>>>>> +struct sk_buff *msg) { >>>>>>>>>> +??? struct virtio_net_config config = {}; >>>>>>>>>> + >>>>>>>>>> +??? vdev->config->get_config(vdev, 0, &config, sizeof(config)); >>>>>>>>> Do we need to sync with other possible get_config calls here? >>>>>>>> To readers of get_config() is ok. No need to sync. >>>>>>>> However, I think set_config() and get_config() should sync otherwise >>>>>>> get_config can get partial value. >>>>>>>> Will probably have to add vdpa_device->config_access_lock(). >>>>>>> Probably. And the set_config() should be synchronized with the >>>>>>> requrest that comes from vDPA bus. >>>>>> Yes, a rw semaphore is good enough. >>>>>> Device config fields can be change from the device side anyway, such as >>>>> link status anyway. >>>>>> Sync using lock shouldn?t be problem. >>>>>> >>>>>>> This makes me think whether we should go back to two phases method, >>>>>>> create and enable. >>>>>>> >>>>>>> The vDPA device is only registred after enabling, then there's no >>>>>>> need to sync with vDPA bus, and mgmt is not allowed to change config >>>>> after enalbing? >>>>>> In that case we should be able to disable it as well. Disable should take the >>>>> device of the bus. >>>>>> I find it weird to have this model to linger around the device without >>>>> anchoring on the bus. >>>>>> For example device is not yet enabled, so it is not anchored on the bus, but >>>>> its name is still have to remain unique. >>>>> >>>>> >>>>> Can we do some synchornization between vdpa device allocation and vdpa >>>>> device registier to solve the naming issue? >>>> mgmt tool querying the device config space after vdpa device is in use is real. >>>> So I don't see solving it any differently. >>>> >>>> Now upper layers of vdpa to not access the device on the placed on the vdpa >>>> bus, can be taken care by existing driver code using >>>> echo 0 > /sys/bus/vdpa/drivers_autoprobe >>>> >>>> By default vdpa device placed on the bus wont bind to any driver (net/vhost etc). >>>> 1. Decision to bind to which driver is done after config of the vdpa device is >>>> done. >>>> 2. orchestration sw does create and config before it binds to the right driver >>>> 3. which driver to bind to decided based on the use case of that individual >>>> vdpa device >>>> For example, >>>> (a) vdpa0 binds to net driver to have Netdev for container >>>> (b) vdpa1 binds to vhost driver to map vdpa device to VM >>> >>> I think it should work. >>>> Adding Adrian, Maxime and Sean for more comments. >> Is this a strong requirement or just a optional operation?So my understanding is that something needs to be done to make sure: 1) the vpda device is not bound to virtio-vdpa 2) config is not changed if it has been set by management>> This might break some existing logic that expects netdev or vhost-vdpa device to >> exist before the some config (e.g:mac address) is set. > the orcestration software should not require permission to do dirver binding and in the case of openstack will > not have the capablity to bind the vdpa device to a dirvier in the current proposed design. > the ordering we are expecting to support in openstack at least initally is that the adminsitrator > will use either systemd service files or udev rules to allocate the VF and vdpa device staictly at boot. > > in the case of openstack we will expect all vdpa devies that will be used by openstack(nova) for assignment > to vms will be prebound by the adminstrators to vhost-vdpa. openstack will simply track the /dev/vhost* paths > corallated to the vdpa devices whose parent VF are allowed by the pci passthough whitelist. > > if the adminstrator binds the vdpa device to virtio-vdpa so that virtio-net-pci is used on the host nova and libvirt > will not correct that and the vm boot will fail. but provided you do not list that parent VF in the whitelist nova > will also ignore it. so if you want to mix and match the vdpa device driver that should not break anything provided > you do not list its parent in the pci whitelist. > > but one thing to keep in mind is whlie we are not planning to create VF/VDPA devices dynamiclaly at lesat not for the initall > implentation we do expact to be able to configure attibute of the vdpa device after tehy are create like any other nic. > for example the mac, mtu, link-state should all be configureable at runtime via normal ip route2 commands.So in the case of vDPA, this task is expected to be done by vdpa tool. (There's no guarantee that vDPA is a VF).> > at the morment we are encounterng issue with the mac adress specificly which qemu is hopefully going to work around since > libvirt and nova cannot currently configure the mac address of a vhost-vdpa device so we are relying on qemu > to take the mac adress form the qemu command line so that it does not end up as all 0 macs in the guest.Note that a kernel patch is posted by Eli[1] which assign a randomized mac. This should solve the issue in another way.> as some one that works on orcestration level i would have expect that setting the mac/mtu on either the vf or the netdev > representor to match the deireed mac/mtu in the guest would have been sufficent to alter the vdpa device confirutation. > likely the netdev representor would make the most sence to alter to match the desired mac so that the mac adress see on the ovs > bridge matches the mac of the vm.Speaking for genreal vDPA, it's not guaranteed that the vendor will implmenet the device with a representor. So vDPA tool is expected to be the only way to configure the device. Thanks [1] https://www.spinics.net/lists/linux-virtualization/msg48451.html> the current case where the vdpa representor netdev seams to have no bareing on the vdpa device > config makes debugging much more difficalt and is likely to confuse monitoring software. fortunetly openstack is not matching on the mac > address in this spefic case to identify the netdev represntor port but we have other backend that do look at the mac. in the case of ovs > we store the uuid of the neutron port in the ovsdb when adding a port to ovs so we rely on that not the mac but for legacy mode > sriov which we hope to support with vdpa eventulally after we comple the hardware offloaed ovs support, currenlty relys on the mac in some cases. > > i hope that answered your questions? > we are aware that we likely will have to support vdpa device creation at some point if we intend to add support for vdpa devices > created form subfunctions or intels siov howver that was not ready when we started this work so it was declared out of scope and > the curernt static approch was taken as the only path we could enable before code free in 10 days. > > regards sean. > >> >> Thanks >> > > https://www.spinics.net/lists/linux-virtualization/msg48451.html
Adrian Moreno
2021-Mar-03 09:24 UTC
[PATCH linux-next 2/9] vdpa: Introduce query of device config layout
On 3/2/21 5:25 AM, Jason Wang wrote:> > On 2021/3/1 8:06 ??, Sean Mooney wrote: >> On Mon, 2021-03-01 at 11:28 +0100, Adrian Moreno wrote: >>> On 3/1/21 8:50 AM, Jason Wang wrote: >>>> On 2021/3/1 3:29 ??, Parav Pandit wrote: >>>>>> From: Jason Wang <jasowang at redhat.com> >>>>>> Sent: Monday, March 1, 2021 9:29 AM >>>>>> >>>>>> On 2021/2/26 4:50 ??, Parav Pandit wrote: >>>>>>>> From: Jason Wang <jasowang at redhat.com> >>>>>>>> Sent: Friday, February 26, 2021 1:56 PM >>>>>>>> >>>>>>>> >>>>>>>> On 2021/2/26 1:32 ??, Parav Pandit wrote: >>>>>>>>>> From: Jason Wang <jasowang at redhat.com> >>>>>>>>>> Sent: Wednesday, February 24, 2021 2:18 PM >>>>>>>>>> >>>>>>>>>> On 2021/2/24 2:18 ??, Parav Pandit wrote: >>>>>>>>>>> + >>>>>>>>>>> +??? if (nla_put_u16(msg, >>>>>> VDPA_ATTR_DEV_NET_CFG_MAX_VQP, >>>>>>>>>>> +??????????? config->max_virtqueue_pairs)) >>>>>>>>>>> +??????? return -EMSGSIZE; >>>>>>>>>> We probably need to check VIRTIO_NET_F_RSS here. >>>>>>>>> Yes. Will use it in v2. >>>>>>>>> >>>>>>>>>>> +??? if (nla_put_u8(msg, >>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN, >>>>>>>>>>> +?????????????? config->rss_max_key_size)) >>>>>>>>>>> +??????? return -EMSGSIZE; >>>>>>>>>>> + >>>>>>>>>>> +??? rss_field = le16_to_cpu(config->rss_max_key_size); >>>>>>>>>>> +??? if (nla_put_u16(msg, >>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN, >>>>>>>>>> rss_field)) >>>>>>>>>>> +??????? return -EMSGSIZE; >>>>>>>>>>> + >>>>>>>>>>> +??? hash_types = le32_to_cpu(config->supported_hash_types); >>>>>>>>>>> +??? if (nla_put_u32(msg, >>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES, >>>>>>>>>>> +??????????? config->supported_hash_types)) >>>>>>>>>>> +??????? return -EMSGSIZE; >>>>>>>>>>> +??? return 0; >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> +static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, >>>>>>>>>>> +struct sk_buff *msg) { >>>>>>>>>>> +??? struct virtio_net_config config = {}; >>>>>>>>>>> + >>>>>>>>>>> +??? vdev->config->get_config(vdev, 0, &config, sizeof(config)); >>>>>>>>>> Do we need to sync with other possible get_config calls here? >>>>>>>>> To readers of get_config() is ok. No need to sync. >>>>>>>>> However, I think set_config() and get_config() should sync otherwise >>>>>>>> get_config can get partial value. >>>>>>>>> Will probably have to add vdpa_device->config_access_lock(). >>>>>>>> Probably. And the set_config() should be synchronized with the >>>>>>>> requrest that comes from vDPA bus. >>>>>>> Yes, a rw semaphore is good enough. >>>>>>> Device config fields can be change from the device side anyway, such as >>>>>> link status anyway. >>>>>>> Sync using lock shouldn?t be problem. >>>>>>> >>>>>>>> This makes me think whether we should go back to two phases method, >>>>>>>> create and enable. >>>>>>>> >>>>>>>> The vDPA device is only registred after enabling, then there's no >>>>>>>> need to sync with vDPA bus, and mgmt is not allowed to change config >>>>>> after enalbing? >>>>>>> In that case we should be able to disable it as well. Disable should take >>>>>>> the >>>>>> device of the bus. >>>>>>> I find it weird to have this model to linger around the device without >>>>>> anchoring on the bus. >>>>>>> For example device is not yet enabled, so it is not anchored on the bus, but >>>>>> its name is still have to remain unique. >>>>>> >>>>>> >>>>>> Can we do some synchornization between vdpa device allocation and vdpa >>>>>> device registier to solve the naming issue? >>>>> mgmt tool querying the device config space after vdpa device is in use is >>>>> real. >>>>> So I don't see solving it any differently. >>>>> >>>>> Now upper layers of vdpa to not access the device on the placed on the vdpa >>>>> bus, can be taken care by existing driver code using >>>>> echo 0 > /sys/bus/vdpa/drivers_autoprobe >>>>> >>>>> By default vdpa device placed on the bus wont bind to any driver (net/vhost >>>>> etc). >>>>> 1. Decision to bind to which driver is done after config of the vdpa device is >>>>> done. >>>>> 2. orchestration sw does create and config before it binds to the right driver >>>>> 3. which driver to bind to decided based on the use case of that individual >>>>> vdpa device >>>>> For example, >>>>> (a) vdpa0 binds to net driver to have Netdev for container >>>>> (b) vdpa1 binds to vhost driver to map vdpa device to VM >>>> >>>> I think it should work. >>>>> Adding Adrian, Maxime and Sean for more comments. >>> Is this a strong requirement or just a optional operation? > > > So my understanding is that something needs to be done to make sure: >> 1) the vpda device is not bound to virtio-vdpaThis is a problem of api not being unified, right? If vdpa device is bound tovirtio-vdpa I would expect some net-specific options (like mac or tso) would be modifiable through net_device_ops if the needed capabilities are held.> 2) config is not changed if it has been set by management >For virtio-vdpa (virtio-net) devices, that would be controlled via CAP_NET_ADMIN.For vhost-vdpa case, are you thinking of a way of stopping a user from sending VHOST_VDPA_SET_CONFIG messages? or making it fail if management has made some config at the vdpa level?> >>> This might break some existing logic that expects netdev or vhost-vdpa device to >>> exist before the some config (e.g:mac address) is set. >> the orcestration software should not require permission to do dirver binding >> and in the case of openstack will >> not have the capablity to bind the vdpa device to a dirvier in the current >> proposed design. >> the ordering we are expecting to support in openstack at least initally is >> that the adminsitrator >> will use either systemd service files or udev rules to allocate the VF and >> vdpa device staictly at boot. >> >> in the case of openstack we will expect all vdpa devies that will be used by >> openstack(nova) for assignment >> to vms will be prebound by the adminstrators to vhost-vdpa. openstack will >> simply track the /dev/vhost* paths >> corallated to the vdpa devices whose parent VF are allowed by the pci >> passthough whitelist. >> >> if the adminstrator binds the vdpa device to virtio-vdpa so that >> virtio-net-pci is used on the host nova and libvirt >> will not correct that and the vm boot will fail. but provided you do not list >> that parent VF in the whitelist nova >> will also ignore it. so if you want to mix and match the vdpa device driver >> that should not break anything provided >> you do not list its parent in the pci whitelist. >> >> but one thing to keep in mind is whlie we are not planning to create VF/VDPA >> devices dynamiclaly at lesat not for the initall >> implentation we do expact to be able to configure attibute of the vdpa device >> after tehy are create like any other nic. >> for example the mac, mtu, link-state should all be configureable at runtime >> via normal ip route2 commands. > > > So in the case of vDPA, this task is expected to be done by vdpa tool. (There's > no guarantee that vDPA is a VF). > > >> >> at the morment we are encounterng issue with the mac adress specificly which >> qemu is hopefully going to work around since >> libvirt and nova cannot currently configure the mac address of a vhost-vdpa >> device so we are relying on qemu >> to take the mac adress form the qemu command line so that it does not end up >> as all 0 macs in the guest. > > > Note that a kernel patch is posted by Eli[1] which assign a randomized mac. This > should solve the issue in another way. > > >> as some one that works on orcestration level i would have expect that setting >> the mac/mtu on either the vf or the netdev >> representor to match the deireed mac/mtu in the guest would have been >> sufficent to alter the vdpa device confirutation. >> likely? the netdev representor would make the most sence to alter to match the >> desired mac so that the mac adress see on the ovs >> bridge matches the mac of the vm. > > > Speaking for genreal vDPA, it's not guaranteed that the vendor will implmenet > the device with a representor. So vDPA tool is expected to be the only way to > configure the device. > > Thanks > > [1] https://www.spinics.net/lists/linux-virtualization/msg48451.html > > >> the current case where the vdpa representor netdev seams to have no bareing on >> the vdpa device >> config makes debugging much more difficalt and is likely to confuse monitoring >> software. fortunetly openstack is not matching on the mac >> address in this spefic case to identify the netdev represntor port but we have >> other backend that do look at the mac. in the case of ovs >> we store the uuid of the neutron port in the ovsdb when adding a port to ovs >> so we rely on that not the mac but? for legacy mode >> sriov which we hope to support with vdpa eventulally after we comple the >> hardware offloaed ovs support, currenlty relys on the mac in some cases. >> >> i hope that answered your questions? >> we are aware that we likely will have to support vdpa device creation at some >> point if we intend to add support for vdpa devices >> created form subfunctions or intels siov howver that was not ready when we >> started this work so it was declared out of scope and >> the curernt static approch was taken as the only path we could enable before >> code free in 10 days. >> >> regards sean. >> >>> >>> Thanks >>> >> >> https://www.spinics.net/lists/linux-virtualization/msg48451.html >-- Adri?n Moreno