thr3ads.net - Virtualization - [PATCH linux-next 2/9] vdpa: Introduce query of device config layout [Mar 2021]

If this information is useful, please help other people find it:
Share via:

Jason Wang

2021-Mar-02 04:25 UTC

[PATCH linux-next 2/9] vdpa: Introduce query of device config layout

On 2021/3/1 8:06 ??, Sean Mooney wrote:> On Mon, 2021-03-01 at 11:28 +0100, Adrian Moreno wrote:
>> On 3/1/21 8:50 AM, Jason Wang wrote:
>>> On 2021/3/1 3:29 ??, Parav Pandit wrote:
>>>>> From: Jason Wang <jasowang at redhat.com>
>>>>> Sent: Monday, March 1, 2021 9:29 AM
>>>>>
>>>>> On 2021/2/26 4:50 ??, Parav Pandit wrote:
>>>>>>> From: Jason Wang <jasowang at redhat.com>
>>>>>>> Sent: Friday, February 26, 2021 1:56 PM
>>>>>>>
>>>>>>>
>>>>>>> On 2021/2/26 1:32 ??, Parav Pandit wrote:
>>>>>>>>> From: Jason Wang <jasowang at
redhat.com>
>>>>>>>>> Sent: Wednesday, February 24, 2021 2:18 PM
>>>>>>>>>
>>>>>>>>> On 2021/2/24 2:18 ??, Parav Pandit wrote:
>>>>>>>>>> +
>>>>>>>>>> +??? if (nla_put_u16(msg,
>>>>> VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>>>>>> +???????????
config->max_virtqueue_pairs))
>>>>>>>>>> +??????? return -EMSGSIZE;
>>>>>>>>> We probably need to check VIRTIO_NET_F_RSS
here.
>>>>>>>> Yes. Will use it in v2.
>>>>>>>>
>>>>>>>>>> +??? if (nla_put_u8(msg,
>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN,
>>>>>>>>>> +??????????????
config->rss_max_key_size))
>>>>>>>>>> +??????? return -EMSGSIZE;
>>>>>>>>>> +
>>>>>>>>>> +??? rss_field =
le16_to_cpu(config->rss_max_key_size);
>>>>>>>>>> +??? if (nla_put_u16(msg,
>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN,
>>>>>>>>> rss_field))
>>>>>>>>>> +??????? return -EMSGSIZE;
>>>>>>>>>> +
>>>>>>>>>> +??? hash_types =
le32_to_cpu(config->supported_hash_types);
>>>>>>>>>> +??? if (nla_put_u32(msg,
>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES,
>>>>>>>>>> +???????????
config->supported_hash_types))
>>>>>>>>>> +??????? return -EMSGSIZE;
>>>>>>>>>> +??? return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static int
vdpa_dev_net_config_fill(struct vdpa_device *vdev,
>>>>>>>>>> +struct sk_buff *msg) {
>>>>>>>>>> +??? struct virtio_net_config config =
{};
>>>>>>>>>> +
>>>>>>>>>> +???
vdev->config->get_config(vdev, 0, &config, sizeof(config));
>>>>>>>>> Do we need to sync with other possible
get_config calls here?
>>>>>>>> To readers of get_config() is ok. No need to
sync.
>>>>>>>> However, I think set_config() and get_config()
should sync otherwise
>>>>>>> get_config can get partial value.
>>>>>>>> Will probably have to add
vdpa_device->config_access_lock().
>>>>>>> Probably. And the set_config() should be
synchronized with the
>>>>>>> requrest that comes from vDPA bus.
>>>>>> Yes, a rw semaphore is good enough.
>>>>>> Device config fields can be change from the device side
anyway, such as
>>>>> link status anyway.
>>>>>> Sync using lock shouldn?t be problem.
>>>>>>
>>>>>>> This makes me think whether we should go back to
two phases method,
>>>>>>> create and enable.
>>>>>>>
>>>>>>> The vDPA device is only registred after enabling,
then there's no
>>>>>>> need to sync with vDPA bus, and mgmt is not allowed
to change config
>>>>> after enalbing?
>>>>>> In that case we should be able to disable it as well.
Disable should take the
>>>>> device of the bus.
>>>>>> I find it weird to have this model to linger around the
device without
>>>>> anchoring on the bus.
>>>>>> For example device is not yet enabled, so it is not
anchored on the bus, but
>>>>> its name is still have to remain unique.
>>>>>
>>>>>
>>>>> Can we do some synchornization between vdpa device
allocation and vdpa
>>>>> device registier to solve the naming issue?
>>>> mgmt tool querying the device config space after vdpa device is
in use is real.
>>>> So I don't see solving it any differently.
>>>>
>>>> Now upper layers of vdpa to not access the device on the placed
on the vdpa
>>>> bus, can be taken care by existing driver code using
>>>> echo 0 > /sys/bus/vdpa/drivers_autoprobe
>>>>
>>>> By default vdpa device placed on the bus wont bind to any
driver (net/vhost etc).
>>>> 1. Decision to bind to which driver is done after config of the
vdpa device is
>>>> done.
>>>> 2. orchestration sw does create and config before it binds to
the right driver
>>>> 3. which driver to bind to decided based on the use case of
that individual
>>>> vdpa device
>>>> For example,
>>>> (a) vdpa0 binds to net driver to have Netdev for container
>>>> (b) vdpa1 binds to vhost driver to map vdpa device to VM
>>>
>>> I think it should work.
>>>> Adding Adrian, Maxime and Sean for more comments.
>> Is this a strong requirement or just a optional operation?

So my understanding is that something needs to be done to make sure:

1) the vpda device is not bound to virtio-vdpa
2) config is not changed if it has been set by management

>> This might break some existing logic that expects netdev or vhost-vdpa
device to
>> exist before the some config (e.g:mac address) is set.
> the orcestration software should not require permission to do dirver
binding and in the case of openstack will
> not have the capablity to bind the vdpa device to a dirvier in the current
proposed design.
> the ordering we are expecting to support in openstack at least initally is
that the adminsitrator
> will use either systemd service files or udev rules to allocate the VF and
vdpa device staictly at boot.
>
> in the case of openstack we will expect all vdpa devies that will be used
by openstack(nova) for assignment
> to vms will be prebound by the adminstrators to vhost-vdpa. openstack will
simply track the /dev/vhost* paths
> corallated to the vdpa devices whose parent VF are allowed by the pci
passthough whitelist.
>
> if the adminstrator binds the vdpa device to virtio-vdpa so that
virtio-net-pci is used on the host nova and libvirt
> will not correct that and the vm boot will fail. but provided you do not
list that parent VF in the whitelist nova
> will also ignore it. so if you want to mix and match the vdpa device driver
that should not break anything provided
> you do not list its parent in the pci whitelist.
>
> but one thing to keep in mind is whlie we are not planning to create
VF/VDPA devices dynamiclaly at lesat not for the initall
> implentation we do expact to be able to configure attibute of the vdpa
device after tehy are create like any other nic.
> for example the mac, mtu, link-state should all be configureable at runtime
via normal ip route2 commands.

So in the case of vDPA, this task is expected to be done by vdpa tool. 
(There's no guarantee that vDPA is a VF).

>
> at the morment we are encounterng issue with the mac adress specificly
which qemu is hopefully going to work around since
> libvirt and nova cannot currently configure the mac address of a vhost-vdpa
device so we are relying on qemu
> to take the mac adress form the qemu command line so that it does not end
up as all 0 macs in the guest.

Note that a kernel patch is posted by Eli[1] which assign a randomized 
mac. This should solve the issue in another way.

> as some one that works on orcestration level i would have expect that
setting the mac/mtu on either the vf or the netdev
> representor to match the deireed mac/mtu in the guest would have been
sufficent to alter the vdpa device confirutation.
> likely  the netdev representor would make the most sence to alter to match
the desired mac so that the mac adress see on the ovs
> bridge matches the mac of the vm.

Speaking for genreal vDPA, it's not guaranteed that the vendor will 
implmenet the device with a representor. So vDPA tool is expected to be 
the only way to configure the device.

Thanks

[1] https://www.spinics.net/lists/linux-virtualization/msg48451.html

> the current case where the vdpa representor netdev seams to have no bareing
on the vdpa device
> config makes debugging much more difficalt and is likely to confuse
monitoring software. fortunetly openstack is not matching on the mac
> address in this spefic case to identify the netdev represntor port but we
have other backend that do look at the mac. in the case of ovs
> we store the uuid of the neutron port in the ovsdb when adding a port to
ovs so we rely on that not the mac but  for legacy mode
> sriov which we hope to support with vdpa eventulally after we comple the
hardware offloaed ovs support, currenlty relys on the mac in some cases.
>
> i hope that answered your questions?
> we are aware that we likely will have to support vdpa device creation at
some point if we intend to add support for vdpa devices
> created form subfunctions or intels siov howver that was not ready when we
started this work so it was declared out of scope and
> the curernt static approch was taken as the only path we could enable
before code free in 10 days.
>
> regards sean.
>
>>
>> Thanks
>>
>
> https://www.spinics.net/lists/linux-virtualization/msg48451.html

Adrian Moreno

2021-Mar-03 09:24 UTC

head link

[PATCH linux-next 2/9] vdpa: Introduce query of device config layout

On 3/2/21 5:25 AM, Jason Wang wrote:> 
> On 2021/3/1 8:06 ??, Sean Mooney wrote:
>> On Mon, 2021-03-01 at 11:28 +0100, Adrian Moreno wrote:
>>> On 3/1/21 8:50 AM, Jason Wang wrote:
>>>> On 2021/3/1 3:29 ??, Parav Pandit wrote:
>>>>>> From: Jason Wang <jasowang at redhat.com>
>>>>>> Sent: Monday, March 1, 2021 9:29 AM
>>>>>>
>>>>>> On 2021/2/26 4:50 ??, Parav Pandit wrote:
>>>>>>>> From: Jason Wang <jasowang at redhat.com>
>>>>>>>> Sent: Friday, February 26, 2021 1:56 PM
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2021/2/26 1:32 ??, Parav Pandit wrote:
>>>>>>>>>> From: Jason Wang <jasowang at
redhat.com>
>>>>>>>>>> Sent: Wednesday, February 24, 2021 2:18
PM
>>>>>>>>>>
>>>>>>>>>> On 2021/2/24 2:18 ??, Parav Pandit
wrote:
>>>>>>>>>>> +
>>>>>>>>>>> +??? if (nla_put_u16(msg,
>>>>>> VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
>>>>>>>>>>> +???????????
config->max_virtqueue_pairs))
>>>>>>>>>>> +??????? return -EMSGSIZE;
>>>>>>>>>> We probably need to check
VIRTIO_NET_F_RSS here.
>>>>>>>>> Yes. Will use it in v2.
>>>>>>>>>
>>>>>>>>>>> +??? if (nla_put_u8(msg,
>>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_KEY_LEN,
>>>>>>>>>>> +??????????????
config->rss_max_key_size))
>>>>>>>>>>> +??????? return -EMSGSIZE;
>>>>>>>>>>> +
>>>>>>>>>>> +??? rss_field =
le16_to_cpu(config->rss_max_key_size);
>>>>>>>>>>> +??? if (nla_put_u16(msg,
>>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_MAX_IT_LEN,
>>>>>>>>>> rss_field))
>>>>>>>>>>> +??????? return -EMSGSIZE;
>>>>>>>>>>> +
>>>>>>>>>>> +??? hash_types =
le32_to_cpu(config->supported_hash_types);
>>>>>>>>>>> +??? if (nla_put_u32(msg,
>>>>>> VDPA_ATTR_DEV_NET_CFG_RSS_HASH_TYPES,
>>>>>>>>>>> +???????????
config->supported_hash_types))
>>>>>>>>>>> +??????? return -EMSGSIZE;
>>>>>>>>>>> +??? return 0;
>>>>>>>>>>> +}
>>>>>>>>>>> +
>>>>>>>>>>> +static int
vdpa_dev_net_config_fill(struct vdpa_device *vdev,
>>>>>>>>>>> +struct sk_buff *msg) {
>>>>>>>>>>> +??? struct virtio_net_config
config = {};
>>>>>>>>>>> +
>>>>>>>>>>> +???
vdev->config->get_config(vdev, 0, &config, sizeof(config));
>>>>>>>>>> Do we need to sync with other possible
get_config calls here?
>>>>>>>>> To readers of get_config() is ok. No need
to sync.
>>>>>>>>> However, I think set_config() and
get_config() should sync otherwise
>>>>>>>> get_config can get partial value.
>>>>>>>>> Will probably have to add
vdpa_device->config_access_lock().
>>>>>>>> Probably. And the set_config() should be
synchronized with the
>>>>>>>> requrest that comes from vDPA bus.
>>>>>>> Yes, a rw semaphore is good enough.
>>>>>>> Device config fields can be change from the device
side anyway, such as
>>>>>> link status anyway.
>>>>>>> Sync using lock shouldn?t be problem.
>>>>>>>
>>>>>>>> This makes me think whether we should go back
to two phases method,
>>>>>>>> create and enable.
>>>>>>>>
>>>>>>>> The vDPA device is only registred after
enabling, then there's no
>>>>>>>> need to sync with vDPA bus, and mgmt is not
allowed to change config
>>>>>> after enalbing?
>>>>>>> In that case we should be able to disable it as
well. Disable should take
>>>>>>> the
>>>>>> device of the bus.
>>>>>>> I find it weird to have this model to linger around
the device without
>>>>>> anchoring on the bus.
>>>>>>> For example device is not yet enabled, so it is not
anchored on the bus, but
>>>>>> its name is still have to remain unique.
>>>>>>
>>>>>>
>>>>>> Can we do some synchornization between vdpa device
allocation and vdpa
>>>>>> device registier to solve the naming issue?
>>>>> mgmt tool querying the device config space after vdpa
device is in use is
>>>>> real.
>>>>> So I don't see solving it any differently.
>>>>>
>>>>> Now upper layers of vdpa to not access the device on the
placed on the vdpa
>>>>> bus, can be taken care by existing driver code using
>>>>> echo 0 > /sys/bus/vdpa/drivers_autoprobe
>>>>>
>>>>> By default vdpa device placed on the bus wont bind to any
driver (net/vhost
>>>>> etc).
>>>>> 1. Decision to bind to which driver is done after config of
the vdpa device is
>>>>> done.
>>>>> 2. orchestration sw does create and config before it binds
to the right driver
>>>>> 3. which driver to bind to decided based on the use case of
that individual
>>>>> vdpa device
>>>>> For example,
>>>>> (a) vdpa0 binds to net driver to have Netdev for container
>>>>> (b) vdpa1 binds to vhost driver to map vdpa device to VM
>>>>
>>>> I think it should work.
>>>>> Adding Adrian, Maxime and Sean for more comments.
>>> Is this a strong requirement or just a optional operation?
> 
> 
> So my understanding is that something needs to be done to make sure:
> 
> 1) the vpda device is not bound to virtio-vdpaThis is a problem of api not
being unified, right? If vdpa device is bound tovirtio-vdpa I would expect some net-specific options (like mac or tso) would be
modifiable through net_device_ops if the needed capabilities are held.
> 2) config is not changed if it has been set by management
>For virtio-vdpa (virtio-net) devices, that would be controlled via
CAP_NET_ADMIN.For vhost-vdpa case, are you thinking of a way of stopping a user from sending
VHOST_VDPA_SET_CONFIG messages? or making it fail if management has made some
config at the vdpa level?
> 
>>> This might break some existing logic that expects netdev or
vhost-vdpa device to
>>> exist before the some config (e.g:mac address) is set.
>> the orcestration software should not require permission to do dirver
binding
>> and in the case of openstack will
>> not have the capablity to bind the vdpa device to a dirvier in the
current
>> proposed design.
>> the ordering we are expecting to support in openstack at least initally
is
>> that the adminsitrator
>> will use either systemd service files or udev rules to allocate the VF
and
>> vdpa device staictly at boot.
>>
>> in the case of openstack we will expect all vdpa devies that will be
used by
>> openstack(nova) for assignment
>> to vms will be prebound by the adminstrators to vhost-vdpa. openstack
will
>> simply track the /dev/vhost* paths
>> corallated to the vdpa devices whose parent VF are allowed by the pci
>> passthough whitelist.
>>
>> if the adminstrator binds the vdpa device to virtio-vdpa so that
>> virtio-net-pci is used on the host nova and libvirt
>> will not correct that and the vm boot will fail. but provided you do
not list
>> that parent VF in the whitelist nova
>> will also ignore it. so if you want to mix and match the vdpa device
driver
>> that should not break anything provided
>> you do not list its parent in the pci whitelist.
>>
>> but one thing to keep in mind is whlie we are not planning to create
VF/VDPA
>> devices dynamiclaly at lesat not for the initall
>> implentation we do expact to be able to configure attibute of the vdpa
device
>> after tehy are create like any other nic.
>> for example the mac, mtu, link-state should all be configureable at
runtime
>> via normal ip route2 commands.
> 
> 
> So in the case of vDPA, this task is expected to be done by vdpa tool.
(There's
> no guarantee that vDPA is a VF).
> 
> 
>>
>> at the morment we are encounterng issue with the mac adress specificly
which
>> qemu is hopefully going to work around since
>> libvirt and nova cannot currently configure the mac address of a
vhost-vdpa
>> device so we are relying on qemu
>> to take the mac adress form the qemu command line so that it does not
end up
>> as all 0 macs in the guest.
> 
> 
> Note that a kernel patch is posted by Eli[1] which assign a randomized mac.
This
> should solve the issue in another way.
> 
> 
>> as some one that works on orcestration level i would have expect that
setting
>> the mac/mtu on either the vf or the netdev
>> representor to match the deireed mac/mtu in the guest would have been
>> sufficent to alter the vdpa device confirutation.
>> likely? the netdev representor would make the most sence to alter to
match the
>> desired mac so that the mac adress see on the ovs
>> bridge matches the mac of the vm.
> 
> 
> Speaking for genreal vDPA, it's not guaranteed that the vendor will
implmenet
> the device with a representor. So vDPA tool is expected to be the only way
to
> configure the device.
> 
> Thanks
> 
> [1] https://www.spinics.net/lists/linux-virtualization/msg48451.html
> 
> 
>> the current case where the vdpa representor netdev seams to have no
bareing on
>> the vdpa device
>> config makes debugging much more difficalt and is likely to confuse
monitoring
>> software. fortunetly openstack is not matching on the mac
>> address in this spefic case to identify the netdev represntor port but
we have
>> other backend that do look at the mac. in the case of ovs
>> we store the uuid of the neutron port in the ovsdb when adding a port
to ovs
>> so we rely on that not the mac but? for legacy mode
>> sriov which we hope to support with vdpa eventulally after we comple
the
>> hardware offloaed ovs support, currenlty relys on the mac in some
cases.
>>
>> i hope that answered your questions?
>> we are aware that we likely will have to support vdpa device creation
at some
>> point if we intend to add support for vdpa devices
>> created form subfunctions or intels siov howver that was not ready when
we
>> started this work so it was declared out of scope and
>> the curernt static approch was taken as the only path we could enable
before
>> code free in 10 days.
>>
>> regards sean.
>>
>>>
>>> Thanks
>>>
>>
>> https://www.spinics.net/lists/linux-virtualization/msg48451.html
> 
-- 
Adri?n Moreno

Virtualization - Mar 2021 - [PATCH linux-next 2/9] vdpa: Introduce query of device config layout

[PATCH linux-next 2/9] vdpa: Introduce query of device config layout

[PATCH linux-next 2/9] vdpa: Introduce query of device config layout