thr3ads.net - Virtualization - [PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout [Jun 2021]


Jason Wang

2021-Jun-28 05:03 UTC

[PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout

On 2021/6/25 at 2:45 PM, Parav Pandit wrote:
>
>> From: Jason Wang <jasowang at redhat.com>
>> Sent: Friday, June 25, 2021 8:59 AM
>>
>> On 2021/6/24 at 3:59 PM, Parav Pandit wrote:
>>>> From: Jason Wang <jasowang at redhat.com>
>>>> Sent: Thursday, June 24, 2021 12:35 PM
>>>>
>>>>>> Consider we had a mature set of virtio specific uAPI for config space.
>>>>>> It would be a burden if we need an unnecessary translation layer of
>>>>>> netlink in the middle:
>>>>>>
>>>>>> [vDPA parent (virtio_net_config)] <-> [netlink (VDPA_ATTR_DEV_NET_XX)]
>>>>>> <-> [userspace (VDPA_ATTR_DEV_NET_XX)] <-> [user (virtio_net_config)]
>>>>> This translation is not there. We show relevant net config fields as
>>>>> VDPA_ATTR_DEV_NET individually.
>>>>> It is not a binary dump which is harder for users to parse and make
>>>>> any use of it.
>>>>
>>>>
>>>> This is done implicitly; the user needs to understand the semantics of
>>>> virtio_net_config and map the individual fields to the vdpa tool
>>>> sub-command.
>>> Mostly not. virtio_net_config is for the producer and consumer sw entities.
>>> Here the user doesn't know about such a layout and where it's located.
>>> The user only sets config params that get set in the config space
>>> (without understanding what the config layout is and where it is located).
>>>
>>>>> It is only one level of translation, from virtio_net_config (kernel)
>>>>> -> netlink vdpa fields.
>>>>> It is similar to 'struct netdevice' -> rtnl info fields.
>>>> I think not; the problem is that the netdevice is not a part of uAPI
>>>> but virtio_net_config is.
>>> Virtio_net_config is a UAPI for sw consumption.
>>> That way yes, netlink can also do it; however, it requires a side channel
>>> to communicate what is valid.
>>>>>> If we make netlink simply a transport, it would be much easier. And
>>>>>> we had the chance to unify the logic of build_config() and
>>>>>> set_config() in the driver.
>>>>> How? We need a bit mask to tell, out of 21 fields, which fields to
>>>>> update and which not.
>>>>> And that is further mixed with offset and length.
>>>> So set_config() could be called from userspace, and so could
>>>> build_config(). The only difference is:
>>>>
>>>> 1) they're using different transport, ioctl vs netlink
>>>> 2) build_config() is only expected to be called by the management
>>>> tool
>>>>
>>>> If qemu works well via set_config ioctl, netlink should work as well.
>>>>
>>> mlx5 set_config is noop.
>>> vdpa_set_config() needs to return an error code. I don't vp_vdpa.c
>>> blindly writes the config as it's passthrough.
>>> Parsing which fields to write and which not, using offset and length, is
>>> messy code with typecast and compare old values etc.
>>
>>
>> I don't see why it needs typecast, virtio_net_config is also uABI, you can
>> dereference the fields directly.
>>
> User wants to set only the mac address of the config space. How does user
> space tell this?

Good question, but we first need to answer:

"Do we allow userspace to modify one specific field of the whole config?"

> Pass the whole virtio_net_config and inform via side channel?

That could be a method.

> Or vendor driver is expected to compare what fields changed from old config
> space?

So I think we need to solve them all, but netlink is probably the wrong
layer; we need to solve them at the virtio level and let netlink be a
transport for the virtio uAPI/ABI.

And we need to figure out if we want to allow the userspace to modify
the config after the device is created. If not, simply build the
virtio_net_config and pass it to the vDPA parent during device creation.
If so, invent new uAPI at the virtio level for passing the config fields.
Virtio or vDPA core can provide the library to compare the difference.

My feeling is that, if we restrict to only support building the config
during the creation, it would simplify a lot of things. And I didn't
notice a use case where we need to change the config fields in the middle
via the management API/tool.
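
If the virtio or vDPA core were to offer such a comparison helper, it might look roughly like the sketch below. Everything in it is hypothetical: `struct demo_net_config` is a four-field subset of the virtio-net config layout, and `demo_config_diff()` and the `DEMO_CHANGED_*` bits are names invented for this illustration, not existing kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the virtio-net config layout (illustration only). */
struct demo_net_config {
    uint8_t  mac[6];
    uint16_t status;
    uint16_t max_virtqueue_pairs;
    uint16_t mtu;
};

/* Hypothetical bits reporting which writable fields differ. */
enum {
    DEMO_CHANGED_MAC = 1u << 0,
    DEMO_CHANGED_MQ  = 1u << 1,
    DEMO_CHANGED_MTU = 1u << 2,
};

/*
 * Compare an old and a new config and return a mask of changed fields.
 * status is skipped on purpose: it is reported by the device, not set
 * by the management tool.
 */
static unsigned int demo_config_diff(const struct demo_net_config *old_cfg,
                                     const struct demo_net_config *new_cfg)
{
    unsigned int changed = 0;

    if (memcmp(old_cfg->mac, new_cfg->mac, sizeof(old_cfg->mac)) != 0)
        changed |= DEMO_CHANGED_MAC;
    if (old_cfg->max_virtqueue_pairs != new_cfg->max_virtqueue_pairs)
        changed |= DEMO_CHANGED_MQ;
    if (old_cfg->mtu != new_cfg->mtu)
        changed |= DEMO_CHANGED_MTU;

    return changed;
}
```

A parent driver could then act only on the fields whose bit is set, instead of each vendor driver re-implementing the comparison.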

>   
>>>> Btw, what happens if management tool tries to modify the mac of vDPA
>>>> when the device is already used by the driver?
>>> At present it allows modifying, but it should be improved in future to
>>> fail if device is in use.
>>
>>
>> This is something we need to fix I think. Or, is it really useful to
>> allow the attributes to be modified after the device is created?
>>
>> Why not simply allow the config to be built only at device creation?
>>
> That avoids the problem of modifying fields after bind to vhost.
> But UAPI issue still remains so lets resolve that first.
>
>>>>>>>> And actually, it's not the binary blob since uapi clearly defines the
>>>>>>>> format (e.g struct virtio_net_config), can we find a way to use that?
>>>>>>>> E.g introduce device/net specific command and passing the blob with
>>>>>>>> length and negotiated features.
>>>>>>> Length may change in future, mostly expand. And parsing based on
>>>>>>> length is not such a clean way.
>>>>>>
>>>>>>
>>>>>> Length is only for legal checking. The config is self contained with:
>>>>>>
>>>>> Unlikely. When structure size increases later, the parsing will change
>>>>> based on the length.
>>>>> Because older kernel would return shorter length with older iproute2
>>>>> tool.
>>>> This is fine since the older kernel only supports fewer features. The only
>>>> possible issue is if the old iproute2 runs on a new kernel. With the
>>>> current proposal, it may cause some config fields to not be shown.
>>>>
>>> Not showing is ok.
>>> But the code is messy to typecast on size.
>>>
>>>> I think it might be useful to introduce a command to simply dump the
>>>> config space.
>>>>
>>>>
>>>>> So user space always has to deal with nasty parsing/typecasting
>>>>> based on the length.
>>> Such nasty parsing is not required for netlink interface.
>>>
>>>> That's how userspace (Qemu) is expected to work now. The userspace
>>>> should determine the semantic of the fields based on the features.
>>>>
>>>> Differentiating config fields doesn't help much, e.g userspace still
>>>> needs to differentiate LINK_UP and ANNOUNCE for the status field.
>>> Yes, this parsing is from constant size u16 status.
>>> [..]
>>>
>>>>> It's not about performance. By the time 1st call is made, features got
>>>>> updated and it is out of sync with config.
>>>>>> 1) get config
>>>>>> 2) get device id
>>>>>> 3) get features
>>>>>>
>>>>> This requires using features from 3rd netlink output to decode output of
>>>>> 1st netlink output.
>>>>> Which is a bit odd of netlink.
>>>>> Other netlink nla_put() probably sending whole structure doesn't need
>>>>> to do it.
>>>>
>>>>
>>>> Well, we can pack them all into a single skb, can't we? (probably with a
>>>> config len).
>>>>
>>> You want to pack features and config both in the single nla_put()?
>>> If so, it isn't necessary. There are more examples in kernel that add
>>> individual fields instead of nla_put(blob).
>>> I wouldn't follow those nla_put() callers.
>>
>> No, a single skb not single nla_put().
>>
>> Actually git grep told me a very good example of carrying uABI via
>> netlink, that is the ndt_config:
>>
>> 1) we had ndt_config definition in the uAPI
>> 2) netlink simply carries the structure in neightbl_fill_info():
>>
>>     if (nla_put(skb, NDTA_CONFIG, sizeof(ndc), &ndc))
>>
> Sure. But the reverse path doesn't have this; that requires a side band mask.
> My concern is not for the existing virtio_net_config layout, but a future
> increase of it requires size based typecasting in both directions.
>
>> For virtio_net_config, why not simply:
>>
>> len = ops->get_config_len();
>> config = kmalloc(len, GFP_KERNEL);
>> ops->get_config(vdev, 0, config, len);
>> nla_put(skb, VIRTIO_CONFIG, config, len);
> User space needs to parse content based on this length as it can change in
> future.
> Length telling how to typecast is what I want to avoid here.

So there's no real difference: using xxx_is_valid is just an implicit
length check, the same as what is done via config_len:

if (a_is_valid) {
    /* dump a */
} else if (b_is_valid) {
    /* dump b */
}

vs.

if (length >= offsetof(struct virtio_net_config, next field of a)) {
    /* dump a */
}

Actually, Qemu has solved the similar issues via the uAPI:

https://git.qemu.org/?p=qemu.git;a=blob;f=hw/net/virtio-net.c;h=bd7958b9f0eed2705e0d6a2feaeaefb5e63bd6a4;hb=HEAD#l92

If the current uAPI is not sufficient, let's tweak it.
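
The length-based variant of that check can be sketched in plain C. This is a minimal illustration, not kernel code: `struct demo_net_config` is a hypothetical four-field subset of the virtio-net config layout, and `DEMO_HAS_FIELD` and `demo_fields_present()` are names made up for this example. A field is treated as present when the reported config length reaches the end of that field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the virtio-net config layout (illustration only). */
struct demo_net_config {
    uint8_t  mac[6];
    uint16_t status;
    uint16_t max_virtqueue_pairs;
    uint16_t mtu;
};

/* The four fields above are naturally aligned, so no padding is expected. */
_Static_assert(sizeof(struct demo_net_config) == 12, "unexpected padding");

/* Nonzero when a config blob of `len` bytes fully contains `field`. */
#define DEMO_HAS_FIELD(len, field)                                  \
    ((len) >= offsetof(struct demo_net_config, field) +             \
              sizeof(((struct demo_net_config *)0)->field))

/* Count how many of the four fields a blob of `len` bytes covers. */
static int demo_fields_present(size_t len)
{
    int n = 0;

    if (DEMO_HAS_FIELD(len, mac))
        n++;
    if (DEMO_HAS_FIELD(len, status))
        n++;
    if (DEMO_HAS_FIELD(len, max_virtqueue_pairs))
        n++;
    if (DEMO_HAS_FIELD(len, mtu))
        n++;
    return n;
}
```

With this layout a 12-byte blob covers all four fields, while an 8-byte blob covers only `mac` and `status`. Whether a present field is also meaningful still depends on the feature bits.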

>
>> nla_put_le64(skb, VIRTIO_FEATURES, features);
>>
>
>> For build_config, we can simply do things in reverse. Then everything
>> works via the existing virtio uAPI/ABI.
>>
> In reverse path how do you tell which fields of the config space to set and
> which to ignore?

See my above reply.

> Shall we use u64 features for it?
> Will type of device able to describe their config space via a feature bit?

I think not. There are a lot of fields that cannot be deduced from the
features (mtu, queue pairs, mac etc).

But I agree that the config fields cannot work without the feature bits.
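
A small sketch of why the feature bits are needed to read the config: the status field only carries meaning when the matching features are set. The bit numbers and status flags below follow the virtio specification for the network device (VIRTIO_NET_F_STATUS is bit 16, VIRTIO_NET_F_GUEST_ANNOUNCE is bit 21, VIRTIO_NET_S_LINK_UP is 1, VIRTIO_NET_S_ANNOUNCE is 2); the `DEMO_` names and both helper functions are invented for this example, and `status` is assumed to be already converted to host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers and status flags as given in the virtio spec. */
#define DEMO_NET_F_STATUS          16
#define DEMO_NET_F_GUEST_ANNOUNCE  21
#define DEMO_NET_S_LINK_UP          1
#define DEMO_NET_S_ANNOUNCE         2

/* Nonzero when feature bit number `bit` is set in `features`. */
static int demo_has_feature(uint64_t features, unsigned int bit)
{
    return (features & (1ULL << bit)) != 0;
}

/*
 * Link state as a reader of the config would interpret it.
 * Without F_STATUS the status field is not valid and the spec
 * says the link is assumed to be active.
 */
static int demo_link_up(uint64_t features, uint16_t status)
{
    if (!demo_has_feature(features, DEMO_NET_F_STATUS))
        return 1;
    return (status & DEMO_NET_S_LINK_UP) != 0;
}

/* The ANNOUNCE flag only counts when both features are present. */
static int demo_announce_pending(uint64_t features, uint16_t status)
{
    if (!demo_has_feature(features, DEMO_NET_F_STATUS) ||
        !demo_has_feature(features, DEMO_NET_F_GUEST_ANNOUNCE))
        return 0;
    return (status & DEMO_NET_S_ANNOUNCE) != 0;
}
```

The same 16-bit value therefore decodes differently depending on the feature set passed alongside it, which is the coupling between config fields and features discussed here.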

>
>>>>>> For build config, it's only one
>>>>>>
>>>>>> 1) build config
>>>>>>
>>>>>>
>>>>>>> I prefer to follow rest of the kernel style to return self contained
>>>>>>> individual fields.
>>>>>>
>>>>>>
>>>>>> But I saw a lot of kernel code choosing to use e.g nla_put() directly
>>>>>> with module specific structure.
>>>>>>
>>>>> It might be self-contained structure that probably has not found the
>>>>> need to expand.
>>>>
>>>>
>>>> I think it's just a matter of putting the config length with the config
>>>> data. Note that we've already had .get_config_size() ops for validating
>>>> inputs through VHOST_SET_CONFIG/VHOST_GET_CONFIG.
>>> This length comes as part of the netlink interface already, no need for
>>> extra length.
>>> The whole point is to avoid parsing based on length.
>>
>> Well, it doesn't do anything different compared to xxx_is_valid, which
>> just calculates the offset implicitly (via the compiler).
>>
>>
>>> We cannot change the virtio_net_config UAPI in use, but netlink code
>>> doesn't need to be bound to size based typecasting and comparing fields
>>> during build_config().
>>
>>
>> The points are:
>>
>> 1) Avoid duplicating the existing uAPIs
>> 2) Avoid unnecessary parsing in the netlink, netlink is just the
>> transport, it's the charge of the vDPA parent to do that
>>
> All that parsing will move to vendor drivers to validate offset/length to
> update only specific fields of config space.

Or the vDPA or virtio core can provide helpers to compare the difference 
if it's necessary.

Thanks

> It is a transport to carry fields, which is what we are using it for.
> I agree there that these config fields are exposed individually in both
> directions to keep it safe from structure layout increments.
>   
>> Thanks
>>

Parav Pandit

2021-Jun-28 10:56 UTC


[PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout

> From: Jason Wang <jasowang at redhat.com>
> Sent: Monday, June 28, 2021 10:33 AM
> [..]
> >>
> >> I don't see why it needs typecast, virtio_net_config is also uABI,
> >> you can dereference the fields directly.
> >>
> > User wants to set only the mac address of the config space. How does user
> > space tell this?
> 
> 
> Good question, but we first need to answer:
>
> "Do we allow userspace to modify one specific field of the whole config?"
>
Even if we restrict to specifying config params at creation time, the question
still remains open how to pass them: either as a whole struct + side band info
or as individual fields.
More below.
> 
> > Pass the whole virtio_net_config and inform via side channel?
> 
> 
> That could be a method.

I prefer the method to pass individual fields, which has the clean code approach
and full flexibility.
Clean code =
1. no typecasting based on length
2. self-describing fields, do not depend on feature bits parsing
3. proof against structure size increases, with full backward/forward
compatibility without code changes
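
As a rough illustration of the individual-fields approach, the sketch below walks a buffer of type/length/value records and simply skips record types it does not know, so an older reader keeps working when a newer sender adds fields. It is deliberately simplified and is not the netlink wire format (no alignment padding, host byte order); the `DEMO_ATTR_*` types and the `demo_put()` and `demo_parse()` helpers are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types, not the real VDPA_ATTR_* values. */
enum { DEMO_ATTR_MAC = 1, DEMO_ATTR_MTU = 2 };

struct demo_parsed {
    int      has_mac;
    uint8_t  mac[6];
    int      has_mtu;
    uint16_t mtu;
};

/* Append one (type:u16, len:u16, payload) record; returns the new offset. */
static size_t demo_put(uint8_t *buf, size_t off, uint16_t type,
                       const void *data, uint16_t len)
{
    memcpy(buf + off, &type, sizeof(type));
    memcpy(buf + off + 2, &len, sizeof(len));
    memcpy(buf + off + 4, data, len);
    return off + 4 + len;
}

/* Parse all records; returns 0 on success, -1 if the buffer is truncated. */
static int demo_parse(const uint8_t *buf, size_t size, struct demo_parsed *out)
{
    size_t off = 0;

    memset(out, 0, sizeof(*out));
    while (off < size) {
        uint16_t type, len;

        if (size - off < 4)
            return -1;
        memcpy(&type, buf + off, sizeof(type));
        memcpy(&len, buf + off + 2, sizeof(len));
        off += 4;
        if (size - off < len)
            return -1;

        if (type == DEMO_ATTR_MAC && len == sizeof(out->mac)) {
            memcpy(out->mac, buf + off, len);
            out->has_mac = 1;
        } else if (type == DEMO_ATTR_MTU && len == sizeof(out->mtu)) {
            memcpy(&out->mtu, buf + off, len);
            out->has_mtu = 1;
        }
        /* Any other type is unknown to this reader and is skipped. */
        off += len;
    }
    return 0;
}
```

The presence flags in `struct demo_parsed` play the role of the per-field validity information: a field that was not sent is simply absent, and no struct length has to be interpreted.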
> 
> 
> > Or vendor driver is expected to compare what fields changed from old
> > config space?
> 
> 
> So I think we need to solve them all, but netlink is probably the wrong
> layer; we need to solve them at the virtio level and let netlink be a
> transport for the virtio uAPI/ABI.

In the spirit of using the virtio UAPI structure, we are creating other side
band fields, which results in code that's not common to the netlink method.
The ioctl() interface of QEMU/vhost didn't have any other choice with ioctl().
> 
> And we need to figure out if we want to allow the userspace to modify
> the config after the device is created. If not, simply build the
> virtio_net_config and pass it to the vDPA parent during device creation.

I like this idea to pass fields at creation time.

> If so, invent new uAPI at the virtio level for passing the config fields.
> Virtio or vDPA core can provide the library to compare the difference.
>
> My feeling is that, if we restrict to only support building the config
> during the creation, it would simplify a lot of things. And I didn't
> notice a use case where we need to change the config fields in the middle
> via the management API/tool.
>
Sure yes. Whichever config fields user wants to pass, user space passes it.

> >> For virtio_net_config, why not simply:
> >>
> >> len = ops->get_config_len();
> >> config = kmalloc(len, GFP_KERNEL);
> >> ops->get_config(vdev, 0, config, len);
> >> nla_put(skb, VIRTIO_CONFIG, config, len);
> > User space needs to parse content based on this length as it can change
> > in future.
> > Length telling how to typecast is what I want to avoid here.
>
>
> So there's no real difference: using xxx_is_valid is just an implicit
> length check, the same as what is done via config_len:
> 
> if (a_is_valid) {
>     /* dump a */
> } else if (b_is_valid) {
>     /* dump b */
> }
> 
> vs.
> 
> if (length >= offsetof(struct virtio_net_config, next field of a)) {
>     /* dump a */
+ the feature parsing code, for each field.
> }
> 
> Actually, Qemu has solved the similar issues via the uAPI:
> 
> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/net/virtio-net.c;h=bd7958b9f0eed2705e0d6a2feaeaefb5e63bd6a4;hb=HEAD#l92
>
> If the current uAPI is not sufficient, let's tweak it.

I am unable to convince myself to build a side bitmask for config fields and
type casting code in the spirit of using the existing structure UAPI.
This creates messy code for the future.

Michael S. Tsirkin

2021-Jun-28 22:39 UTC


[PATCH linux-next v3 2/6] vdpa: Introduce query of device config layout

On Mon, Jun 28, 2021 at 01:03:20PM +0800, Jason Wang wrote:
> So I think we need solve them all, but netlink is probably the wrong layer,
> we need to solve them at virtio level and let netlink a transport for them
> virtio uAPI/ABI.

I'm not sure I follow. virtio defines VF to driver communication.
This is PF to hypervisor. virtio simply does not cover it ATM.

-- 
MST
