thr3ads.net - Virtualization - [PATCH] vdpa/mlx5: set_features should allow reset to zero [Mar 2021]

If this information is useful, please help other people find it:
Share via:

Si-Wei Liu

2021-Feb-24 18:24 UTC

[PATCH] vdpa/mlx5: set_features should allow reset to zero

On 2/23/2021 9:04 PM, Michael S. Tsirkin wrote:> On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
>>
>> On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
>>> On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
>>>> On 2021/2/23 9:12 ??, Si-Wei Liu wrote:
>>>>> On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
>>>>>> On Mon, Feb 22, 2021 at 12:14:17PM +0800, Jason Wang
wrote:
>>>>>>> On 2021/2/19 7:54 ??, Si-Wei Liu wrote:
>>>>>>>> Commit 452639a64ad8 ("vdpa: make sure
set_features is invoked
>>>>>>>> for legacy") made an exception for legacy
guests to reset
>>>>>>>> features to 0, when config space is accessed
before features
>>>>>>>> are set. We should relieve the
verify_min_features() check
>>>>>>>> and allow features reset to 0 for this case.
>>>>>>>>
>>>>>>>> It's worth noting that not just legacy
guests could access
>>>>>>>> config space before features are set. For
instance, when
>>>>>>>> feature VIRTIO_NET_F_MTU is advertised some
modern driver
>>>>>>>> will try to access and validate the MTU present
in the config
>>>>>>>> space before virtio features are set.
>>>>>>> This looks like a spec violation:
>>>>>>>
>>>>>>> "
>>>>>>>
>>>>>>> The following driver-read-only field, mtu only
exists if
>>>>>>> VIRTIO_NET_F_MTU is
>>>>>>> set.
>>>>>>> This field specifies the maximum MTU for the driver
to use.
>>>>>>> "
>>>>>>>
>>>>>>> Do we really want to workaround this?
>>>>>>>
>>>>>>> Thanks
>>>>>> And also:
>>>>>>
>>>>>> The driver MUST follow this sequence to initialize a
device:
>>>>>> 1. Reset the device.
>>>>>> 2. Set the ACKNOWLEDGE status bit: the guest OS has
noticed the device.
>>>>>> 3. Set the DRIVER status bit: the guest OS knows how to
drive the
>>>>>> device.
>>>>>> 4. Read device feature bits, and write the subset of
feature bits
>>>>>> understood by the OS and driver to the
>>>>>> device. During this step the driver MAY read (but MUST
NOT write)
>>>>>> the device-specific configuration
>>>>>> fields to check that it can support the device before
accepting it.
>>>>>> 5. Set the FEATURES_OK status bit. The driver MUST NOT
accept new
>>>>>> feature bits after this step.
>>>>>> 6. Re-read device status to ensure the FEATURES_OK bit
is still set:
>>>>>> otherwise, the device does not
>>>>>> support our subset of features and the device is
unusable.
>>>>>> 7. Perform device-specific setup, including discovery
of virtqueues
>>>>>> for the device, optional per-bus setup,
>>>>>> reading and possibly writing the device?s virtio
configuration
>>>>>> space, and population of virtqueues.
>>>>>> 8. Set the DRIVER_OK status bit. At this point the
device is ?live?.
>>>>>>
>>>>>>
>>>>>> so accessing config space before FEATURES_OK is a spec
violation, right?
>>>>> It is, but it's not relevant to what this commit tries
to address. I
>>>>> thought the legacy guest still needs to be supported.
>>>>>
>>>>> Having said, a separate patch has to be posted to fix the
guest driver
>>>>> issue where this discrepancy is introduced to
virtnet_validate() (since
>>>>> commit fe36cbe067). But it's not technically related to
this patch.
>>>>>
>>>>> -Siwei
>>>> I think it's a bug to read config space in validate, we
should move it to
>>>> virtnet_probe().
>>>>
>>>> Thanks
>>> I take it back, reading but not writing seems to be explicitly
allowed by spec.
>>> So our way to detect a legacy guest is bogus, need to think what is
>>> the best way to handle this.
>> Then maybe revert commit fe36cbe067 and friends, and have QEMU detect
legacy
>> guest? Supposedly only config space write access needs to be guarded
before
>> setting FEATURES_OK.
>>
>> -Siwie
> Detecting it isn't enough though, we will need a new ioctl to notify
> the kernel that it's a legacy guest. Ugh :(Well, although I think adding an ioctl is doable, may I know what the 
use case there will be for kernel to leverage such info directly? Is 
there a case QEMU can't do with dedicate ioctls later if there's indeed 
differentiation (legacy v.s. modern) needed?

One of the reason I asked is if this ioctl becomes a mandate for 
vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't 
see this ioctl coming?

If it's optional, suppose the kernel may need it only when it becomes 
necessary?

Thanks,
-Siwei

>
>
>>>>>>>> Rejecting reset to 0
>>>>>>>> prematurely causes correct MTU and link status
unable to load
>>>>>>>> for the very first config space access,
rendering issues like
>>>>>>>> guest showing inaccurate MTU value, or failure
to reject
>>>>>>>> out-of-range MTU.
>>>>>>>>
>>>>>>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA
driver for
>>>>>>>> supported mlx5 devices")
>>>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu at
oracle.com>
>>>>>>>> ---
>>>>>>>>   ?? drivers/vdpa/mlx5/net/mlx5_vnet.c | 15
+--------------
>>>>>>>>   ?? 1 file changed, 1 insertion(+), 14
deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> index 7c1f789..540dd67 100644
>>>>>>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>>>>>>> @@ -1490,14 +1490,6 @@ static u64
>>>>>>>> mlx5_vdpa_get_features(struct vdpa_device
*vdev)
>>>>>>>>   ?????? return mvdev->mlx_features;
>>>>>>>>   ?? }
>>>>>>>> -static int verify_min_features(struct
mlx5_vdpa_dev *mvdev,
>>>>>>>> u64 features)
>>>>>>>> -{
>>>>>>>> -??? if (!(features &
BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)))
>>>>>>>> -??????? return -EOPNOTSUPP;
>>>>>>>> -
>>>>>>>> -??? return 0;
>>>>>>>> -}
>>>>>>>> -
>>>>>>>>   ?? static int setup_virtqueues(struct
mlx5_vdpa_net *ndev)
>>>>>>>>   ?? {
>>>>>>>>   ?????? int err;
>>>>>>>> @@ -1558,18 +1550,13 @@ static int
>>>>>>>> mlx5_vdpa_set_features(struct vdpa_device
*vdev, u64
>>>>>>>> features)
>>>>>>>>   ?? {
>>>>>>>>   ?????? struct mlx5_vdpa_dev *mvdev =
to_mvdev(vdev);
>>>>>>>>   ?????? struct mlx5_vdpa_net *ndev =
to_mlx5_vdpa_ndev(mvdev);
>>>>>>>> -??? int err;
>>>>>>>>   ?????? print_features(mvdev, features, true);
>>>>>>>> -??? err = verify_min_features(mvdev,
features);
>>>>>>>> -??? if (err)
>>>>>>>> -??????? return err;
>>>>>>>> -
>>>>>>>>   ?????? ndev->mvdev.actual_features =
features &
>>>>>>>> ndev->mvdev.mlx_features;
>>>>>>>>   ?????? ndev->config.mtu =
cpu_to_mlx5vdpa16(mvdev, ndev->mtu);
>>>>>>>>   ?????? ndev->config.status |=
cpu_to_mlx5vdpa16(mvdev,
>>>>>>>> VIRTIO_NET_S_LINK_UP);
>>>>>>>> -??? return err;
>>>>>>>> +??? return 0;
>>>>>>>>   ?? }
>>>>>>>>   ?? static void mlx5_vdpa_set_config_cb(struct
vdpa_device
>>>>>>>> *vdev, struct vdpa_callback *cb)

Si-Wei Liu

2021-Feb-26 00:35 UTC

head link

[PATCH] vdpa/mlx5: set_features should allow reset to zero

Hi Michael,

Are you okay to live without this ioctl for now? I think QEMU is the one 
that needs to be fixed and will have to be made legacy guest aware. I 
think the kernel can just honor the feature negotiation result done by 
QEMU and do as what's told to.Will you agree?

If it's fine, I would proceed to reverting commit fe36cbe067 and related 
code in question from the kernel.

Thanks,
-Siwei

On 2/24/2021 10:24 AM, Si-Wei Liu wrote:>> Detecting it isn't enough though, we will need a new ioctl to
notify
>> the kernel that it's a legacy guest. Ugh :(
> Well, although I think adding an ioctl is doable, may I know what the 
> use case there will be for kernel to leverage such info directly? Is 
> there a case QEMU can't do with dedicate ioctls later if there's 
> indeed differentiation (legacy v.s. modern) needed?
>
> One of the reason I asked is if this ioctl becomes a mandate for 
> vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't 
> see this ioctl coming?
>
> If it's optional, suppose the kernel may need it only when it becomes 
> necessary?
>
> Thanks,
> -Siwei 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.linuxfoundation.org/pipermail/virtualization/attachments/20210225/4b6770ca/attachment-0001.html>

Si-Wei Liu

2021-Feb-26 00:56 UTC

head link

[PATCH] vdpa/mlx5: set_features should allow reset to zero

Hi Michael,

Are you okay to live without this ioctl for now? I think QEMU is the one 
that needs to be fixed and will have to be made legacy guest aware. I 
think the kernel can just honor the feature negotiation result done by 
QEMU and do as what's told to.Will you agree?

If it's fine, I would proceed to reverting commit fe36cbe067 and related 
code in question from the kernel.

Thanks,
-Siwei

On 2/24/2021 10:24 AM, Si-Wei Liu wrote:>> Detecting it isn't enough though, we will need a new ioctl to
notify
>> the kernel that it's a legacy guest. Ugh :(
> Well, although I think adding an ioctl is doable, may I know what the 
> use case there will be for kernel to leverage such info directly? Is 
> there a case QEMU can't do with dedicate ioctls later if there's 
> indeed differentiation (legacy v.s. modern) needed?
>
> One of the reason I asked is if this ioctl becomes a mandate for 
> vhost-vdpa kernel. QEMU would reject initialize vhost-vdpa if doesn't 
> see this ioctl coming?
>
> If it's optional, suppose the kernel may need it only when it becomes 
> necessary?
>
> Thanks,
> -Siwei

Michael S. Tsirkin

2021-Feb-28 21:28 UTC

head link

[PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu
wrote:> 
> 
> On 2/23/2021 9:04 PM, Michael S. Tsirkin wrote:
> > On Tue, Feb 23, 2021 at 11:35:57AM -0800, Si-Wei Liu wrote:
> > > 
> > > On 2/23/2021 5:26 AM, Michael S. Tsirkin wrote:
> > > > On Tue, Feb 23, 2021 at 10:03:57AM +0800, Jason Wang wrote:
> > > > > On 2021/2/23 9:12 ??, Si-Wei Liu wrote:
> > > > > > On 2/21/2021 11:34 PM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Feb 22, 2021 at 12:14:17PM +0800,
Jason Wang wrote:
> > > > > > > > On 2021/2/19 7:54 ??, Si-Wei Liu wrote:
> > > > > > > > > Commit 452639a64ad8 ("vdpa:
make sure set_features is invoked
> > > > > > > > > for legacy") made an exception
for legacy guests to reset
> > > > > > > > > features to 0, when config space is
accessed before features
> > > > > > > > > are set. We should relieve the
verify_min_features() check
> > > > > > > > > and allow features reset to 0 for
this case.
> > > > > > > > > 
> > > > > > > > > It's worth noting that not just
legacy guests could access
> > > > > > > > > config space before features are
set. For instance, when
> > > > > > > > > feature VIRTIO_NET_F_MTU is
advertised some modern driver
> > > > > > > > > will try to access and validate the
MTU present in the config
> > > > > > > > > space before virtio features are
set.
> > > > > > > > This looks like a spec violation:
> > > > > > > > 
> > > > > > > > "
> > > > > > > > 
> > > > > > > > The following driver-read-only field,
mtu only exists if
> > > > > > > > VIRTIO_NET_F_MTU is
> > > > > > > > set.
> > > > > > > > This field specifies the maximum MTU for
the driver to use.
> > > > > > > > "
> > > > > > > > 
> > > > > > > > Do we really want to workaround this?
> > > > > > > > 
> > > > > > > > Thanks
> > > > > > > And also:
> > > > > > > 
> > > > > > > The driver MUST follow this sequence to
initialize a device:
> > > > > > > 1. Reset the device.
> > > > > > > 2. Set the ACKNOWLEDGE status bit: the guest
OS has noticed the device.
> > > > > > > 3. Set the DRIVER status bit: the guest OS
knows how to drive the
> > > > > > > device.
> > > > > > > 4. Read device feature bits, and write the
subset of feature bits
> > > > > > > understood by the OS and driver to the
> > > > > > > device. During this step the driver MAY read
(but MUST NOT write)
> > > > > > > the device-specific configuration
> > > > > > > fields to check that it can support the
device before accepting it.
> > > > > > > 5. Set the FEATURES_OK status bit. The driver
MUST NOT accept new
> > > > > > > feature bits after this step.
> > > > > > > 6. Re-read device status to ensure the
FEATURES_OK bit is still set:
> > > > > > > otherwise, the device does not
> > > > > > > support our subset of features and the device
is unusable.
> > > > > > > 7. Perform device-specific setup, including
discovery of virtqueues
> > > > > > > for the device, optional per-bus setup,
> > > > > > > reading and possibly writing the device?s
virtio configuration
> > > > > > > space, and population of virtqueues.
> > > > > > > 8. Set the DRIVER_OK status bit. At this
point the device is ?live?.
> > > > > > > 
> > > > > > > 
> > > > > > > so accessing config space before FEATURES_OK
is a spec violation, right?
> > > > > > It is, but it's not relevant to what this
commit tries to address. I
> > > > > > thought the legacy guest still needs to be
supported.
> > > > > > 
> > > > > > Having said, a separate patch has to be posted to
fix the guest driver
> > > > > > issue where this discrepancy is introduced to
virtnet_validate() (since
> > > > > > commit fe36cbe067). But it's not technically
related to this patch.
> > > > > > 
> > > > > > -Siwei
> > > > > I think it's a bug to read config space in
validate, we should move it to
> > > > > virtnet_probe().
> > > > > 
> > > > > Thanks
> > > > I take it back, reading but not writing seems to be
explicitly allowed by spec.
> > > > So our way to detect a legacy guest is bogus, need to think
what is
> > > > the best way to handle this.
> > > Then maybe revert commit fe36cbe067 and friends, and have QEMU
detect legacy
> > > guest? Supposedly only config space write access needs to be
guarded before
> > > setting FEATURES_OK.
> > > 
> > > -Siwie
> > Detecting it isn't enough though, we will need a new ioctl to
notify
> > the kernel that it's a legacy guest. Ugh :(
> Well, although I think adding an ioctl is doable, may I know what the use
> case there will be for kernel to leverage such info directly? Is there a
> case QEMU can't do with dedicate ioctls later if there's indeed
> differentiation (legacy v.s. modern) needed?
> 
> One of the reason I asked is if this ioctl becomes a mandate for vhost-vdpa
> kernel. QEMU would reject initialize vhost-vdpa if doesn't see this
ioctl
> coming?
Only on BE hosts or guests I think. With LE host and guest legacy and
modern behave the same so ioctl isn't needed.
> If it's optional, suppose the kernel may need it only when it becomes
> necessary?
> 
> Thanks,
> -Siwei

Michael S. Tsirkin

2021-Feb-28 21:34 UTC

head link

[PATCH] vdpa/mlx5: set_features should allow reset to zero

On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu
wrote:> > Detecting it isn't enough though, we will need a new ioctl to
notify
> > the kernel that it's a legacy guest. Ugh :(
> Well, although I think adding an ioctl is doable, may I know what the use
> case there will be for kernel to leverage such info directly? Is there a
> case QEMU can't do with dedicate ioctls later if there's indeed
> differentiation (legacy v.s. modern) needed?
BTW a good API could be

#define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
#define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)

we did it per vring but maybe that was a mistake ...

-- 
MST

Jason Wang

2021-Mar-01 03:56 UTC

head link

[PATCH] vdpa/mlx5: set_features should allow reset to zero

On 2021/3/1 5:34 ??, Michael S. Tsirkin wrote:> On Wed, Feb 24, 2021 at 10:24:41AM -0800, Si-Wei Liu wrote:
>>> Detecting it isn't enough though, we will need a new ioctl to
notify
>>> the kernel that it's a legacy guest. Ugh :(
>> Well, although I think adding an ioctl is doable, may I know what the
use
>> case there will be for kernel to leverage such info directly? Is there
a
>> case QEMU can't do with dedicate ioctls later if there's indeed
>> differentiation (legacy v.s. modern) needed?
> BTW a good API could be
>
> #define VHOST_SET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
> #define VHOST_GET_ENDIAN _IOW(VHOST_VIRTIO, ?, int)
>
> we did it per vring but maybe that was a mistake ...

Actually, I wonder whether it's good time to just not support legacy 
driver for vDPA. Consider:

1) It's definition is no-normative
2) A lot of budren of codes

So qemu can still present the legacy device since the config space or 
other stuffs that is presented by vhost-vDPA is not expected to be 
accessed by guest directly. Qemu can do the endian conversion when 
necessary in this case?

Thanks

>

Virtualization - Mar 2021 - [PATCH] vdpa/mlx5: set_features should allow reset to zero

[PATCH] vdpa/mlx5: set_features should allow reset to zero

[PATCH] vdpa/mlx5: set_features should allow reset to zero

[PATCH] vdpa/mlx5: set_features should allow reset to zero

[PATCH] vdpa/mlx5: set_features should allow reset to zero

[PATCH] vdpa/mlx5: set_features should allow reset to zero

[PATCH] vdpa/mlx5: set_features should allow reset to zero