On Fri, Nov 11, 2022 at 11:49:10PM +0800, Longpeng (Mike, Cloud Infrastructure
Service Product Dept.) wrote:>
>
>? 2022/11/11 23:14, Stefano Garzarella ??:
>>On Fri, Nov 11, 2022 at 10:55:05PM +0800, Longpeng(Mike) wrote:
>>>From: Longpeng <longpeng2 at huawei.com>
>>>
>>>1. We should not set status to 0 when invoking vp_vdpa_set_status().
>>>
>>>2. The driver MUST wait for a read of device_status to return 0
before
>>>? reinitializing the device.
>>>
>>>Signed-off-by: Longpeng <longpeng2 at huawei.com>
>>>---
>>>drivers/vdpa/virtio_pci/vp_vdpa.c | 11 ++++++++++-
>>>1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>>diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c
>>>b/drivers/vdpa/virtio_pci/vp_vdpa.c
>>>index d448db0c4de3..d35fac5cde11 100644
>>>--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
>>>+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
>>>@@ -212,8 +212,12 @@ static void vp_vdpa_set_status(struct
>>>vdpa_device *vdpa, u8 status)
>>>{
>>>????struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa);
>>>????struct virtio_pci_modern_device *mdev =
vp_vdpa_to_mdev(vp_vdpa);
>>>-??? u8 s = vp_vdpa_get_status(vdpa);
>>
>>Is this change really needed?
>>
>No need to get the status if we try to set status to 0 (trigger BUG).
>
Okay, but that's the case that should never happen, so IMHO we can leave
it as it is.
>>>+??? u8 s;
>>>+
>>>+??? /* We should never be setting status to 0. */
>>>+??? BUG_ON(status == 0);
>>
>>IMHO panicking the kernel seems excessive in this case, please use
>>WARN_ON and maybe return earlier.
>>
>Um...I referenced the vp_reset/vp_set_status,
Ah I see, maybe it's an old code, because recently we always try to
avoid BUG_ON().
>
>>>
>>>+??? s = vp_vdpa_get_status(vdpa);
>>>????if (status & VIRTIO_CONFIG_S_DRIVER_OK &&
>>>??????? !(s & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>>??????? vp_vdpa_request_irq(vp_vdpa);
>>>@@ -229,6 +233,11 @@ static int vp_vdpa_reset(struct vdpa_device
*vdpa)
>>>????u8 s = vp_vdpa_get_status(vdpa);
>>>
>>>????vp_modern_set_status(mdev, 0);
>>>+??? /* After writing 0 to device_status, the driver MUST wait for
>>>a read of
>>>+???? * device_status to return 0 before reinitializing the device.
>>>+???? */
>>>+??? while (vp_modern_get_status(mdev))
>>>+??????? msleep(1);
>>
>>Should we set a limit after which we give up? A malfunctioning
>>device could keep us here forever.
>>
>Yes, but the malfunctioning device maybe can not work anymore, how to
>handle it?
Maybe we should set the status to broken, but in this case we could just
return an error if we couldn't reset it, how about that?
Thanks,
Stefano