Michael S. Tsirkin
2021-Apr-07 13:45 UTC
[PATCH 2/3] virito_pci: add timeout to reset device operation
On Wed, Apr 07, 2021 at 12:09:23PM +0000, Max Gurtovoy wrote:> According to the spec after writing 0 to device_status, the driver MUST > wait for a read of device_status to return 0 before reinitializing the > device. In case we have a device that won't return 0, the reset > operation will loop forever and cause the host/vm to stuck. Set timeout > for 3 minutes before giving up on the device. > > Signed-off-by: Max Gurtovoy <mgurtovoy at nvidia.com> > --- > drivers/virtio/virtio_pci_modern.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c > index cc3412a96a17..dcee616e8d21 100644 > --- a/drivers/virtio/virtio_pci_modern.c > +++ b/drivers/virtio/virtio_pci_modern.c > @@ -162,6 +162,7 @@ static int vp_reset(struct virtio_device *vdev) > { > struct virtio_pci_device *vp_dev = to_vp_device(vdev); > struct virtio_pci_modern_device *mdev = &vp_dev->mdev; > + unsigned long timeout = jiffies + msecs_to_jiffies(180000); > > /* 0 status means a reset. */ > vp_modern_set_status(mdev, 0); > @@ -169,9 +170,16 @@ static int vp_reset(struct virtio_device *vdev) > * device_status to return 0 before reinitializing the device. > * This will flush out the status write, and flush in device writes, > * including MSI-X interrupts, if any. > + * Set a timeout before giving up on the device. > */ > - while (vp_modern_get_status(mdev)) > + while (vp_modern_get_status(mdev)) { > + if (time_after(jiffies, timeout)) { > + dev_err(&vdev->dev, "virtio: device not ready. " > + "Aborting. Try again later\n"); > + return -EAGAIN; > + } > msleep(1); > + } > /* Flush pending VQ/configuration callbacks. */ > vp_synchronize_vectors(vdev); > return 0;Problem is everyone just ignores the return code from reset. Timing out like that has a chance to cause a lot of trouble if the device remains active - we need to make reset robust. What exactly is going on with the device that get status never returns 0? E.g. maybe it's in a state where it's returning all 1's because it's wedged permanently - using that would be better...> -- > 2.25.4