Ming Lei
2020-Feb-18 02:21 UTC
[PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
On Thu, Feb 13, 2020 at 8:38 PM Halil Pasic <pasic at linux.ibm.com> wrote:
>
> Since nobody else is going to restart our hw_queue for us, the
> blk_mq_start_stopped_hw_queues() in virtblk_done() is not
> necessarily sufficient to ensure that the queue will get started again.
> In case of global resource outage (-ENOMEM because of mapping failure,
> because of swiotlb full) our virtqueue may be empty and we can get
> stuck with a stopped hw_queue.
>
> Let us not stop the queue on arbitrary errors, but only on -ENOSPC which
> indicates a full virtqueue, where the hw_queue is guaranteed to get
> started by virtblk_done() once it makes sense to carry on
> submitting requests. Let us also remove a stale comment.

The generic solution may be to stop the queue only when there is any
in-flight request not completed.

Checking -ENOMEM may not be enough, given -EIO can be returned from
virtqueue_add() too in case of DMA map failure.

Thanks,
Halil Pasic
2020-Feb-18 12:35 UTC
[PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
On Tue, 18 Feb 2020 10:21:18 +0800 Ming Lei <tom.leiming at gmail.com> wrote:

> On Thu, Feb 13, 2020 at 8:38 PM Halil Pasic <pasic at linux.ibm.com> wrote:
> >
> > Since nobody else is going to restart our hw_queue for us, the
> > blk_mq_start_stopped_hw_queues() in virtblk_done() is not
> > necessarily sufficient to ensure that the queue will get started again.
> > In case of global resource outage (-ENOMEM because of mapping failure,
> > because of swiotlb full) our virtqueue may be empty and we can get
> > stuck with a stopped hw_queue.
> >
> > Let us not stop the queue on arbitrary errors, but only on -ENOSPC which
> > indicates a full virtqueue, where the hw_queue is guaranteed to get
> > started by virtblk_done() once it makes sense to carry on
> > submitting requests. Let us also remove a stale comment.
>
> The generic solution may be to stop the queue only when there is any
> in-flight request not completed.
>

I think this is pretty close to that. The queue is stopped only on
-ENOSPC, which means the virtqueue is full.

> Checking -ENOMEM may not be enough, given -EIO can be returned from
> virtqueue_add() too in case of DMA map failure.

I'm not checking on -ENOMEM. So the queue would not be stopped on -EIO.
Maybe I'm misunderstanding something. In any case, please have another
look at the diff, and if your concerns persist please help me understand.

Thanks for having a look!

Regards,
Halil

>
> Thanks,
Ming Lei
2020-Feb-19 01:46 UTC
[PATCH 1/2] virtio-blk: fix hw_queue stopped on arbitrary error
On Tue, Feb 18, 2020 at 8:35 PM Halil Pasic <pasic at linux.ibm.com> wrote:
>
> On Tue, 18 Feb 2020 10:21:18 +0800
> Ming Lei <tom.leiming at gmail.com> wrote:
>
> > On Thu, Feb 13, 2020 at 8:38 PM Halil Pasic <pasic at linux.ibm.com> wrote:
> > >
> > > Since nobody else is going to restart our hw_queue for us, the
> > > blk_mq_start_stopped_hw_queues() in virtblk_done() is not
> > > necessarily sufficient to ensure that the queue will get started again.
> > > In case of global resource outage (-ENOMEM because of mapping failure,
> > > because of swiotlb full) our virtqueue may be empty and we can get
> > > stuck with a stopped hw_queue.
> > >
> > > Let us not stop the queue on arbitrary errors, but only on -ENOSPC which
> > > indicates a full virtqueue, where the hw_queue is guaranteed to get
> > > started by virtblk_done() once it makes sense to carry on
> > > submitting requests. Let us also remove a stale comment.
> >
> > The generic solution may be to stop the queue only when there is any
> > in-flight request not completed.
> >
>
> I think this is pretty close to that. The queue is stopped only on
> -ENOSPC, which means the virtqueue is full.
>
> > Checking -ENOMEM may not be enough, given -EIO can be returned from
> > virtqueue_add() too in case of DMA map failure.
>
> I'm not checking on -ENOMEM. So the queue would not be stopped on -EIO.
> Maybe I'm misunderstanding something. In any case, please have another
> look at the diff, and if your concerns persist please help me understand.

Looks like I misread the patch, and this patch is fine:

Reviewed-by: Ming Lei <ming.lei at redhat.com>

Thanks,
Ming Lei
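
For readers without the diff at hand, the rule the thread converges on can be modelled in plain userspace C. This is only a sketch of the decision being discussed, not the patch itself: struct hw_queue_model, should_stop_hw_queue(), handle_submit_error() and handle_completion() are hypothetical names invented for illustration, and the real driver works on the kernel's blk-mq structures under a lock. The point it shows is that only -ENOSPC (virtqueue full) stops the queue, because only then is a completion guaranteed to arrive and restart it; -ENOMEM and -EIO leave the queue running.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a blk-mq hardware queue. */
struct hw_queue_model {
    bool stopped;
};

/*
 * Decide whether a failed submission should stop the hardware queue.
 * Assumption taken from the thread: -ENOSPC means the virtqueue is full,
 * so requests are in flight and a completion will restart the queue.
 * Any other error (for example -ENOMEM or -EIO from a mapping failure)
 * may occur with an empty virtqueue, so stopping could leave us stuck.
 */
bool should_stop_hw_queue(int err)
{
    return err == -ENOSPC;
}

/* Model of the submission error path: stop only when it is safe to. */
void handle_submit_error(struct hw_queue_model *q, int err)
{
    if (should_stop_hw_queue(err))
        q->stopped = true;
}

/* Model of the completion path, which restarts a stopped queue. */
void handle_completion(struct hw_queue_model *q)
{
    q->stopped = false;
}
```

Feeding -ENOMEM or -EIO to handle_submit_error() leaves the model queue running, while -ENOSPC stops it until handle_completion() is called, which mirrors the role virtblk_done() plays in the discussion above.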