thr3ads.net - Virtualization - [PATCH 3/3] virtio_blk: implement blk_mq

Stefan Hajnoczi

2021-May-25 08:59 UTC

[PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()

On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
> 
> On 2021/5/20 10:13 PM, Stefan Hajnoczi wrote:
> > Request completion latency can be reduced by using polling instead of
> > irqs. Even Posted Interrupts or similar hardware support doesn't beat
> > polling. The reason is that disabling virtqueue notifications saves
> > critical-path CPU cycles on the host by skipping irq injection and in
> > the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
> > support to virtio_blk.
> > 
> > The approach taken by this patch differs from the NVMe driver's
> > approach. NVMe dedicates hardware queues to polling and submits
> > REQ_HIPRI requests only on those queues. This patch does not require
> > exclusive polling queues for virtio_blk. Instead, it switches between
> > irqs and polling when one or more REQ_HIPRI requests are in flight on a
> > virtqueue.
> > 
> > This is possible because toggling virtqueue notifications is cheap even
> > while the virtqueue is running. NVMe cqs can't do this because irqs are
> > only enabled/disabled at queue creation time.
> > 
> > This toggling approach requires no configuration. There is no need to
> > dedicate queues ahead of time or to teach users and orchestration tools
> > how to set up polling queues.
> > 
> > Possible drawbacks of this approach:
> > 
> > - Hardware virtio_blk implementations may find virtqueue_disable_cb()
> >    expensive since it requires DMA.
> 
> 
> Note that it's probably not related to the behavior of the driver but the
> design of the event suppression mechanism.
> 
> Device can choose to ignore the suppression flag and keep sending
> interrupts.
Yes, it's the design of the event suppression mechanism.

If we use dedicated polling virtqueues then the hardware doesn't need to
check whether interrupts are enabled for each notification. However,
there's no mechanism to tell the device that virtqueue interrupts are
permanently disabled. This means that as of today, even dedicated
virtqueues cannot suppress interrupts without the device checking for
each notification.
> >   If such devices become popular then
> >    the virtio_blk driver could use a similar approach to NVMe when
> >    VIRTIO_F_ACCESS_PLATFORM is detected in the future.
> > 
> > - If a blk_poll() thread is descheduled it not only hurts polling
> >    performance but also delays completion of non-REQ_HIPRI requests on
> >    that virtqueue since vq notifications are disabled.
> 
> 
> Can we poll only when only high pri requests are pending?
Yes, that's what this patch does.
> If the backend is a remote one, I think the polling may cause more cpu
> cycles.
Right, but polling is only done when userspace sets the RWF_HIPRI
request flag. Most applications don't support it and for those that do
it's probably an option that the user needs to enable explicitly.

Stefan
> > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> > index fc0fb1dcd399..f0243dcd745a 100644
> > --- a/drivers/block/virtio_blk.c
> > +++ b/drivers/block/virtio_blk.c
> > @@ -29,6 +29,16 @@ static struct workqueue_struct *virtblk_wq;
> >   struct virtio_blk_vq {
> >   	struct virtqueue *vq;
> >   	spinlock_t lock;
> > +
> > +	/* Number of non-REQ_HIPRI requests in flight. Protected by lock. */
> > +	unsigned int num_lopri;
> > +
> > +	/* Number of REQ_HIPRI requests in flight. Protected by lock. */
> > +	unsigned int num_hipri;
> > +
> > +	/* Are vq notifications enabled? Protected by lock. */
> > +	bool cb_enabled;
> 
> 
> We had event_flag_shadow, is it sufficient to introduce a new helper
> virtqueue_cb_is_enabled()?
Yes, I'll try that in the next revision.
> > +
> >   	char name[VQ_NAME_LEN];
> >   } ____cacheline_aligned_in_smp;
> > @@ -171,33 +181,67 @@ static inline void virtblk_request_done(struct request *req)
> >   	blk_mq_end_request(req, virtblk_result(vbr));
> >   }
> > -static void virtblk_done(struct virtqueue *vq)
> > +/* Returns true if one or more requests completed */
> > +static bool virtblk_complete_requests(struct virtqueue *vq)
> >   {
> >   	struct virtio_blk *vblk = vq->vdev->priv;
> >   	struct virtio_blk_vq *vbq = &vblk->vqs[vq->index];
> >   	bool req_done = false;
> > +	bool last_hipri_done = false;
> >   	struct virtblk_req *vbr;
> >   	unsigned long flags;
> >   	unsigned int len;
> >   	spin_lock_irqsave(&vbq->lock, flags);
> > +
> >   	do {
> > -		virtqueue_disable_cb(vq);
> > +		if (vbq->cb_enabled)
> > +			virtqueue_disable_cb(vq);
> >   		while ((vbr = virtqueue_get_buf(vq, &len)) != NULL) {
> >   			struct request *req = blk_mq_rq_from_pdu(vbr);
> > +			if (req->cmd_flags & REQ_HIPRI) {
> > +				if (--vbq->num_hipri == 0)
> > +					last_hipri_done = true;
> > +			} else
> > +				vbq->num_lopri--;
> > +
> >   			if (likely(!blk_should_fake_timeout(req->q)))
> >   				blk_mq_complete_request(req);
> >   			req_done = true;
> >   		}
> >   		if (unlikely(virtqueue_is_broken(vq)))
> >   			break;
> > -	} while (!virtqueue_enable_cb(vq));
> > +
> > +		/* Enable vq notifications if non-polled requests remain */
> > +		if (last_hipri_done && vbq->num_lopri > 0) {
> > +			last_hipri_done = false;
> > +			vbq->cb_enabled = true;
> > +		}
> > +	} while (vbq->cb_enabled && !virtqueue_enable_cb(vq));
> >   	/* In case queue is stopped waiting for more buffers. */
> >   	if (req_done)
> >   		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
> >   	spin_unlock_irqrestore(&vbq->lock, flags);
> > +
> > +	return req_done;
> > +}
> > +
> > +static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
> > +{
> > +	struct virtio_blk *vblk = hctx->queue->queuedata;
> > +	struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
> > +
> > +	if (!virtqueue_more_used(vq))
> 
> 
> I'm not familiar with block polling but what happens if a buffer is made
> available after virtqueue_more_used() returns false here?
Can you explain the scenario, I'm not sure I understand? "buffer is made
available" -> are you thinking about additional requests being submitted
by the driver or an in-flight request being marked used by the device?

Stefan

Jason Wang

2021-May-27 05:48 UTC

[PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()

On 2021/5/25 4:59 PM, Stefan Hajnoczi wrote:
> On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
>> On 2021/5/20 10:13 PM, Stefan Hajnoczi wrote:
>>> Request completion latency can be reduced by using polling instead of
>>> irqs. Even Posted Interrupts or similar hardware support doesn't beat
>>> polling. The reason is that disabling virtqueue notifications saves
>>> critical-path CPU cycles on the host by skipping irq injection and in
>>> the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
>>> support to virtio_blk.
>>>
>>> The approach taken by this patch differs from the NVMe driver's
>>> approach. NVMe dedicates hardware queues to polling and submits
>>> REQ_HIPRI requests only on those queues. This patch does not require
>>> exclusive polling queues for virtio_blk. Instead, it switches between
>>> irqs and polling when one or more REQ_HIPRI requests are in flight on a
>>> virtqueue.
>>>
>>> This is possible because toggling virtqueue notifications is cheap even
>>> while the virtqueue is running. NVMe cqs can't do this because irqs are
>>> only enabled/disabled at queue creation time.
>>>
>>> This toggling approach requires no configuration. There is no need to
>>> dedicate queues ahead of time or to teach users and orchestration tools
>>> how to set up polling queues.
>>>
>>> Possible drawbacks of this approach:
>>>
>>> - Hardware virtio_blk implementations may find virtqueue_disable_cb()
>>>     expensive since it requires DMA.
>>
>> Note that it's probably not related to the behavior of the driver but the
>> design of the event suppression mechanism.
>>
>> Device can choose to ignore the suppression flag and keep sending
>> interrupts.
> Yes, it's the design of the event suppression mechanism.
>
> If we use dedicated polling virtqueues then the hardware doesn't need to
> check whether interrupts are enabled for each notification. However,
> there's no mechanism to tell the device that virtqueue interrupts are
> permanently disabled. This means that as of today, even dedicated
> virtqueues cannot suppress interrupts without the device checking for
> each notification.

This can be detected in a transport-specific way.

E.g., in the case of MSI, VIRTIO_MSI_NO_VECTOR could be a hint.

>
>>>    If such devices become popular then
>>>     the virtio_blk driver could use a similar approach to NVMe when
>>>     VIRTIO_F_ACCESS_PLATFORM is detected in the future.
>>>
>>> - If a blk_poll() thread is descheduled it not only hurts polling
>>>     performance but also delays completion of non-REQ_HIPRI requests on
>>>     that virtqueue since vq notifications are disabled.
>>
>> Can we poll only when only high pri requests are pending?
> Yes, that's what this patch does.
>
>> If the backend is a remote one, I think the polling may cause more cpu
>> cycles.
> Right, but polling is only done when userspace sets the RWF_HIPRI
> request flag. Most applications don't support it and for those that do
> it's probably an option that the user needs to enable explicitly.

I see.

>
> Stefan
>
>>> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
>>> index fc0fb1dcd399..f0243dcd745a 100644
>>> --- a/drivers/block/virtio_blk.c
>>> +++ b/drivers/block/virtio_blk.c
>>> @@ -29,6 +29,16 @@ static struct workqueue_struct *virtblk_wq;
>>>    struct virtio_blk_vq {
>>>    	struct virtqueue *vq;
>>>    	spinlock_t lock;
>>> +
>>> +	/* Number of non-REQ_HIPRI requests in flight. Protected by lock. */
>>> +	unsigned int num_lopri;
>>> +
>>> +	/* Number of REQ_HIPRI requests in flight. Protected by lock. */
>>> +	unsigned int num_hipri;
>>> +
>>> +	/* Are vq notifications enabled? Protected by lock. */
>>> +	bool cb_enabled;
>>
>> We had event_flag_shadow, is it sufficient to introduce a new helper
>> virtqueue_cb_is_enabled()?
> Yes, I'll try that in the next revision.
>
>>> +
>>>    	char name[VQ_NAME_LEN];
>>>    } ____cacheline_aligned_in_smp;
>>> @@ -171,33 +181,67 @@ static inline void virtblk_request_done(struct request *req)
>>>    	blk_mq_end_request(req, virtblk_result(vbr));
>>>    }
>>> -static void virtblk_done(struct virtqueue *vq)
>>> +/* Returns true if one or more requests completed */
>>> +static bool virtblk_complete_requests(struct virtqueue *vq)
>>>    {
>>>    	struct virtio_blk *vblk = vq->vdev->priv;
>>>    	struct virtio_blk_vq *vbq = &vblk->vqs[vq->index];
>>>    	bool req_done = false;
>>> +	bool last_hipri_done = false;
>>>    	struct virtblk_req *vbr;
>>>    	unsigned long flags;
>>>    	unsigned int len;
>>>    	spin_lock_irqsave(&vbq->lock, flags);
>>> +
>>>    	do {
>>> -		virtqueue_disable_cb(vq);
>>> +		if (vbq->cb_enabled)
>>> +			virtqueue_disable_cb(vq);
>>>    		while ((vbr = virtqueue_get_buf(vq, &len)) != NULL) {
>>>    			struct request *req = blk_mq_rq_from_pdu(vbr);
>>> +			if (req->cmd_flags & REQ_HIPRI) {
>>> +				if (--vbq->num_hipri == 0)
>>> +					last_hipri_done = true;
>>> +			} else
>>> +				vbq->num_lopri--;
>>> +
>>>    			if (likely(!blk_should_fake_timeout(req->q)))
>>>    				blk_mq_complete_request(req);
>>>    			req_done = true;
>>>    		}
>>>    		if (unlikely(virtqueue_is_broken(vq)))
>>>    			break;
>>> -	} while (!virtqueue_enable_cb(vq));
>>> +
>>> +		/* Enable vq notifications if non-polled requests remain */
>>> +		if (last_hipri_done && vbq->num_lopri > 0) {
>>> +			last_hipri_done = false;
>>> +			vbq->cb_enabled = true;
>>> +		}
>>> +	} while (vbq->cb_enabled && !virtqueue_enable_cb(vq));
>>>    	/* In case queue is stopped waiting for more buffers. */
>>>    	if (req_done)
>>>    		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
>>>    	spin_unlock_irqrestore(&vbq->lock, flags);
>>> +
>>> +	return req_done;
>>> +}
>>> +
>>> +static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
>>> +{
>>> +	struct virtio_blk *vblk = hctx->queue->queuedata;
>>> +	struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
>>> +
>>> +	if (!virtqueue_more_used(vq))
>>
>> I'm not familiar with block polling but what happens if a buffer is made
>> available after virtqueue_more_used() returns false here?
> Can you explain the scenario, I'm not sure I understand? "buffer is made
> available" -> are you thinking about additional requests being submitted
> by the driver or an in-flight request being marked used by the device?

Something like this:

1) requests are submitted
2) poll() runs, but virtqueue_more_used() returns false
3) the device marks a buffer used

In this case, will poll() be triggered again by somebody else? (I think
interrupts are disabled here.)

Thanks


>
> Stefan

Stefan Hajnoczi

2021-Jun-03 15:24 UTC

[PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()

On Thu, May 27, 2021 at 01:48:36PM +0800, Jason Wang wrote:
> 
> On 2021/5/25 4:59 PM, Stefan Hajnoczi wrote:
> > On Tue, May 25, 2021 at 11:21:41AM +0800, Jason Wang wrote:
> > > On 2021/5/20 10:13 PM, Stefan Hajnoczi wrote:
> > > > Request completion latency can be reduced by using polling instead of
> > > > irqs. Even Posted Interrupts or similar hardware support doesn't beat
> > > > polling. The reason is that disabling virtqueue notifications saves
> > > > critical-path CPU cycles on the host by skipping irq injection and in
> > > > the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
> > > > support to virtio_blk.
> > > > 
> > > > The approach taken by this patch differs from the NVMe driver's
> > > > approach. NVMe dedicates hardware queues to polling and submits
> > > > REQ_HIPRI requests only on those queues. This patch does not require
> > > > exclusive polling queues for virtio_blk. Instead, it switches between
> > > > irqs and polling when one or more REQ_HIPRI requests are in flight on a
> > > > virtqueue.
> > > > 
> > > > This is possible because toggling virtqueue notifications is cheap even
> > > > while the virtqueue is running. NVMe cqs can't do this because irqs are
> > > > only enabled/disabled at queue creation time.
> > > > 
> > > > This toggling approach requires no configuration. There is no need to
> > > > dedicate queues ahead of time or to teach users and orchestration tools
> > > > how to set up polling queues.
> > > > 
> > > > Possible drawbacks of this approach:
> > > > 
> > > > - Hardware virtio_blk implementations may find virtqueue_disable_cb()
> > > >     expensive since it requires DMA.
> > > 
> > > Note that it's probably not related to the behavior of the driver but the
> > > design of the event suppression mechanism.
> > > 
> > > Device can choose to ignore the suppression flag and keep sending
> > > interrupts.
> > Yes, it's the design of the event suppression mechanism.
> > 
> > If we use dedicated polling virtqueues then the hardware doesn't need to
> > check whether interrupts are enabled for each notification. However,
> > there's no mechanism to tell the device that virtqueue interrupts are
> > permanently disabled. This means that as of today, even dedicated
> > virtqueues cannot suppress interrupts without the device checking for
> > each notification.
> 
> 
> This can be detected via a transport specific way.
> 
> E.g in the case of MSI, VIRTIO_MSI_NO_VECTOR could be a hint.
Nice idea :). Then there would be no need for changes to the hardware
interface. IRQ-less virtqueues could still be mentioned explicitly in
the VIRTIO spec so that driver/device authors are aware of the
VIRTIO_MSI_NO_VECTOR trick.
> > > > +static int virtblk_poll(struct blk_mq_hw_ctx *hctx)
> > > > +{
> > > > +	struct virtio_blk *vblk = hctx->queue->queuedata;
> > > > +	struct virtqueue *vq = vblk->vqs[hctx->queue_num].vq;
> > > > +
> > > > +	if (!virtqueue_more_used(vq))
> > > 
> > > I'm not familiar with block polling but what happens if a buffer is made
> > > available after virtqueue_more_used() returns false here?
> > Can you explain the scenario, I'm not sure I understand? "buffer is made
> > available" -> are you thinking about additional requests being submitted
> > by the driver or an in-flight request being marked used by the device?
> 
> 
> Something like that:
> 
> 1) requests are submitted
> 2) poll but virtqueue_more_used() return false
> 3) device make buffer used
> 
> In this case, will poll() be triggered again by somebody else? (I think
> interrupt is disabled here).
Yes. An example blk_poll() user is
fs/block_dev.c:__blkdev_direct_IO_simple():

  qc = submit_bio(&bio);
  for (;;) {
      set_current_state(TASK_UNINTERRUPTIBLE);
      if (!READ_ONCE(bio.bi_private))
          break;
      if (!(iocb->ki_flags & IOCB_HIPRI) ||
          !blk_poll(bdev_get_queue(bdev), qc, true))
          blk_io_schedule();
  }

That's the infinite loop. The block layer implements the generic portion
of blk_poll(). blk_poll() calls mq_ops->poll() (virtblk_poll()).

So in general the polling loop will keep iterating, but there are
exceptions:
1. need_resched() causes blk_poll() to return 0 and blk_io_schedule()
   will be called.
2. blk-mq has a fancier io_poll algorithm that estimates I/O time and
   sleeps until the expected completion time to save CPU cycles. I
   haven't looked at this one in detail.

Both these cases affect existing mq_ops->poll() implementations (e.g.
NVMe). What's new in this patch series is that virtio-blk could have
non-polling requests on the virtqueue which now has irqs disabled. So
those requests could be left waiting.

I think there's an easy solution for this: don't disable virtqueue irqs
when there are non-REQ_HIPRI requests in flight. The disadvantage is
that we'll keep irqs enabled in more situations so the performance
improvement may not apply in some configurations.

Stefan